Summarizing Long Regulatory Documents with a Multi-Step Pipeline

Mika Sie, Ruby Beek, Michiel Bots, Sjaak Brinkkemper, Albert Gatt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Due to their length and complexity, long regulatory texts are challenging to summarize. To address this, a multi-step extractive-abstractive architecture is proposed to handle lengthy regulatory documents more effectively. In this paper, we show that the effectiveness of a two-step architecture for summarizing long regulatory texts varies significantly depending on the model used. Specifically, the two-step architecture improves the performance of decoder-only models. For abstractive encoder-decoder models with short context lengths, the effectiveness of an extractive step varies, whereas for long-context encoder-decoder models, the extractive step worsens their performance. This research also highlights the challenges of evaluating generated texts, as evidenced by the differing results from human and automated evaluations. Most notably, human evaluations favoured language models pretrained on legal text, while automated metrics ranked general-purpose language models higher. The results underscore the importance of selecting the appropriate summarization strategy based on model architecture and context length.

Original language: English
Title of host publication: NLLP 2024 - Natural Legal Language Processing Workshop 2024, Proceedings of the Workshop
Editors: Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Catalina Goanta, Daniel Preotiuc-Pietro, Gerasimos Spanakis
Publisher: Association for Computational Linguistics (ACL)
Pages: 18-32
Number of pages: 15
ISBN (Electronic): 9798891761834
Publication status: Published - 2024
Event: 6th Natural Legal Language Processing Workshop 2024, NLLP 2024, co-located with the 2024 Conference on Empirical Methods in Natural Language Processing - Miami, United States
Duration: 16 Nov 2024 → …

Publication series

Name: NLLP 2024 - Natural Legal Language Processing Workshop 2024, Proceedings of the Workshop

Conference

Conference: 6th Natural Legal Language Processing Workshop 2024, NLLP 2024, co-located with the 2024 Conference on Empirical Methods in Natural Language Processing
Country/Territory: United States
City: Miami
Period: 16/11/24 → …

Bibliographical note

Publisher Copyright:
©2024 Association for Computational Linguistics.
