TY - JOUR
T1 - Vissen naar variatie
T2 - Digitaal op zoek naar onbekende Noord/Zuid-verschillen in de grammatica van het Nederlands
AU - Grondelaers, Stefan
AU - De Troij, Robbert
AU - Speelman, Dirk
AU - Bosch, Antal van den
PY - 2020/4
Y1 - 2020/4
N2 - Belgian Dutch (BD) and Netherlandic Dutch (ND) are known to exhibit phonetic and lexical differences, but national variation in the syntax of Dutch has often been claimed to be quasi non-existent. This view is rooted in the fact that both laypersons and researchers are oblivious to national divergences in the grammar of Dutch (unless they are categorical and/or heavily mediatized), but also in the undisputed belief that BD and ND are different surface manifestations of ‘the same grammatical motor’. As a result, only a few syntactic phenomena have hitherto been shown to be sensitive to national constraints. In this paper we illustrate a computational bottom-up approach (pioneered in Bannard & Callison-Burch 2005) to cast the net as widely as possible. Building on statistical machine translation and a parallel corpus of Dutch translations of English subtitles, we identify plausible mappings between English n-grams and their Dutch translations. We do this in order to obtain paraphrases, i.e., stretches of interchangeable Dutch text that carry approximately the same meaning. In a first case study, we found corroborating evidence among the discovered paraphrases for many syntactic variables that have previously been attested in Dutch, including complementizer variation, existential er-variation, word order phenomena, and inflection variation. Crucially, we also discovered a number of alternations we had not anticipated as interesting variables. In order to detect national constraints on the newly found variables, we carried out a second experiment with a smaller corpus of Belgian and Netherlandic subtitles: the two variables we investigated in this light – deictic strength variation and subordination variation – did indeed manifest national sensitivity.
UR - https://pure.knaw.nl/portal/en/publications/0cbdb50b-891d-402d-8543-b94b8a31bc0b
U2 - 10.5117/NEDTAA2020.1.004.GRON
DO - 10.5117/NEDTAA2020.1.004.GRON
M3 - Article
SN - 1384-5845
VL - 25
SP - 73
EP - 99
JO - Nederlandse taalkunde
JF - Nederlandse taalkunde
IS - 1
ER -