Models of language variation and change: new evidence from language contact – Parametric Comparison Method

The PRIN project “Models of language variation and change: new evidence from language contact” (PRIN2017K3NHHY) was led by Rita Manzini (Università di Firenze). It began in December 2019 and ended in June 2023.

Here follows a short description of our research (from the “Brief description of the research proposal” written by R. Manzini).

Linguistic variation and change, no less than language universals, are not merely accidental to language but woven into its basic design. We are interested in contact phenomena not in themselves but in so far as they act as catalysts or accelerators for variation and change, providing a unique window on them.

The aim of this project is threefold:

– collecting and describing syntactic data in domains of language contact, primarily German(ic) and Balkan varieties in contact with Romance in the Italian peninsula and West/North Germanic/Romance contact in the change from Old to Middle English;

– carrying out a formal analysis of the observed data, with special attention to phenomena involved at the interface between syntax and externalization (case/agreement, word order); 

– contributing to formal models of variation and change, with a focus on microvariation, understood as the basic unit of variation and change, rather than on typological macrovariation. 

In general, our goal is describing, analyzing and explaining language diversity, using the “external” impulses acting on it (i.e. interaction dynamics between speakers and/or speaking communities) as a tool to get to the core of the “internal” structures that determine it (i.e. the speakers’ grammatical competence).

The research units are based at the universities of Firenze (Rita Manzini), Verona (Alessandra Tomaselli), Trieste (Paola Crisma), Reggio Emilia (Cristina Guardiano).

Reggio Emilia

The research unit based in Reggio Emilia is applying the Parametric Comparison Method (PCM) to the analysis of microvariation and parametric contact.

We explore the internal structure of the nominal domain in two major dialectal domains: Romance and Greek.

In particular, we focus on:

– contact between the two groups in Southern Italy

– the dialectal structure and classification of the Romance dialects of Italy  

– dialectal variation across Greek dialects in other areas of the Greek-speaking world

The researchers currently involved in the project are Cristina Guardiano, Vincenzo Stalfieri, Andrea Ceolin, Monica Irimia and Michela Cambria.

Description of the project

Abstract

Linguistic variation and change are interesting to the extent that, no less than language universals, they appear to be not accidental to language, but rather written into its basic design.

Specifically, conceptual and inferential components are presumably not subject to variation/change – nor are the computational components (syntactic operations, phonological operations). Rather, variation arises when these components interact (notably at externalization), posing questions of functional optimization to the system.

“Why are there so many languages? The reason might be that the problem of externalization can be solved in many different and independent ways”. “Diversity of language results from the fact that the principles do not determine the answers to all questions about language, but leave some questions as open parameters” (Berwick & Chomsky 2015).

Within the framework just defined, our general research objective in meshing contact data with parametric models is to detect, describe, measure and explain language diversity, using the external impulses acting on it as a tool to get to the core of the internal structures that determine it (i.e. the speakers’ grammatical competence).

Our aim is threefold:

– collecting and describing syntactic data in several domains of language microvariation and contact in which the research team has documented expertise;

– carrying out a formal syntactic analysis of the data, with emphasis on contact as a magnifying lens on (micro)variation and change;

– developing and comparing models of syntactic parameters in the light of the formally analyzed data.

We are interested in contact phenomena not in themselves but in so far as they are one of the external factors that may act as catalysts or accelerators for variation and change.

By contact, we mean any situation in which two different languages are spoken by the same community, hence in conditions of bilingualism.

Specifically, we are interested in situations of protracted (centuries-long) contact, often historically documented, which allow us to observe internal diversification of the languages involved both on the temporal axis (change) and on the spatial axis (dialectal variation).

The empirical foundation of the project rests on several domains of evidence, which are robustly represented in the publications of the research team. Each of these domains is notable for its rich internal microvariation. They will form the object of further data gathering by fieldwork or by quantitative corpus analysis.

We mention a representative sample:

– German(ic) varieties in contact with Romance: Cimbrian, Mòcheno, Saurano, Sappadino, Timavese; see also the participation of members of the Verona unit in AthEME – Advancing the European Multilingual Experience (PI, L. Cheng, Leiden, EU 7th FP);

– Greek and Albanian varieties in contact with Romance, specifically Italo-Albanian (Arbëresh) and Italiot Greek (Southern Calabria: Bovesia; Salento: Grecìa Salentina)

– West Germanic/North Germanic/Romance (possibly Celtic) contact in the passage from Old to Middle English

The formal analysis aims at investigating domains of syntax and of the syntax/externalization interface on which the team has an extensive publication record, specifically case/agreement and word order phenomena, both in the noun phrase and in the sentence. This is a paramount requirement for the third and crucial stage of our research, which centers on parametric models – to be evaluated not with respect to the raw data, but with respect to their formal analyses. Once again, we will take our bearings from previous work of the units.

We will collaborate with the ERC AdG project LanGeLin – Meeting Darwin’s last challenge (PI, G. Longobardi, York) of which Guardiano is Permanent Project Advisor. The implementation at the microvariation level of their Parametric Comparison Method (PCM), namely a structured system of a few dozen parameters for the nominal domain, has shown that several refinements are required (Guardiano et al. 2016). The approach to microvariation connected to the Lexical Parametrization Hypothesis (Manzini & Wexler 1987, Chomsky 1995, Manzini & Savoia 2011) ties variation to the externalization space defined by the lexicon in order to yield the required detail. Our first goal is a comparison and unification of these approaches.

The ultimate goal we work towards is a more ambitious global reassessment of current parametric models.

In proposing to carry out this research, we are highly aware that the empirical domains we investigate are fragile or outright endangered. Though language policy is not our focus, the project includes a social impact component, integrating a language education package – in recognition of the crucial role it plays in strengthening the use of a minority language in contact with a standard language.

State of the art

In the research framework we adopt, language is a biologically defined object, including universal and innate components, both conceptual and computational.

The lexicon is a natural locus of variation and change, providing a double interface, between the conceptual system and syntactico-semantic computation on the one hand, and between syntax and externalization on the other hand.

The other major locus of variation is the interface itself, connecting syntactico-semantic computation to phonological computation, and involving presumably functional optimization processes.

Against this background of assumptions (Berwick & Chomsky 2015, Chomsky et al. 2018), several research questions prominently arise.

The most basic question is: which are the limitations that syntactico-semantic (and phonological) computation imposes on variation? In other words:

What kinds of linguistic properties and relations are subject to variation and what is the admissible extent of this variation, given the constraints imposed by the Faculty of Language?
This question is investigated in the literature with the classical tools of formal linguistics. Thus certain domains of data are shown to be amenable to universal principles supplemented by a limited array of optional choices or parameters.
Additional theoretical hypotheses do not interfere with this basic research paradigm, be they cartography (of special relevance here Aboh 2015), the (Kaynian) Linear Correspondence Axiom (e.g. Sheehan et al. 2017), or silent categories (Kayne 2010). Many works of the present unit fall into this tradition of studies in its various internal articulations.
Is variation structured in turn, namely (a) by the hierarchical ordering of the values of a given parameter or (b) by the interaction between different parameters and/or their settings, as when a particular parameter or parameter value influences another?
This question is of central importance to two ERC AdG of the last decade, the already quoted LanGeLin (PI G. Longobardi), and ReCoS, Rethinking Comparative Syntax (PI I. Roberts, Cambridge).
The approach adopted by ReCoS is to provide hierarchical schemas able to encode the best known extant parameters: the null subject parameter, the head-movement parameter, the (case) alignment parameter, the A’-movement parameter (Biberauer et al. 2010, 2014). The schemas are identically organized only at a very general level; in detail, each parametric template imposes its own organization. In other words, like all attempts at total ordering, parametric hierarchies contain an element of rigidity and lack of modularity – nor is it clear how the templates interact with one another.
The LanGeLin project has at its core the Parametric Comparison Method (PCM, Longobardi & Guardiano 2009), which uses a parameter set concerning the internal structure of the noun phrase to predict (under appropriate statistical measures) linguistic phylogenetic trees. According to Longobardi (2017), parameters are based on parameter schemas such as “Is F, F a feature, grammaticalized?”, “Is F, F a grammaticalized feature, spread on X, X a category?”, “Does a functional category … X have a phonological matrix Φ?” – which may in principle interact with one another.
Manzini & Savoia (2011) and Manzini (2015) take an even weaker approach, namely that “the parameters interacting with [Externalization] are … the categorical splits”, for instance “speaker vs. hearer, 1st/2nd person vs. D” in the realm of nominal properties. Thus even parameter schemas are epiphenomena: the conceptual workspace and the categorial cuts (parameters) that are or are not externalized by the lexicon are all there is. Interactions between categorial cuts take the form “categorial split A is not defined for value 0/1 of … categorial split B”.

It seems evident to us that the various approaches quoted share a fundamental attempt to fully free parameters from their connections with earlier models of generative grammar to bring them to bear on current minimalist theorizing. It is one of the ambitions of this project to contribute to this effort by opening a forum where the different approaches can be measured against data they have not explicitly been devised to handle.

Finally, since we pursue our studies on the basis of evidence from contact, we briefly acknowledge the literature on bilingual competences and related phenomena of code mixing and code switching, both within functionalist and mentalist frameworks (e.g. Romaine 1995, MacSwan 1999, Muysken 2000). For us, contact is instrumental to the investigation into models of parametrization – but we are aware of both the descriptive generalizations and the analytical insights coming from the dedicated literature.

Data collection

Variation in minimally different language systems (microvariation) has been the preserve of sociolinguistics and dialectology and is traditionally treated as a cultural or social artefact strictly associated with language-specific contingencies, hardly depending on universal factors.

The level of depth and the sophistication of the linguistic materials required for formal syntactic analysis go beyond the observation of lexical items, phonetic typologies, or even structural surface patterns. Thus, data can hardly be collected from already existing databases (if any); novel collection is required.

Data will be elicited from native speakers in fieldwork sessions, with as little help as possible from questionnaires and other mediations.

It should be stressed that this is a cognitivist, not a behaviorist, linguistic project: hence, speakers will not be recorded or observed in their everyday life, but only explicitly and consciously interviewed for the purposes of eliciting syntactic information.

We plan to publish the fieldwork data, duly transcribed, glossed and translated, in an electronic format (on the website of the project), to serve as a documentation of otherwise not easily accessible languages.

Fieldwork will be carried out in the following linguistic domains:

– German(ic) language islands in the alpine regions of North-East Italy (Dolomiti area and Friuli): Mòcheno/Fernsentaler, Cimbrian, Sappadino/Plodarisch, Saurano/Sauris-Zahre, Timavese/Tischlbongerisch, South-Carinthian of the Canale Valley. This work builds on several previous projects: CimbroLang – The lexicon-morphosyntax interface in language obsolescence with particular focus on Cimbrian semi-speakers in Trentino and Veneto regions (2008-2011, PI E. Bidese, Provincia di Trento); Il cimbro come laboratorio di analisi per la variazione linguistica in sincronia e diacronia (2009-2012, PI A. Tomaselli, Fondazione CariVerona); Sauris/Zahre – Feldforschung con raccolta di dati linguistici ai fini di una descrizione attuale del Saurano (2017-2018, PIs E. Bidese, A. Tomaselli, H. Weiß, Regione Trentino-Alto Adige); VinKo, Varietà in contatto/Varieties in contact/Varietäten in Kontakt, Università di Trento, https://www.dipsco.unitn.it/vinko. South-Slavic varieties in contact with Romance (Friulian) in the Resia Valley and (of special interest to us) Slovene varieties in contact with Romance (Friulian) and German (South-Carinthian) in the Canale Valley will also be investigated.

– Arbëresh varieties of the Italian South, with special attention to outlying communities in Molise (Portocannone), Campania (Greci), Apulia (S. Marzano), Lucania (Barile, Ginestra); additional fieldwork will be carried out in the core Calabrian Arbëresh-speaking area, including the districts of Catanzaro and Cosenza, as well as in Sicily (Piana degli Albanesi). Comparison with mainland Albanian will be carried out with speakers of both the standard Tosk variety (Gjirokastër) and of Geg varieties (Shkodër). We take advantage of fieldwork in Albania to collect data from Aromanian varieties, in contact with Albanian and with Greek (Fier, Diviakë, Libofshë, Këllez, all in South Albania).

– Greek varieties of the Italian South: Bovesia (Reggio Calabria) and Grecìa Salentina (Calimera and the area of Lecce). These will be compared at least with standard Modern Greek, one variety from the Ionian islands (geographically, and presumably also historically, closer to Southern Italy), Tsakonian (the most conservative dialectal enclaves in Greece, not affected by koineization), two representatives of the Northern dialects (Thessaloniki, Lesvos), and Cypriot Greek. Of direct interest for comparison is also Asia Minor Greek, including Romeyka Pontic (Çaykara region, Turkey), as well as Cappadocian and Pharasiot, whose speakers are accessible through diaspora communities in mainland Greece.

Statistical corpus search will form the basis of work on Old and Middle English. Testing a given syntactic hypothesis for a mediaeval language requires a formidable amount of data collection, because the revealing evidence may come from complex structures that are relatively infrequent in actual production. For the history of English, powerful tools for syntactic analysis are available, i.e. the electronic corpora elaborated at the University of Pennsylvania and the University of York. These corpora contain both word-by-word morphological annotation and constituent structure bracketing, which enables one to perform automated searches sensitive not only to co-occurrence and linear precedence, but also to structural dependencies. The size of the corpora makes a statistical analysis of the results possible. The two corpora relevant for the present research are the YCOE (Taylor et al. 2003) and the PPCME2 (Kroch & Taylor 2000), covering the period 800-1500. When necessary for the purposes of investigating syntactico-semantic properties not considered in the original tagging, further annotation will be added to the original corpora.
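To illustrate what a search sensitive to structural dependencies, rather than to linear order alone, can look like, here is a minimal sketch in Python. It is not the query tool distributed with the corpora: the bracketed sentence, the labels (IP, NP, D, N, VP, V) and the helper names parse, find and immediately_dominates are all assumptions made for illustration only.

```python
# Minimal sketch: structure-sensitive search over a bracketed parse.
# The toy sentence and its labels are illustrative assumptions, not corpus data.

def parse(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())        # an embedded constituent
            else:
                children.append(tokens[pos])   # a leaf word
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return node()


def find(tree, label):
    """Yield every constituent with the given label, at any depth."""
    if isinstance(tree, tuple):
        if tree[0] == label:
            yield tree
        for child in tree[1]:
            yield from find(child, label)


def immediately_dominates(tree, child_label):
    """True if a direct daughter of this constituent carries child_label."""
    return any(isinstance(c, tuple) and c[0] == child_label for c in tree[1])


toy = "(IP (NP (D the) (N king)) (VP (V saw) (NP (N ships))))"
tree = parse(toy)
noun_phrases = list(find(tree, "NP"))
with_determiner = [np for np in noun_phrases if immediately_dominates(np, "D")]
print(len(noun_phrases), len(with_determiner))
```

In this toy tree the query separates noun phrases that directly contain a determiner from bare ones, a distinction that a search over word strings alone could not state in terms of constituency.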

Formal Analysis

The noun phrase (DP-phase): Balkan varieties

We investigate structures of adnominal modification/predication, including linkers and the expression of possession.

In linker structures, a modifier of the noun, typically an adjective or a possessor/genitive, is preceded by a lexical element agreeing with the noun itself, which in both Greek and Albanian (despite other differences) coincides with the definite article (Guardiano & Stavrou 2014, Franco et al. 2015). Linker structures are a nice illustration of the virtues of our study in being able to cross and compare several domains of contact-induced variation. Arbëresh varieties preserve linker structures despite pervasive contact with linker-less Italo-Romance (Franco et al. 2015). In Italiot Greek, the indirect outcome of contact with Italo-Romance has been the progressive loss of linker structures (Guardiano & Stavrou 2014). Aromanian, in contact with Greek and Albanian, has linkers in both pre-genitive and pre-adjective contexts (Romanian has them only in pre-genitive contexts).

Given enough data and subtle enough analytical tools, we can investigate a range of questions that open up with respect to contact and parameters, expecting precise answers. For instance:

– Do external factors determine linguistic outcomes or do the latter in fact distribute freely, with external pressure simply accelerating the rate of change and not determining its direction?

– Is it possible to demonstrate a dependency of certain parametric outcomes on others – or are the (re)alignments observed entirely independent of any internal hierarchy among parameters?

Possessive subjects of DPs are connected to the expression of sentential agents/subjects as obliques in voice (and ergativity) alternations in the vP-phase. Relevant case studies include the middle-passive voice of Albanian and Greek (Manzini et al. 2016) and its externalization by a clitic in both Romance (‘si’) and Albanian. Arbëresh causatives combining a finite embedded verb with the obliquization of the causee, as in Romance, are another case in point (Manzini & Savoia 2007).

The DP-phase: history of English

Our research on variation directly connects with that on change on the issue of the structure of the DP.

The position of English, as defined by its present-day nominal syntax, is internal to Germanic, of course. However, in previous experiments with the PCM, its precise position within Germanic has not fallen clearly and steadily either within West Germanic or within North Germanic. This may be interpreted as evidence for massive syntactic contact between English and Scandinavian, or be considered evidence in favour of the hypothesis that Middle English is a direct descendant of the Scandinavian variety spoken in the Danelaw rather than of Old English. The contact with Norman French has also at times been held responsible for some changes that took place in the transition from Old to Middle English (Allen 2008), and it contributes to blurring the picture.

Perhaps the crucial aspect of the PCM is that syntactic changes can be used to demonstrate genealogical relations, but in this particular domain – i.e. the relation between English and Scandinavian – it has not proven successful yet; this may be due to the fact that the PCM only compared present-day languages and not their older stages. Therefore the formal analysis of DP syntax in Old and Middle English (Crisma 2011, 2012, 2015; Crisma & Pintzuk 2016) will be extended with the aim of assigning to the mediaeval languages as many parameter values as possible among those present in the PCM grid. This will form the basis for measuring the distance between Old and Middle English, Old Nordic and Norman French and, ideally, for performing phylogenetic computational experiments on them. The ambition is to see whether contact has quantitatively detectable effects (on the diachronic axis).
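As a rough illustration of what measuring the distance between languages over a parameter grid can amount to, the following Python sketch computes a normalized distance between strings of parameter values. It rests on stated assumptions: values are coded "+" or "-", "0" marks a parameter that is not independently set in a language and is skipped, and the distance is the number of differences divided by the number of comparable parameters. The language names and value strings are invented, and the function name parametric_distance is hypothetical; none of this is taken from the actual PCM grid.

```python
# Minimal sketch of a parametric distance between two languages.
# "+" / "-" are set values; "0" marks a parameter treated as not
# independently set in that language (an assumption of this sketch).
# The value strings below are invented, not real PCM data.

def parametric_distance(a, b):
    """Differences divided by (identities + differences), skipping any
    position where either language has "0"."""
    if len(a) != len(b):
        raise ValueError("parameter strings must have equal length")
    identities = differences = 0
    for x, y in zip(a, b):
        if x == "0" or y == "0":
            continue                  # not comparable, leave it out
        if x == y:
            identities += 1
        else:
            differences += 1
    comparable = identities + differences
    if comparable == 0:
        raise ValueError("no comparable parameters")
    return differences / comparable


languages = {
    "LangA": "++-0+-",
    "LangB": "++-++-",
    "LangC": "-+0-++",
}

names = sorted(languages)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = parametric_distance(languages[first], languages[second])
        print(first, second, round(d, 3))
```

A matrix of such pairwise distances is the kind of input that distance-based tree-building programs take; which algorithm and which statistical checks are appropriate is left open here.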

The clausal spine (vP- and CP-phases): Germanic varieties

The crucial contribution of Germanic varieties to the present project concerns sentential word order – specifically given that the German/Romance contact implies two different head-complement orders (VO/OV) and the contrast between V2 and non-V2 syntax of root (or embedded root) sentences (Bidese & Tomaselli 2007, 2016, Bidese et al. 2012, 2014).

Rhaeto-Romance varieties in contact with German, especially of the Surselva, which are also within our domain of expertise (Bidese 2008, Manzini & Savoia 2005), provide the mirror-image case study. Strikingly, Cimbrian has VO order, aligned with Romance; however pronouns display a conservative Mittelfeld-like positioning. In the mirror-image case study, Romansh maintains SVO order. Nevertheless, Sursilvan probably moves the finite auxiliary a step higher, yielding a non-conservative positioning of weak pronouns/clitics. Thus, in the crossed conditions (Cimbrian in contact with Romance, Sursilvan in contact with German) somewhat similar word orders are produced – but implying different parametric choices.

One of the central aims of the present project is to disentangle external notions of convergence from the internal means that produce them. The question goes to the core of how the external pressures brought about by contact do and do not interact with the Language Faculty in (accelerating/magnifying) variation and change.

The case study we just set up is a realistic example, we think, of the kind of data and analytical tools that allow us to pursue the question in a precise enough way to admit of a predictive answer. Next, Germanic languages are all V2, at least residually. Most modern Romance languages have no generalized V2, and the null subject varieties lack even residual V2 phenomena. Here again, in German varieties in contact, conservative and innovative phenomena are found: V2 and subject inversion on the one hand, and on the other hand cliticization phenomena (enclisis versus proclisis onto the finite verbal form, enclisis onto the subordinating conjunction). Romansh V2 varieties have often been assumed to preserve a medieval Romance character, but this is not in keeping with some of their features – notably subject auxiliary inversion of the Germanic type in Sursilvan (Manzini & Savoia 2005). Complementizer borrowing into German varieties is of special interest here because the borrowing interacts with word order and specifically with embedded V2. Interactions with null subjects are another case study of interest, namely subject extraction out of subordinate clauses (the “that-trace effect”).

Parametric models

In short, the data collection and formal analysis efforts detailed above are expected to contribute to our knowledge of linguistic facts and our models of linguistic systems in several ways, connecting back to the general discussion of the state of the art.

We expect to collect new data relating to syntactic variation and change in varieties spoken in conditions of contact. The type of syntactic data we propose to collect and to make accessible is only very partially documented, often by the members of the research team themselves.

We expect to contribute to the analysis of syntactic phenomena involved in variation and change, possibly lying at the interface between syntactic computation and externalization systems. Phenomena involved in the interface are at least linear order and case/agreement processes.

We expect to be able to show that factors external to the grammar of a given language, such as contact with another language, can influence the shape of the grammar, determining quicker evolution and divergence/convergence phenomena with close cognates and with the contact language. This catalyst role of contact means that situations of contact provide a particularly useful setting for the study of variation and change.

At the same time we expect that external factors may act on a language strictly within the limits defined by the Language Faculty and by its own grammar (lexicon). The choice of a contact setting is in fact meant to help us disentangle the two.

Finally, a reason why we focus on contact is that formal work in this specific domain is hard to come by. As part of the larger project of description, formal analysis and formal modelling that we expect to carry out, as just detailed, we specifically aim at bridging the gap between contact studies and formal grammars.

We are seeking to formulate and empirically verify hypotheses such as for instance the Resistance Principle of Guardiano et al. (2016): “Resetting of parameter α from value X to Y in language A as triggered by interference of language B only takes place if a subset of the strings that contribute to constituting a trigger for value Y of parameter α in language B already exists in language A”. In other words, the resetting of a parameter under the influence of interference data is possible only if the new triggers are similar enough to triggers already unmistakably present in the interfered language, though of course not sufficient on their own to trigger the new value.
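Read as a condition on sets of strings, the Resistance Principle can be sketched in a few lines of Python. This is only one possible reading, under loud assumptions: strings are represented by arbitrary labels, "a subset" is taken to mean a non-empty subset, and the function name resetting_possible and the example sets are hypothetical. The sketch states a necessary condition only; it does not say that resetting actually happens.

```python
# Minimal sketch of the Resistance Principle as a set condition.
# Assumption: each "string" is an arbitrary label for a surface pattern.

def resetting_possible(triggers_in_b, strings_in_a):
    """Necessary condition only: at least one of the strings that help
    trigger value Y in language B must already exist in language A."""
    return bool(set(triggers_in_b) & set(strings_in_a))


# Hypothetical patterns that contribute to triggering value Y in language B.
triggers_for_y_in_b = {"s1", "s2", "s3"}

# Hypothetical inventories of patterns attested in two recipient languages.
language_a1 = {"s2", "s7", "s9"}    # shares "s2" with B's triggers
language_a2 = {"s7", "s8", "s9"}    # shares none of them

print(resetting_possible(triggers_for_y_in_b, language_a1))
print(resetting_possible(triggers_for_y_in_b, language_a2))
```

Stated this way, the principle makes a checkable prediction: no contact-induced resetting should be observed in a language like the second one, whatever the intensity of the contact.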

In turn, a radical interpretation of Inertia (Keenan 2009) would require that the external pressure resulting in parameter resetting takes place not simply when possible, but rather when needed to fill an independently-created void (we may call this the Functional Void principle). This raises the further question whether a similar account eventually applies to all cases of syntactic borrowing, or whether the latter may take place even in cases where it is not required (in the sense delimited by the Resistance Principle).

In general, we seek to formalize hypotheses about the interaction of external and internal forces in linguistic variation in such a way as to make them empirically verifiable within standard syntactic models.