POM - Description of the research project (from the project proposal)
Brief description of the proposal
The present project aims at designing a system of syntactic parameters, shaped on the toolkit provided by the Parametric Comparison Method (PCM), in order to model crosslinguistic variation in a domain of the grammar which has never been explored at this broad level, namely that of Differential Object Marking (DOM).
To achieve this goal, statistical techniques will be instrumental.
Despite the pervasiveness of splits in the morpho-syntactic marking of objects, which are at the core of DOM, and despite deep typological and formal investigation, there is currently no unified theoretical framework under which this phenomenon can be systematized. This project aims at taking the first steps in filling this gap. In doing so, not only will it contribute to the understanding of the formal nature of objects, but it will also enhance insights into the cognitive mechanisms underlying syntactic diversity across natural languages.
This unprecedented enterprise is now conceivable thanks to the formal, methodological framework made available by the PCM, which in this project will be applied to a radically novel set of data.
The project has two main goals:
1. Deriving from a constrained set of parameters the apparently unrestrained distribution of crosslinguistic DOM manifestations
2. Providing a novel testing domain for modeling and measuring crosslinguistic syntactic diversity through the PCM
In the model adopted by the PCM, each parameter is minimally defined by the following properties:
(i) it stems from a universal set of more abstract formats;
(ii) it is responsible for different (covarying) observable patterns;
(iii) it is part of a network of cross-parametric dependencies.
Adapting the domain of variation encompassed by DOM to such a rigorous system poses non-trivial challenges, due to the vast number of interacting properties in the manifestations of this phenomenon. These challenges will be addressed with the crucial aid of statistical investigation. Different bivariate and multivariate methods will be applied, such as cluster analysis, multiple correspondence analysis and association between pairs of languages.
The initial dataset covers 60 languages from various families, mostly from Eurasia (Indo-European, Basque, Sino-Tibetan, Altaic), but also including a significant sample of Afro-Asiatic.
A comprehensive list of DOM patterns will be created, starting from the material collected in Irimia's work, and considering complex interactions between DOM and various modules of the grammar (e.g., the internal structure of the nominal domain, the verbal and clausal domain, tense and aspect).
The output of the project is a system of parameters defining the observed variation concerning DOM in the dataset, which will eventually be applied at a wide-scale (ideally global) level.
The parameters will be set in the languages of the dataset, to test their consistency with the parameter setting algorithm assumed in the model and their explanatory power in terms of crosslinguistic coverage.
The parameter grid so obtained will then be used for measuring language similarity.
State of the art
The PCM [1][2][3][4][5] is a research line that uses the analytical tools of formal grammar and quantitative sciences to explore human history. The method has shown that abstract cognitive linguistic structures encode history to a remarkable extent [3][6][7][8] and has developed explicit procedures to statistically assess hypotheses of cross-family aggregations [9][10].
These results have provided insights into broader, interdisciplinary areas of research, such as gene-language comparison [11][12][13], the analysis of microvariation [7][14][15][16], and, crucially for the purposes of the present project, the modeling of syntactic diversity through formally structured parameter systems [17][18][19][20].
In fact, the relevant methodological asset of the PCM for the present project is that comparative procedures are grounded on a rigorous system of parameters; these are conceived as a universally-derived set of variable abstract grammatical rules responsible for several co-varying empirical structures.
The use of a system of parameters, rather than of unrelated superficial patterns, to measure language similarity, has prompted an important breakthrough not only in terms of phylogenetic reconstruction [6][8][10], but also in the shaping of crosslinguistic structural diversity [19][20].
In the present project, the parameter model developed within the PCM [19][20] will be implemented to investigate the domain of direct objects with their universal and variable properties.
In most languages, direct objects exhibit important splits when it comes to their encoding in the syntax and the morphology. A clear tendency is for objects that contain hierarchically organized features (e.g., humanness, animacy, specificity, definiteness, [21]) to be signaled by dedicated markers or positions in the clause [22][23]. This universal property is at the core of the phenomena that go under the label DOM. Despite this unifying factor, the puzzle with DOM is that it gives rise to impressive variation, manifested at various levels. For example, on the morphological side, some languages (e.g. Romance) employ markers more similar to an adposition [24], others use case markers (e.g. Altaic), others signal the special direct objects via agreement on the predicate (e.g. Bantu), or a doubling reduced pronominal (clitic), or yet a combination of these, to cite just the most common strategies (see examples in Fig.1). In the syntax, in some languages a higher position is obligatory, while in others no such preference is detected. Variation is salient on the interpretive side too, first and foremost in the taxonomy of features special objects are sensitive to (e.g., animacy in Romance, as opposed to specificity in Altaic). In spite of a long and rich tradition of crosslinguistic work [21][22][23][25][26][27][28][29], there is consensus that crucial core aspects of DOM are still in need of detailed investigation and analysis.
Description of objectives, methods, preliminary data and activity plan
OBJECTIVES
The goal of the present project is to design a system of parameters, based on the model adopted by the PCM [19][20], such as to cover the crosslinguistic variation observed in the vast and complex domain of DOM.
The novelty of this enterprise resides in reconciling the theoretical investigation of DOM with the testing of the PCM. In this respect, the project will provide two major advancements:
1. For the investigation of DOM: deriving from a constrained set of parameters the apparently unrestrained distribution of crosslinguistic DOM manifestations
2. For the PCM: providing a further testing domain for modeling and measuring crosslinguistic syntactic diversity and its historical significance
The rich diversity in DOM has been emphasized in many typological and formal investigations, but has never been systematized under an internally consistent theoretical framework.
The current project aims at taking the first steps in filling this gap.
Scrutiny of the empirical evidence, the analysis of the literature, and the implementation of the parameter system will be informed by the following questions:
Q1. Is it possible to detect covariation patterns in the superficial properties of DOM? Are they dependent on other grammatical properties?
Q2. Is it possible to reduce the formal DOM-related features to a universal, explicitly defined set?
Q3. Is it possible to reduce the observed variation in the manifestations of DOM to a set of parameters complying with the requirements of universality (derived from a universal set of simple formats), abstractedness (few abstract rules responsible for a complex network of variable superficial patterns), and learnability (associated with an explicit setting algorithm), as defined by the PCM model?
METHODS
The following components of the PCM toolkit ([5], www.parametricomparison.unimore.it) will be used.
1) A SET OF PARAMETERS
In the tradition of the PCM, the choice of the parameter set is based on Longobardi's Modularized Global Parametrization (MGP, [1]), encompassing two strategies:
A. Selecting a domain of the grammar with limited interference from outside
B. Starting from a language sample exhibiting salient minimal contrasts in the given domain
Concerning B, DOM provides an excellent research ground, for at least two important reasons: first, there is a plethora of analyses for numerous languages, closely and remotely related; second, DOM is salient at a broad cross-linguistic level, but also shows refined points of microvariation. This facilitates the testing of the parameter model against different levels of comparison.
As for A, DOM intrinsically presupposes intertwining of many modules in the grammar, such as the structure of nominals, the determiner and clitic systems, the verbal and clausal domains, etc. This poses a difficulty for the modularized approach: to avoid potentially unmanageable interference, parameters must be formulated in such a way as to capture all the possible interactions with various components of the grammar.
A further constraint the parameter set must adhere to concerns parameter format. The model adopted by the PCM [19] is based on Longobardi's Principles & Schemata framework [30][31], where parameters are assumed to emerge from a set of universal formats which apply to different features and become actualized under specific empirical conditions, variable crosslinguistically. To comply with this model, DOM parameters must explicitly define the relevant formal features and check their variable realizations: this will be done by taking as a starting point the list in [32], which will be extended to reach the required crosslinguistic coverage.
2) A SET OF MANIFESTATIONS
Parameter systems display two levels of deductive structure. The first is that each parameter is usually responsible for a plurality of co-varying superficial patterns [33]: the very definition of a parameter crucially encompasses the list of superficial patterns (manifestations, [19]) observed in a language when that parameter is active.
Hence, a crucial step towards reducing DOM to a set of parameters is to identify stable covariation across the observed patterns.
This is a non-trivial enterprise given the intricate crosslinguistic diversity in this domain and the large number of variable patterns to be accounted for. To provide an example, in her attempt to list the properties which define the manifestations of DOM in two languages of the same sub-family (Spanish and Romanian), Irimia [32] has singled out 30 points of variation, a number likely to increase considerably when the crosslinguistic typological coverage is extended. In this scenario, the task of identifying covariation patterns over so many properties can only be accomplished by complementing theoretical investigation with data analysis, a combined approach that will reveal empirical generalizations in the data not immediately detectable through qualitative investigation alone.
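As a purely illustrative sketch of what a bivariate screening for covariation might look like, the following computes the phi coefficient between pairs of binary patterns across a language sample. The pattern labels, the language labels and the values are invented for the example, and the phi coefficient is only one possible association measure, chosen here as an assumption rather than as the measure the project commits to.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data (invented for illustration): each row is a language,
# each column a superficial DOM pattern (1 = attested, 0 = not attested).
PATTERNS = ["pat_A", "pat_B", "pat_C"]
OBSERVATIONS = {
    "lang1": [1, 1, 0],
    "lang2": [1, 1, 1],
    "lang3": [0, 0, 1],
    "lang4": [0, 0, 0],
}


def phi(x, y):
    """Phi coefficient between two equal-length binary vectors.

    Returns None when either vector is constant (the coefficient is undefined).
    """
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return None
    return (n11 * n00 - n10 * n01) / denom


def pairwise_phi(patterns, observations):
    """Phi coefficient for every pair of patterns across the language sample."""
    columns = {
        p: [row[i] for row in observations.values()]
        for i, p in enumerate(patterns)
    }
    return {(p, q): phi(columns[p], columns[q]) for p, q in combinations(patterns, 2)}


associations = pairwise_phi(PATTERNS, OBSERVATIONS)
```

Pairs of patterns with a coefficient close to 1 would be candidates for manifestations of one and the same parameter, to be confirmed or rejected on theoretical grounds.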
3) A PARAMETER SETTING ALGORITHM
Parameters are conceived as abstract rules encoded in the mental grammar of each speaker. The framework adopted by the PCM assumes that, during the process of language acquisition, speakers add a parameter to their grammar only if its manifestations are available in the primary data: otherwise, that parameter does not become part of the grammar [19]. Thus, {+parameter P} encodes an addition, while {−parameter P} represents the absence of the relevant rule.
To concretely set parameters, one must check, for each manifestation, whether it is actualized or not in a given language. A parameter is set to {+} if at least one of its manifestations is unambiguously observed in the language.
Parameters are sometimes associated with negative evidence: ungrammatical patterns can in fact be regarded as parameter manifestations [19]. However, in this model, negative evidence cannot be the only evidence used to set a parameter: for it to be acquirable, each parameter must be associated with at least one positive manifestation [19]. If no manifestation is observed, the default value is assigned (meaning that the parameter is not active).
Yet, negative evidence might pose a challenge for the parametrization of DOM: admittedly, crucial insights into the nature of this phenomenon come from ungrammaticality. For example, the literature contains wide debates about whether DOM in Romance is structurally an accusative, despite its oblique (dative/locative) form [34][35][36]. The empirical evidence needed, in most cases, presupposes what would qualify as negative evidence (e.g., impossibility of DOM co-occurrence with SE passives and impersonals), while explicit evidence indicating an accusative syntax (e.g., agreement patterns restricted to accusatives and not obliques, accusative clitic doubling) might not always be available.
Reconciling these two conflicting aspects is an important goal in the analysis and classification of DOM through the parameter model the project intends to engage with.
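The setting procedure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the PCM implementation: the class `Parameter`, the function `set_parameter` and all manifestation labels are hypothetical, and the sketch assumes that only positive manifestations can trigger the value {+}, while negative evidence is recorded but never sets a parameter on its own.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    """A parameter with its positive and (optional) negative manifestations."""
    name: str
    positive: frozenset              # patterns observable in the primary data
    negative: frozenset = frozenset()  # ungrammatical patterns (negative evidence)

    def __post_init__(self):
        self.positive = frozenset(self.positive)
        self.negative = frozenset(self.negative)
        # Acquirability requirement: at least one positive manifestation.
        if not self.positive:
            raise ValueError(f"{self.name}: no positive manifestation defined")


def set_parameter(parameter, observed):
    """Return '+' if at least one positive manifestation is unambiguously
    observed in the language, otherwise the default value '-'."""
    return "+" if parameter.positive & set(observed) else "-"


# Hypothetical parameter and manifestation labels, for illustration only.
P_HYP = Parameter("P_hyp", positive={"m1", "m2"}, negative={"n1"})

value_with_positive_evidence = set_parameter(P_HYP, {"m2"})   # '+'
value_with_negative_only = set_parameter(P_HYP, {"n1"})       # '-'
value_with_no_evidence = set_parameter(P_HYP, set())          # '-'
```

The second call makes the tension discussed above concrete: under this assumption, a language for which only negative evidence is available receives the default value.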
4) A SET OF IMPLICATIONAL FORMULAS
The second level of the deductive structure of parameter systems is that parameters generate a pervasive network of partial interdependencies [2][3][4][37][38][39][40]: one value of parameter A, but not the other, entails the irrelevance of parameter B, whose consequences, i.e. the corresponding superficial patterns, become predictable. In the PCM system, the conditions which must be met in order for a parameter not to be neutralized are expressed in a Boolean form (either as simple states of another parameter, or as conjunctions or disjunctions of values of other parameters, see Fig.2); the symbol {0} indicates that the parameter is neutralized due to the violation of such conditions [4]. Such a pervasive implicational structure produces a noticeable downsizing [9] of the space of possible grammatical variation [41], which in turn has non-trivial consequences for the modeling of language acquisition [19][20] and the measuring of language relatedness [10].
In this framework, uncovering and explicitly defining crossparametric dependencies becomes an indispensable step in the formulation of a reliable parameter system.
Again, when investigating broad domains of variation such as DOM, parameter interdependencies might not be easy to uncover. In this respect, Kazakov et al. [42] have shown that automatic techniques can be an important tool to reveal previously unknown dependencies between parameters. Following this line, the project will apply further automatic analyses to the DOM parameter system, thus exploiting a model applicable to other domains.
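The neutralization mechanism described in this section can be illustrated with a short sketch. The parameters A, B, C, D and their conditions are hypothetical and do not correspond to any parameter of the PCM dataset; the sketch also assumes that a neutralized parameter ({0}) satisfies neither a {+} nor a {−} condition, and that parameters are listed so that each one follows those it depends on.

```python
# Implicational conditions as Boolean predicates over already-resolved values.
# Hypothetical example: B is relevant only if +A; C only if +A and -B;
# D only if +B or +C. A has no condition.
CONDITIONS = {
    "B": lambda v: v["A"] == "+",
    "C": lambda v: v["A"] == "+" and v["B"] == "-",
    "D": lambda v: v["B"] == "+" or v["C"] == "+",
}
ORDER = ["A", "B", "C", "D"]  # each parameter follows the ones it depends on


def apply_implications(raw_values, conditions, order):
    """Replace with '0' every parameter whose implicational condition fails.

    raw_values: dict mapping each parameter to '+' or '-' as set from the data.
    """
    resolved = {}
    for name in order:
        condition = conditions.get(name)
        if condition is not None and not condition(resolved):
            resolved[name] = "0"  # neutralized: its patterns are predictable
        else:
            resolved[name] = raw_values[name]
    return resolved


grammar_1 = apply_implications({"A": "-", "B": "+", "C": "+", "D": "-"}, CONDITIONS, ORDER)
grammar_2 = apply_implications({"A": "+", "B": "+", "C": "+", "D": "+"}, CONDITIONS, ORDER)
```

In the first toy grammar a single {−} value for A neutralizes the three parameters that depend on it, which is the kind of downsizing of the space of variation referred to above.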
PRELIMINARY DATA
As noted above, DOM is a well-studied phenomenon: the many different approaches in the literature range from cross-family explorations (to uncover its universal aspects) to language groups whose diversification is recent and yet well-structured (see [43] for a recent attempt to reduce the variation encountered in the dialects of Italy to parameter hierarchies). Hence, a vast amount of empirical and theoretical work is available. In the present project, the initial data come from a set of 60 languages (Map1) covering various families across Eurasia (Indo-European, Basque, Sino-Tibetan, Altaic), and a sample of Afro-Asiatic.
As for the grammatical properties associated with DOM, Irimia's list, based on standard Spanish and Romanian [32], will be exploited and enhanced to reach the required crosslinguistic coverage.
The data already available do illustrate complex implicational relations between many properties.
Sensitivity to various grammatical traits is, in fact, a core aspect of DOM, which must be taken into account when defining crossparametric implications (point 4, sect. METHODS). To briefly exemplify, there are clear interactions between DOM and various parameters of the PCM dataset (Fig.2) defining the internal structure of the nominal domain [44]. For example, in languages that have grammaticalized strategies for marking the so-called definite reading (parameter DGR in Fig.2, responsible in turn for a set of further parameters defining the structural components of such marking [4][14]), DOM interacts with these structures. The Romance domain, where parameter DGR is active, displays a vast array of interactions between DOM and the available definiteness-marking strategies (e.g., the literature about standard Spanish [25], Romanian, Sardinian, [32][45], etc.).
Compatibility of bare nouns with DOM gives rise to further variation (as shown, e.g., in [44] for some dialects of Sicily or in [46] about Spanish varieties). Finally, DOM is sensitive to the internal structure of proper names ([47], parameter NWD in Fig.2): for example, in Romance, where NWD is active, DOM variously interacts with expletive articles (parameters NEX, PEX, FEX) [48][49][50].
Obviously, DOM is also observed in languages where parameters such as DGR are not active (e.g. Farsi) or neutralized (e.g., Mandarin, Cantonese): in these cases, DOM interferes with other grammatical devices (e.g. classifiers: parameters FGC, FGE in Fig.2; or linkers: parameters LKO, LKA, LKP) [51][52].
Another aspect concerns the relations with the broader clause structure, for example the type of transitive construction [53][54][55][56][57], information structure [58], the presence of a clitic that doubles the object [59], etc. To investigate these interactions, the list of clausal parameters discussed in [40] and [60] will also be considered. Finally, this project takes into account less explored domains, such as DOM in impersonal and passive structures [24][61][62][63][64].
ACTIVITY PLAN
The following research activities are envisaged (a schema of the timeline is provided in Fig.3).
EMPIRICAL INVESTIGATION (WP1, months 1-14) will be managed by IRIMIA. The goal of this WP is to collect the empirical evidence about DOM from the literature and the grammars for the languages in Map1. This dataset permits a good crosslinguistic coverage, as required by the MGP (point 1, sect. METHODS). For these languages, a lot of empirical and theoretical work is available; thanks to in-depth work by Irimia, a vast database of examples is ready for the present research. Empirical analysis will be conducted capitalizing on existing data; dedicated fieldwork (via expert linguists and native speaker consultants) will be planned only when the needed material is missing. Concerning family-internal variation, a special focus will be put on Romance (with a particular interest in the dialects of Italy) and Semitic. These groups provide an ideal testing ground not only because the research team has extensive experience with them, but also because they manifest DOM to various degrees (from no DOM to extensive DOM, through a series of intermediate stages) and generally realize it through different strategies, thus allowing for crosslinguistic comparison. A further important focus will be on less studied data, whose investigation will enhance knowledge of the nature of DOM.
To perform empirical work, a Research Assistant (RA) will be hired by month 4 (for a total of 18 months). Their expertise should encompass proficiency in crosslinguistic and statistical investigation.
STATISTICAL ANALYSIS (WP2, months 9-16) will be led by MORLINI, assisted by the RA. In this WP, the list of empirical patterns related to DOM, their interactions and relations with different modules of the grammar will be explored through multivariate techniques such as multiple correspondence analysis and cluster analysis.
PARAMETER DATASET (WP3, months 13-18). This WP will be led by GUARDIANO and IRIMIA. The goal is to explore the generalizations emerging from WP2 and formalize them in a list of parameters (drawn from Longobardi's schemata, point 1, sect. METHODS), their manifestations (point 2, sect. METHODS) and crossparametric dependencies (point 4, sect. METHODS). These parameters will tentatively be set in the languages of the database to check their crosslinguistic coverage and their reliability in terms of the parameter setting algorithm (point 3, sect. METHODS).
TESTING OF PARAMETER IMPLICATIONS (WP4, months 17-21). This WP, led by MORLINI, will perform statistical testing to uncover further potential crossparametric dependencies, to be included in the dataset after testing their theoretical reliability (with the assistance of IRIMIA and GUARDIANO).
CROSSLINGUISTIC COMPARISON (WP5, months 21-24). This WP, managed by GUARDIANO with the assistance of MORLINI and IRIMIA, will measure crosslinguistic similarity in the dataset, using the parameter grid produced by WP4. The procedures adopt distance-based hierarchical cluster analysis using similarity indexes and association indexes between pairs of languages, to generate a set of taxonomic representations based on parametric distances calculated over the DOM parameter grid.
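A minimal sketch of how such a comparison might be carried out is given below. It is an illustration under stated assumptions, not the project's procedure: the distance used is a Jaccard-type index (differences divided by the number of parameters set to {+} or {−} in both languages, with {0} ignored), the clustering is plain average linkage, and the four languages and their values are invented.

```python
from itertools import combinations

SET_VALUES = {"+", "-"}

# Hypothetical parameter grid (invented values): one list of states per language.
GRID = {
    "L1": ["+", "+", "-", "0", "+"],
    "L2": ["+", "+", "-", "0", "-"],
    "L3": ["-", "+", "+", "+", "-"],
    "L4": ["-", "-", "+", "+", "-"],
}


def parametric_distance(a, b):
    """Differences / (identities + differences), over parameters that are
    set to '+' or '-' in both languages; neutralized states are skipped."""
    comparable = [(x, y) for x, y in zip(a, b) if x in SET_VALUES and y in SET_VALUES]
    if not comparable:
        raise ValueError("no comparable parameters")
    differences = sum(1 for x, y in comparable if x != y)
    return differences / len(comparable)


def average_linkage(grid):
    """Agglomerative clustering; returns the tree as nested tuples."""
    dist = {
        frozenset((p, q)): parametric_distance(grid[p], grid[q])
        for p, q in combinations(grid, 2)
    }

    def linkage(m1, m2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[frozenset((x, y))] for x in m1 for y in m2) / (len(m1) * len(m2))

    clusters = [(name, [name]) for name in grid]  # (tree, member list)
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        (tree_i, mem_i), (tree_j, mem_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), mem_i + mem_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


tree = average_linkage(GRID)
```

With the toy grid above, L3 and L4 are joined first, then L1 and L2, and the two pairs are merged last. Other indexes and linkage criteria can be substituted without changing the overall structure.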
Concerning publications, dissemination and public outreach, a dedicated WP is planned (WP6, PROJECT MANAGEMENT, DISSEMINATION, OUTREACH) that will be managed by the PI and will cover the whole duration of the project (months 1-24).
References
[1] Longobardi 2003 https://doi.org/10.1075/livy.3.06lon
[2] Guardiano and Longobardi 2005 https://dx.doi.org/10.1093/acprof:oso/9780199272129.003.0010
[3] Longobardi and Guardiano 2009 https://dx.doi.org/10.1016/j.lingua.2008.09.012
[4] Guardiano and Longobardi 2017 https://dx.doi.org/10.1093/oxfordhb/9780199573776.013.16
[5] Guardiano et al 2021 https://dx.doi.org/10.1002/9781118732168.ch7
[6] Longobardi et al 2013 https://dx.doi.org/10.1075/jhl.3.1.07lon
[7] Guardiano et al 2016 South by Southeast. A Syntactic Approach to Greek and Romance Microvariation. ID 77: 95-166.
[8] Ceolin et al 2020 https://dx.doi.org/10.3389/fpsyg.2020.488871
[9] Bortolussi et al 2011 https://doi.org/10.3233/978-1-60750-762-8-168
[10] Ceolin et al 2021 https://dx.doi.org/10.1098/rstb.2020.0197
[11] Colonna et al 2010 https://doi.org/10.1159/000317374
[12] Longobardi et al 2015 https://dx.doi.org/10.1002/ajpa.22758
[13] Santos et al 2020 https://dx.doi.org/10.3390/genes11121491
[14] Guardiano and Fanciullo (eds) 2020 ID 81.
[15] Guardiano et al 2020 Contact and resistance. In Del Puente et al (eds) Tra etimologia romanza e dialettologia. Edizioni dell'Orso, 177-188.
[16] Guardiano and Stavrou 2021 https://doi.org/10.3390/languages6020074
[17] Crisma and Longobardi 2024 The parametric space associated with D. In Armoskaite and Wiltschko (eds) The Oxford Handbook of Determiners. OUP.
[18] Crisma et al 2024 A unified theory of Case form and Case meaning: Genitives and parametric syntax. In Sevdali et al (eds) The Place of Case in Grammar. OUP.
[19] Crisma et al 2020 https://dx.doi.org/10.4454/ssl.v58i2.265
[20] Crisma et al 2024 What are your values? Defaults and asymmetry in parameter states. JHS, in press.
[21] Silverstein 1976 Hierarchy of features and ergativity. In Dixon (ed) Grammatical categories in Australian languages, 112-171. Canberra: A. Inst. of Aboriginal Studies.
[22] Comrie 1989 Language Universals and Linguistic Theory. Chicago.
[23] Aissen 2003 https://doi.org/10.1023/A:1024109008573
[24] Irimia 2023 https://doi.org/10.16995/glossa.5751
[25] López 2012 https://doi.org/10.7551/mitpress/9165.001.0001
[26] Bossong 1985 Empirische Universalienforschung. Différentielle Objektmarkierung in den neuiranischen Sprachen. Gunter Narr.
[27] Bossong 1991 https://doi.org/10.1075/cilt.69.14bos
[28] Bossong 1998 https://doi.org/10.1515/9783110804485.193
[29] Lazard 2001 https://doi/10.1515/9783110194036/html
[30] Longobardi 2005 https://doi.org/10.1515/zfsw.2005.24.1.5
[31] Longobardi 2017 https://dx.doi.org/10.1017/9781107279070.013
[32] Irimia 2020 https://dx.doi.org/10.1515/opli-2020-0110
[33] Chomsky 1981 Lectures on Government and Binding. Foris.
[34] Manzini and Franco 2016 https://doi.org/10.1007/s11049-015-9303-y
[35] Irimia and Pineda 2019 https://doi.org/10.1075/li.42.1
[36] Bárány 2018 https://doi.org/10.5334/gjgl.639
[37] Fodor 2001 https://doi.org/10.1017/S1358246100010808
[38] Baker 2001 The atoms of language. OUP.
[39] Biberauer and Roberts 2013 https://doi.org/10.1515/9781614512431
[40] Roberts 2019 https://doi.org/10.1093/oso/9780198804635.001.000
[41] Moro 2016 Impossible languages. MIT Press.
[42] Kazakov 2017 https://dx.doi.org/10.26615/978-954-452-040-3_005
[43] Ledgeway 2023 https://doi.org/10.1075/la.280.11led
[44] Guardiano 2023 https://doi.org/10.1075/la.280.08gua
[45] Irimia 2020 https://dx.doi.org/10.1515/9783110666137-003
[46] Leonetti 2009 https://doi.org/10.1075/slcs.112.07leo
[47] Longobardi 1994 Reference and proper names: A theory of N-movement in syntax and logical form. LI: 609-665.
[48] De Angelis 2019 Articolo espletivo e marcatura differenziale dell'oggetto nel dialetto reggino di San Luca. ID 80: 59-76.
[49] Chilà 2017 Il sincretismo genitivo-dativo nella varietà reggina di S. Luca. ID 78: 57-71.
[50] Ledgeway 2019 https://doi.org/10.17863/CAM.36667
[51] Karimi 1996 Case and specificity: Persian ra revisited. Ling Analys, 26(3/4): 173-194.
[52] Zuo 1993 https://doi.org/10.1515/ling.1993.31.4.715
[53] Pensado (ed) 1995 El complemento directo preposicional. Visor.
[54] Cennamo 2003 https://doi.org/10.1515/9783110919837.49
[55] Cennamo 1998 https://doi.org/10.1515/9783110912470.73
[56] Cennamo and Ciconte 2024 Differential object marking in early Italo-Romance and old Sardinian, in press.
[57] Torrego 1998 The dependencies of objects. MIT Press.
[58] Iemmolo 2010 https://doi.org/10.1075/sl.34.2.01iem
[59] Jaeggli 1986 https://doi.org/10.1163/9789004373150_003
[60] Baker et al 2020 https://www.mmll.cam.ac.uk/files/slides_baker_et_al_november_2020.pdf
[61] Irimia 2018 https://dx.doi.org/10.3765/plsa.v3i1.4345
[62] Irimia 2024 https://doi.org/10.1162/ling_a_00522
[63] Irimia and Pineda 2024 Impersonal SE constructions in Catalan. Ms.
[64] Irimia et al 2024 DOM in impersonal SE constructions: the view from southern Italy. CIDSM