PARTHICO - Research activities

Parameter theory on historical corpora: Measuring the power of parameter setting theory on historical corpora - PARTHICO

Research activities (from the project proposal)

The project is run by four research units (RUs): Padova (UNIPD), Modena e Reggio Emilia (UNIMORE), Trieste (UNITS) and Verona (UNIVR). Each RU will be expanded by hiring one researcher: for 18 months at UNIPD, and for 12 months each at UNIVR, UNIMORE and UNITS. All RUs adopt a parametric approach to the analysis of variation, and share expertise in diachronic and synchronic syntactic variation and in language acquisition/parameter setting in L1/L2/heritage languages. This shared background is the glue of the project: close and continuous collaboration between the RUs will be the key to its development, although each RU will be responsible for the activities assigned to it. In this way, we take advantage of both individual specialization and shared theoretical interests, and we regard the project as a launching pad for more extensive future collaborations.
The project comprises four work packages (WPs): WP1-3 cover data production, analysis, and dissemination, and WP4 covers outreach activities. Below, we detail the timeline and the role played by each RU (see also the attached GANTT chart at the end of this section).


WP1: Infrastructure creation and refinement. Months 1-11

WP1 comprises the following activities: the finalization of the OI corpus and its public release (Targets 1 and 2, T1/2 in the GANTT chart: UNIPD), the refinement of the list of parameters and questions (T3: the four RUs), and their translation into queries (Milestone 1, M1 in the GANTT chart: UNIMORE, UNITS, UNIVR).
UNIPD will be responsible for the finalization of the OI corpus, originally initiated within the BIRD 2020 project “The internal syntax of OI nominal expressions” funded by the Università di Padova. In its current state, the corpus contains 24 texts from the XIII to the XV century, for a total of more than 1 million words fully annotated for part of speech. Of these, 8 texts are also fully syntactically annotated following the UPenn treebank conventions, and 7 texts are in the process of being syntactically annotated. The first 9 months of the project will be devoted to the syntactic annotation of the remaining 9 texts (T1). This step will lead to a treebank corpus of more than 1 million words. The annotation scheme adopted for the corpus follows as closely as possible the guidelines proposed for the French historical corpus, the MCVF (https://www.ling.upenn.edu/~beatrice/corpus-ling/annotation-french/), which are in turn an adaptation of the manual for the Penn Corpora of Historical English (https://www.ling.upenn.edu/hist-corpora/annotation/index.html). The guidelines for OI supplement the MCVF and the Penn Corpora of Historical English guidelines where necessary. The annotation task will benefit from an established collaboration with Dr. Beatrice Santorini, an expert in and the creator of the existing Penn Treebank historical corpora. The researcher hired by UNIPD will be involved in the annotation and will also be in charge of the sanity check of the corpus, looking for mistakes, disfluencies, and inconsistencies in the annotation. A paper illustrating the corpus, the annotation guidelines and the editorial choices is planned for publication (T2, by month 10). A website will be created by a professional designer to release the corpus free of charge once the sanity check is completed.
UNIMORE and UNITS will refine the list of parameters, manifestations, and questions. With the collaboration of UNIVR, the list of questions will be converted, when possible, into coding and search queries using CorpusSearch (http://corpussearch.sourceforge.net/CS-manual/QueryLanguage.html) (M1, month 8). These outputs will then be used to perform the research envisaged in WP2. A further output of WP1 will be a joint methodological paper by all four RUs presenting the tools which will be adopted to implement the parameter setting procedure on historical corpora (T3, by month 11).
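To make the idea of converting a question into a coding query more concrete, the following is a minimal sketch in Python; it is not CorpusSearch syntax and not the project's actual tooling. It assumes Penn-style bracketed trees and uses the illustrative labels NP, D and N. The sample question (does the determiner precede the noun inside an NP?), the function names, and the toy trees with placeholder tokens w1, w2, ... are all hypothetical and not taken from the OI corpus.

```python
from collections import Counter


def parse_tree(text):
    """Parse one Penn-style bracketed tree into (label, children) tuples.

    Leaves (word tokens) are kept as plain strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return read()


def subtrees(node):
    """Yield the node itself and every phrase node below it."""
    yield node
    for child in node[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def base(label):
    """Strip function tags, e.g. NP-SBJ -> NP (assumed tag convention)."""
    return label.split("-")[0]


def code_np(np):
    """Hypothetical coding for one NP: order of determiner and noun."""
    labels = [base(c[0]) for c in np[1] if isinstance(c, tuple)]
    if "D" not in labels or "N" not in labels:
        return "no-D-or-N"
    return "D-N" if labels.index("D") < labels.index("N") else "N-D"


def tally(trees):
    """Count the coding values over all NPs in a list of bracketed trees."""
    counts = Counter()
    for text in trees:
        for node in subtrees(parse_tree(text)):
            if base(node[0]) == "NP":
                counts[code_np(node)] += 1
    return counts


# Toy trees with placeholder tokens; NOT data from the OI corpus.
TOY_TREES = [
    "(IP (NP-SBJ (D w1) (N w2)) (V w3))",
    "(IP (V w1) (NP (N w2) (D w3)))",
    "(IP (NP (N w1)) (V w2))",
]

if __name__ == "__main__":
    print(tally(TOY_TREES))
```

In this sketch each NP receives one coding value, and the tally over the corpus is the kind of evidence from which a parameter value could then be argued for. Questions that cannot be phrased as a structural pattern of this sort would fall outside such a procedure.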


WP2: Implementation of the procedure on OI. Months 9-24

WP2 is the core of the project. We will implement our parameter setting procedure, address the RQs and test our working hypotheses, focusing on OI. This WP comprises the following activities, which will be carried out in close collaboration between the four RUs.
(a) Looking for parameter values. For the questions converted into coding queries and validated in WP1, the OI corpus will be interrogated automatically using CorpusSearch. For those questions that cannot be converted into coding queries, the texts will be analyzed manually. UNIPD will oversee this task with the collaboration of UNIMORE; a list of parameter values describing the internal structure of OI DPs will be released (M2, month 13).
(b) Parametric investigation will also be performed through dedicated case studies, to explore the internal structure of OI DPs in comparison with historical and modern Germanic (taking advantage of the activities of WP3, which will run in parallel), and with modern Italo-Romance (using parametric data which are part of existing databases collected by UNIMORE). We will focus on modification strategies through relative clauses: these tasks will be overseen by UNIPD and UNIVR, whose investigators have expertise in this domain; a joint paper by the four RUs will be submitted by month 15 (T4).
(c) Perfecting and expanding the array of parameters. We will begin to answer our RQs by refining the parametric apparatus (parameters, manifestations, questions) and the formal representation of their interdependencies, in order to explore their role in constraining competing grammars. UNIMORE will lead and coordinate this activity, and this will be the major task of the hired researcher (M3, month 19).
(d) The data collected will be used to explore the nature and extent of conflicting evidence, and to investigate its role in competition between grammars. This task will be performed by the four RUs in close mutual collaboration, and its output will be a joint paper presenting the theory of parameters that has emerged from the research as a whole, which will be submitted by the end of the project (T5).

WP3: Replication of the procedure on historical Germanic. Months 10-20

WP3 will run in parallel with WP2. We plan a replication of the parameter setting procedure on the historical Germanic languages presented above. The WP will develop the following activities:
(a) The procedure implemented on OI will be replicated on ENHG by UNIVR and on HIce and OE/ME by UNITS. The two researchers hired by the two RUs will be involved in these activities. The expected output is a list of parameters for each language (M4, month 16).
(b) In parallel, a complete characterization of the DP properties of the analyzed historical Germanic languages is envisaged, to be compared with their modern counterparts (M5, month 18). This phase is methodologically important, as it will allow us to disentangle competing hypotheses which may derive from investigating OI and to fully address RQ3, which concerns the problem of dealing with conflicting evidence.
(c) Finally, we will check whether the data corroborate the hypothesis of a coexistence, in OI, of a modern Italian and a Germanic-like grammar. This will be done through a detailed analysis of the DP-structure emerging from comparing historical and modern (Romance and Germanic) languages: a joint paper, to be submitted by month 20, will discuss these results (T6).

WP4: Public outreach. Months 21-24

WP4 is devoted to designing and creating activities addressed to the non-academic community. We believe that part of our results can have a non-trivial impact when properly made accessible to a wider audience. To this end, two initiatives will be organized under the lead of UNIVR during the final months of the project (T7): (i) a two-hour public meeting in which the non-academic community is invited to interact actively with the researchers of the project, to understand how the parameter setting procedure works and why the results of the investigation on historical corpora can broaden our knowledge of the structure of human language; (ii) a five-hour professional development course to upskill language teachers, who will design learning units (to be implemented in Secondary School classes) built upon the parametric model.

Project management, assessment, and evaluation

The members of all the RUs will work in continuous and close collaboration. To guarantee efficiency in the implementation of scientific and administrative activities, we envisage the following internal meetings: (1) a kick-off meeting (KO in the GANTT chart, month 1), to launch the project, assess the beginning of activities, and develop a plan for hiring the dedicated personnel; (2) a scientific review meeting (SR in the GANTT chart, month 13) involving team members and two invited scientists, to discuss the state of the art and theoretical/methodological issues, assess intermediate results, and enable the RUs to take any corrective action; (3) a closing meeting (CM in the GANTT chart, month 23) to assess the final results and plan future collaborative research and further outreach. As the final act, an international conference will be organized (FC in the GANTT chart, month 24), which will involve all the participants in the project and leading scholars in the field of parametric theories and diachronic syntax.