
PhD Program in Computer and Data Science: Courses

2025

Quantitative and formal modeling of historical sciences: an introduction to the Parametric Comparison Method

12 hrs, 3 ECTS

LECTURER: Cristina Guardiano

DATES: January 15-17, 2025

Introduction

The course provides an introduction to the Parametric Comparison Method (PCM, Longobardi and Guardiano 2009).

The research line proposed by the PCM combines the goal of reconstructing language history with the analytical tools of formal grammar. Its basic hypothesis is that, contrary to most claims made over the past two centuries, syntactic diversity encodes language history to a remarkable extent and can provide historical information at a greater time depth than classical word etymologies.

The course presents the major historical results obtained by the PCM at various levels of historical depth, with a special focus on the computational tools implemented to measure parametric relatedness and to extract and evaluate the phylogenetic signal encoded by syntactic parameters.

Background

The need to reach progressively deeper levels of chronological depth in the investigation of the human past is a requirement for any discipline with ambitions of historical reconstruction. In recent times, the achievements of historical sciences (e.g., population genetics) in the search for long-persistence patterns able to reveal deep-time relations were made possible by two radical paradigm shifts. The first is the adoption of quantitative modeling and automatic procedures to process and measure large amounts of data and to extract statistically supported generalizations. The second is a qualitative change in the type of taxonomic data, following the discovery that abstract entities, not directly observable but responsible for several variable surface traits, retain historical information better than observable patterns.

Historical research in linguistics

In linguistics, the development of the historical paradigm in the 19th century prompted extraordinary progress in our knowledge of human history by revealing relations among languages/populations which could not have been discovered by archaeology or demography alone, thanks to the identification of abstract patterns of language transmission and change. In the past 30 years, thanks to the development of Quantitative Phylogenetics, historical investigation in linguistics has benefited from the adoption of computer-based techniques, taxonomic algorithms and methods typical of data science. This has led to the implementation of a wide array of automatic tools to generate computer-based taxonomies, explore dynamics of language evolution, reconstruct ancestral states and migration patterns, compare linguistic, genetic, and cultural evolution, model language contact, and reconstruct character by character the evolution of a family from its assumed shared ancestor.

Measuring language relatedness at a deep-time level

These tools have produced excellent results in performing accurate, objective reconstructions, but have also revealed important limits in attaining the chronological depth required for long-range investigation. This shows that the goal of discovering deep-time relations using languages can only be pursued by combining quantitative modeling with a radical qualitative change in the level of linguistic characters employed for taxonomic reconstruction. The Parametric Comparison Method (PCM) implements a comparative model based precisely on these tenets. One of its major goals is to work out computable tools for assessing historical relatedness between languages against chance when etymological evidence is missing. To this end, the PCM exploits cognitive parametric theories to measure grammatical diversity and its distribution, and demonstrates that abstract cognitive entities retain a significant historical signal able to reveal unknown historical crosslinguistic connections.
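As a rough illustration of what a computable measure of grammatical diversity can look like, the sketch below compares two languages coded as strings of parameter values. It assumes a normalized Hamming distance computed only over parameters that are set in both languages; this choice, the function name `parametric_distance`, and the use of `0` to mark a parameter that is irrelevant in a language are assumptions made for the example, not necessarily the measures presented in the course.

```python
def parametric_distance(a: str, b: str) -> float:
    """Share of differing values among the parameters set in both languages.

    Each string holds one character per parameter: '+', '-', or '0'.
    The '0' marker (assumed here) means the parameter is irrelevant in that
    language, so the position is skipped in the comparison.
    """
    if len(a) != len(b):
        raise ValueError("both languages must be coded over the same parameters")
    identities = 0
    differences = 0
    for x, y in zip(a, b):
        if x == "0" or y == "0":
            continue  # not comparable: at least one language lacks a value
        if x == y:
            identities += 1
        else:
            differences += 1
    comparable = identities + differences
    if comparable == 0:
        raise ValueError("no comparable parameters")
    return differences / comparable


# Toy strings (invented data): five parameters, one of them irrelevant in lang_a.
lang_a = "++-0+"
lang_b = "+--+-"
distance = parametric_distance(lang_a, lang_b)
```

With these toy strings, four positions are comparable and two of them differ. Whether such a distance is larger or smaller than expected by chance is a separate statistical question that this sketch does not address.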

Suggested readings

Giuseppe Longobardi and Cristina Guardiano (2009) Evidence for syntax as a signal of historical relatedness. Lingua 119/11, 1679-1706.

Andrea Ceolin, Cristina Guardiano, Monica Alexandrina Irimia and Giuseppe Longobardi (2020) Formal syntax and deep history. Frontiers in Psychology 11: 488871.

Andrea Ceolin, Cristina Guardiano, Giuseppe Longobardi, Monica Alexandrina Irimia, Luca Bortolussi and Andrea Sgarro (2021) At the boundaries of syntactic prehistory. Philosophical Transactions of the Royal Society B 376: 20200197.

Cristina Guardiano, Giuseppe Longobardi, Guido Cordoni and Paola Crisma (2021) Formal syntax as a phylogenetic method. In: Richard D. Janda, Brian D. Joseph, Barbara S. Vance (eds) The Handbook of Historical Linguistics, Volume II. Hoboken, NJ: Wiley/Blackwell, 145-182.

Further readings will be made available to the students upon request.

For a full list of references about (or related to) the PCM, see www.parametricomparison.unimore.it, section Publications.

Evaluation

The evaluation will be based on a written essay (max 5 pages) on a topic that each student will choose from among those discussed in class. The essay should provide an original contribution to the selected topic. Some examples follow:

1)   A computational phylogenetic analysis of a group/family among those available in the PCM databases (including ultralocality)

2)   An analysis of the implicational structure of the PCM parameter system based on selected works in the PCM

3)   A discussion of one of the papers recently published in computational phylogenetics, focused on the algorithms/methods implemented, on the data used for comparison, and on the phylogenetic conclusions

CALENDAR

Classes will be taught in person.

Venue

Sala riunioni, Palazzo Dossetti, 2nd floor, viale A.Allegri 9, Reggio Emilia

Timetable

Date, Time, Topic

15/01/2025, 15.30-17.30
- Issues in contemporary historical research: linguistics and other disciplines
- Historical approaches and formal linguistics: an impossible encounter?
- Computational approaches to historical linguistics: quantitative phylogenetics

16/01/2025, 10.00-12.30
- The components of the PCM
- The implicational structure of parameter systems
- Measuring and evaluating parameter distances

16/01/2025, 14.00-16.30

17/01/2025, 10.00-12.30
- The phylogenetic signal of parameters: distance-based analyses
- The phylogenetic signal of parameters: character-based analyses
- Parameters and parameter change
- The PCM beyond linguistics: gene-language comparison

17/01/2025, 14.00-16.30

 

Coding Syntactic Diversity

12 hrs, 3 ECTS

LECTURER: Cristina Guardiano

DATES: January 28-31, 2025

 

The course presents the structure of the parameter system used to perform language comparison in the PCM.

Introduction

The application of computational techniques to code, annotate and parse linguistic data has benefited from the increasing availability of digital corpora but also, crucially, from the refinement of formal models of human language structure and diversity.

This course presents one such model, recently developed to encode the syntactic diversity attested in the world’s languages, and its application to the analysis of closed corpora of linguistic data.

Background

In the parametric framework of cognitive biolinguistics, human grammars are represented as finite strings of binary values/states (1/0, or +/-). In this approach, the label “parameters” refers to a set of open choices between binary values, generated by our invariant universal language faculty, and closed by each language learner based on the linguistic evidence s/he is exposed to. Parameter systems exhibit two layers of deductive structure: (a) each parameter is responsible for a set of different co-varying surface linguistic patterns (manifestations), and (b) parameters form a network of partial implications: one value (though not the other) of a given parameter p1 may entail the irrelevance of another parameter p2, whose manifestations would then become predictable. 
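The second layer of structure, partial implications between parameters, can be sketched in a few lines. The example below is only an illustration under stated assumptions: the parameter names `p1`, `p2`, `p3`, the single implication, and the use of `0` to mark a parameter made irrelevant are all hypothetical and are not taken from the actual PCM parameter list.

```python
from typing import Dict, List, Tuple

# An implication (p1, value, p2) reads: if p1 has this value, p2 is irrelevant.
# The other value of p1 leaves p2 untouched (a partial implication).
Implication = Tuple[str, str, str]


def neutralize(values: Dict[str, str],
               implications: List[Implication]) -> Dict[str, str]:
    """Return a copy of `values` with implied-irrelevant parameters set to '0'."""
    result = dict(values)
    changed = True
    while changed:  # repeat until stable, so chains of implications are followed
        changed = False
        for source, trigger, target in implications:
            if (target in result
                    and result.get(source) == trigger
                    and result[target] != "0"):
                result[target] = "0"  # '0' is an assumed marker for irrelevance
                changed = True
    return result


# Hypothetical rule: the value '-' of p1 makes p2 irrelevant.
implications = [("p1", "-", "p2")]

lang_a = neutralize({"p1": "+", "p2": "-", "p3": "+"}, implications)
lang_b = neutralize({"p1": "-", "p2": "+", "p3": "+"}, implications)
```

In `lang_a` the value `+` of `p1` leaves `p2` open, so its value is kept; in `lang_b` the value `-` of `p1` makes `p2` irrelevant, and it is marked `0`.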

The PCM parameter system

The parameter setting algorithm presented in this course consists of the following components: (i) a list of binary parameters; (ii) a list of formulas which define cross-parametric implications in this set; (iii) for each parameter, the list of surface manifestations it generates; (iv) a list of YES/NO questions associated with each manifestation, which are used to collect the data required to set the value of each parameter in a given language (only YES answers set the value 1).
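Components (i), (iii) and (iv) can be pictured with a minimal sketch. It assumes that a single YES answer to any question linked to a parameter is enough to set that parameter to 1, and that a missing answer counts as NO; both are simplifying assumptions. The identifiers (`p1`, `m1`, `q1`, `set_parameters`) are hypothetical, and the implication formulas of component (ii) are left out.

```python
from typing import Dict, List

# (i) + (iii): each parameter with the surface manifestations it generates.
manifestations: Dict[str, List[str]] = {
    "p1": ["m1", "m2"],
    "p2": ["m3"],
}

# (iv): the YES/NO questions associated with each manifestation.
questions: Dict[str, List[str]] = {
    "m1": ["q1"],
    "m2": ["q2"],
    "m3": ["q3"],
}


def set_parameters(answers: Dict[str, bool]) -> Dict[str, int]:
    """Set each parameter to 1 if any of its questions was answered YES."""
    values: Dict[str, int] = {}
    for parameter, manifs in manifestations.items():
        linked = [q for m in manifs for q in questions[m]]
        # Assumption: unanswered questions are treated as NO.
        has_yes = any(answers.get(q, False) for q in linked)
        values[parameter] = 1 if has_yes else 0
    return values


# Invented answers for one language: only q2 is answered YES.
values = set_parameters({"q1": False, "q2": True, "q3": False})
```

Here `p1` is set to 1 because one of its manifestations received a YES answer, while `p2` stays at 0.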

Starting from the current system of parameters exploited in various PCM implementations, a selection of subdomains will be illustrated, and actual analyses of language corpora (mostly from ancient languages) will be carried out.

Suggested readings

Cristina Guardiano, Giuseppe Longobardi, Guido Cordoni and Paola Crisma (2021) Formal syntax as a phylogenetic method. In: Richard D. Janda, Brian D. Joseph, Barbara S. Vance (eds) The Handbook of Historical Linguistics, Volume II. Hoboken, NJ: Wiley/Blackwell, 145-182.

Paola Crisma, Cristina Guardiano and Giuseppe Longobardi (2020) Syntactic diversity and language learnability. Studi e Saggi Linguistici LVIII (2), 97-128.

Paola Crisma, Cristina Guardiano and Giuseppe Longobardi (2024) A unified theory of Case form and Case meaning. Genitives and parametric syntax. In: C. Sevdali, D. Mertyris and E. Anagnostopoulou (eds) The place of Case in Grammar. Oxford: OUP, 427-466.

Further readings will be made available to the students upon request.

For a full list of references about (or related to) the PCM, see www.parametricomparison.unimore.it, section Publications.

Evaluation

The evaluation will be based on a written essay (max 5 pages) on a topic that each student will choose from among those discussed in class. The essay should provide an original contribution to the selected topic. Some examples follow:

1)   A parametric analysis of an ancient text based on novel data collected over a selection of the nominal parameters used by the PCM

2)   A description of a selected nominal subdomain based on recent theoretical developments in formal linguistics

CALENDAR

Classes will be taught in person.

Venue

Sala riunioni, Palazzo Dossetti, 2nd floor, viale A.Allegri 9, Reggio Emilia

Timetable

Date, Time, Topic

29/01/2025, 10.30-12.30
- Formal linguistics and the biocognitive approach: structures, principles and parameters
- The structure of the nominal domain: subfields and crosslinguistic variation
- Parameters and their manifestations: abstract rules vs surface evidence

29/01/2025, 14.00-16.00

30/01/2025, 10.30-12.30
- Parameters and parameter change: Genitive systems across the world’s languages
- Parameters and parameter change: possessive systems across the world’s languages

30/01/2025, 14.00-16.00

31/01/2025, 10.30-12.30
- Parameters and parameter change: demonstratives across the world’s languages
- The distribution of parameter values across selected language domains: areal and historical factors

31/01/2025, 14.00-16.00

 

 

2024

Quantitative and formal modeling of historical sciences
Coding and Comparing Syntactic Data

March 18-22, 2024

The course was held in Reggio Emilia (Palazzo Dossetti, Viale Allegri 9, Sala Riunioni, 2nd floor), from March 18 to March 22, 2024.

 

Instructors

Paola Crisma (Trieste)

Cristina Guardiano (Unimore)

Monica Irimia (Unimore)

Giuseppe Longobardi (York)

Andrea Sgarro (Trieste)

 

Short description

The implementation of quantitative models, computational tools and automatic algorithms for data collection and analysis has brought models, idealizations, and explanatory standards typical of the natural sciences into the human sciences.

This course explores how these tools are extended and applied to those human sciences that specifically deal with history and cultural transmission.

Major contents:

- introduction to formal models of human language structure and diversity: parameters, parameter systems, parameter setting

- application of computational techniques to code, annotate and parse linguistic data (syntactically annotated corpora)

- application of computational techniques for data processing and analysis to the quantitative assessment of language relatedness and to phylogenetic reconstruction

 

Program

For information: cristina.guardiano@unimore.it

 

2023

Coding and Comparing Syntactic Data (24 hrs,  6 ECTS)

March 30-April 4, 2023

LECTURERS: Cristina Guardiano, Giuseppe Longobardi, Paola Crisma, Emanuela Sanfelici, Liviu Dinu, Anca Dinu, Andrea Sgarro

SYLLABUS: The implementation of quantitative models, computational tools and automatic algorithms for data collection and analysis has brought models, idealizations, and explanatory standards typical of the natural sciences into the human sciences.
This course explores how these tools are extended and applied to those human sciences that specifically deal with history and cultural transmission.
Major contents:
- introduction to formal models of human language structure and diversity: parameters, parameter systems, parameter setting
- application of computational techniques to code, annotate and parse linguistic data (syntactically annotated corpora)
- application of computational techniques for data processing and analysis to the quantitative assessment of language relatedness and to phylogenetic reconstruction