# On the trail of evolution with mathematical methods

How have living beings developed on our planet? Who is related to whom? These and similar questions have been debated in society and science for decades. In recent years in particular, a growing body of research has focused on DNA data and on mathematical methods for analysing them. Companies such as 23andMe and MyHeritage have turned the desire for answers to questions about family history into a business model: customers send in a DNA sample, and the company tells them where their ancestors may have come from and to whom they may be related.

In our interactive, computer-aided workshops, students from the 8th grade onwards take a mathematical look at evolutionary development and at how evolutionary distance can be determined. Using real DNA sequences and mathematical modelling, they develop methods for forming hypotheses about kinship relationships. Different focal points can be selected:

**Workshop for mathematics courses from the 8th grade:**

Data: The students work with real data from the online database of the National Center for Biotechnology Information (NCBI). They read the DNA data into the programming language Julia, filter the data and save it. In this way, they learn the basic steps of handling data in modelling.
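The read, filter and save steps can be sketched as follows. This is a minimal illustration in Python, not the Julia material used in the workshop. It assumes the sequences are available as FASTA text (a format NCBI offers for download); the function names `read_fasta`, `keep_clean` and `write_fasta` and the toy records are made up for this example.

```python
import io

def read_fasta(handle):
    """Parse FASTA text into a dict mapping header line -> sequence string."""
    records = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}

def keep_clean(records, alphabet="ACGT"):
    """Filter step (an assumed rule): keep sequences with unambiguous bases only."""
    return {name: seq for name, seq in records.items()
            if seq and set(seq) <= set(alphabet)}

def write_fasta(records, handle, width=60):
    """Save step: write records back as FASTA, wrapping lines at `width`."""
    for name, seq in records.items():
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")

# Toy input standing in for a downloaded file (not real NCBI records).
example = io.StringIO(
    ">seq_a toy record\nACGTAC\nGTTA\n>seq_b toy record\nACGNNA\n"
)
records = read_fasta(example)
clean = keep_clean(records)      # seq_b is dropped because it contains 'N'
out = io.StringIO()
write_fasta(clean, out)
```

With a real file, `open("sequences.fasta")` (a hypothetical file name) would take the place of the `io.StringIO` objects.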

Distance determination by relative frequency: Students develop a simple, basic notion of distance: the relative frequency of positions at which two DNA sequences differ. They apply this measure to different data sets and discuss its suitability.
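As a small illustration, the relative frequency of differing positions can be computed like this. The sketch is in Python rather than the workshop's Julia, assumes the two sequences are already aligned and of equal length, and uses a made-up function name and made-up sequences.

```python
def relative_difference(seq_a, seq_b):
    """Share of positions at which two equally long sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differing = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differing / len(seq_a)

# Two toy sequences that differ at positions 5 and 8 (of 8).
d = relative_difference("ACGTACGT", "ACGTTCGA")
print(d)  # 0.25
```

A value of 0 means the sequences are identical, and a value of 1 means they differ at every position.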

Distance determination by "cost model": The students refine the distance measure into a "cost model", apply it to different data sets and discuss the results. In doing so, they go through the modelling process a second time.
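The text does not spell out the cost model, so the following Python sketch shows only one possible reading: every type of substitution is assigned a cost, and the distance is the average cost per position. The assumption that transitions (A↔G, C↔T) are cheaper than transversions, and the cost values 1 and 2, are illustrative choices and not taken from the workshop.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_cost(a, b, transition_cost=1.0, transversion_cost=2.0):
    """Cost of replacing base a by base b (cost values are assumptions)."""
    if a == b:
        return 0.0
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition_cost if same_class else transversion_cost

def cost_distance(seq_a, seq_b, **costs):
    """Average substitution cost per position for equally long sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(substitution_cost(a, b, **costs) for a, b in zip(seq_a, seq_b))
    return total / len(seq_a)

# A->G is a transition (cost 1), G->T is a transversion (cost 2).
d = cost_distance("ACGT", "GCTT")
print(d)  # 0.75
```

Setting both costs to 1 reproduces the plain relative frequency of differing positions, which makes the two measures easy to compare on the same data.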

Pedigree: As an alternative to working with the online database, students can construct a pedigree, that is, a tree of kinship relationships, using clustering methods.
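One common way to turn pairwise distances into a tree is agglomerative clustering with average linkage; whether the workshop uses exactly this variant is not stated, so the Python sketch below is an assumption. The names A to D and their distances are invented.

```python
def average_linkage_tree(names, dist):
    """Repeatedly merge the two closest clusters; return a nested-tuple tree.

    dist maps frozenset({name_a, name_b}) -> distance between two items.
    """
    # Each cluster is a pair (subtree, list of member names).
    clusters = [(name, [name]) for name in names]

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between the two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset((a, b))] for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Invented distances: A and B are close, C and D are close.
pairwise = {
    frozenset(("A", "B")): 0.10, frozenset(("C", "D")): 0.20,
    frozenset(("A", "C")): 0.60, frozenset(("A", "D")): 0.70,
    frozenset(("B", "C")): 0.65, frozenset(("B", "D")): 0.75,
}
tree = average_linkage_tree(["A", "B", "C", "D"], pairwise)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

The nested tuples encode the branching order: items grouped in the innermost tuples were merged first and are therefore the closest relatives under the chosen distance.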

**Duration:** 5 - 6 hours (incl. lunch break)

**Possible contents:** data handling, distance measurement, clustering methods

**Previous knowledge:** relative frequency, arithmetic mean, median, proportionality. The use of the mathematical software MATLAB is guided during the workshop.

**Registration:** Appointments can be made individually by e-mail at KIT or RWTH Aachen University.

**Workshop for mathematics courses from the 10th/11th grade:**

Data: The students work with real data from the online database of the National Center for Biotechnology Information (NCBI). They read the DNA data into the programming language Julia, filter the data and save it. In this way, they learn the basic steps of handling data in modelling.

Distance determination by relative frequency: Students develop a simple, basic notion of distance: the relative frequency of positions at which two DNA sequences differ. They apply this measure to different data sets and discuss its suitability.

Distance determination by "cost model": The students refine the distance measure into a "cost model", apply it to different data sets and discuss the results. In doing so, they go through the modelling process a second time.

Metrics: As an alternative to working with the online database, we offer a discussion of the mathematical concept of a metric. Guided by everyday examples, students get to know the formal definition of a metric and check whether the distance concepts they have developed satisfy its properties.
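Such a check can also be carried out by computer. The Python sketch below tests the defining properties of a metric (non-negativity, distance zero exactly for identical points, symmetry, triangle inequality) on a finite sample. The function names and test data are made up, and a check on a sample can only refute the metric properties, never prove them in general.

```python
from itertools import product

def check_metric(points, d, tol=1e-12):
    """Test the metric axioms for distance function d on a finite sample."""
    result = {"non_negative": True, "identity": True,
              "symmetric": True, "triangle_inequality": True}
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:
            result["non_negative"] = False
        if (abs(d(x, y)) <= tol) != (x == y):
            result["identity"] = False
        if abs(d(x, y) - d(y, x)) > tol:
            result["symmetric"] = False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            result["triangle_inequality"] = False
    return result

def relative_difference(seq_a, seq_b):
    """Share of differing positions (sequences of equal length)."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def squared_difference(x, y):
    """Looks like a distance, but violates the triangle inequality."""
    return (x - y) ** 2

seq_report = check_metric(["ACGT", "ACGA", "TCGA", "TTTT"], relative_difference)
num_report = check_metric([0, 1, 2], squared_difference)
print(seq_report)
print(num_report)
```

For the squared difference, the path from 0 to 2 via 1 costs 1 + 1 = 2, while the direct distance is 4, so the triangle inequality fails even though the other three properties hold.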

Pedigree: Alternatively, students can construct a pedigree, that is, a tree of kinship relationships, using clustering methods.

**Duration:** 5 - 6 hours (incl. lunch break)

**Possible contents:** data handling, distance measurement, clustering methods, metrics

**Previous knowledge:** relative frequency, arithmetic mean, median, proportionality

**Registration:** Appointments can be made individually by e-mail at KIT or RWTH Aachen University.

**Workshop for mathematics courses from Q1 (first year of the qualification phase):**

Data: The students work with real data from the online database of the National Center for Biotechnology Information (NCBI). They import the DNA data into the programming language Julia, filter the data and save it. In this way, they learn the basic steps of handling data in modelling.

Distance through stochastic processes: Students develop a model for measuring distance that is based on transition matrices and stochastic processes. They apply this model to different data sets and discuss its suitability.
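The text does not say which stochastic model is used. As one well-known example of the idea, the Python sketch below uses the Jukes-Cantor model (a swapped-in technique, not necessarily the workshop's): in each time step, a base changes into each of the three other bases with the same small probability `a`. Powers of the transition matrix then give the probability that a position looks different after `n` steps, and the Jukes-Cantor formula converts an observed share of differences back into an estimated number of substitutions per position. The values of `a` and `n` are arbitrary.

```python
import math

def step_matrix(a):
    """One-step transition matrix on (A, C, G, T): change to each other base w.p. a."""
    return [[1 - 3 * a if i == j else a for j in range(4)] for i in range(4)]

def mat_mul(x, y):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_pow(m, n):
    """n-th power of a 4x4 matrix by repeated multiplication."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

def jukes_cantor_distance(p):
    """Estimated substitutions per position from the share p of differing positions."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)

a, n = 0.01, 20                      # arbitrary illustrative values
p_n = mat_pow(step_matrix(a), n)
p_differ = 1 - p_n[0][0]             # probability that a base differs after n steps
d = jukes_cantor_distance(p_differ)  # close to the expected 3 * a * n substitutions
print(round(p_differ, 4), round(d, 4))
```

The estimated distance `d` is larger than the observed share of differences `p_differ` because the model accounts for positions that changed more than once or changed back.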

Distance determination by "cost model": The students refine the distance measure into a "cost model", apply it to different data sets and discuss the results. In doing so, they go through the modelling process a second time.

Metrics: As an alternative to working with the online database, we offer a discussion of the mathematical concept of a metric. Guided by everyday examples, students get to know the formal definition of a metric and check whether the distance concepts they have developed satisfy its properties.

Pedigree: Alternatively, students can construct a pedigree, that is, a tree of kinship relationships, using clustering methods.

**Duration:** 5 - 6 hours (incl. lunch break)

**Possible contents:** data handling, distance measurement, transition matrices and stochastic processes, clustering methods, metrics

**Previous knowledge:** relative frequency, arithmetic mean, median, proportionality

**Registration:** Appointments can be made individually by e-mail at KIT or RWTH Aachen University.

Source of the image: https://pixabay.com/de/illustrations/dna-erbgut-helix-proteine-biologie-3539309/

# List of publications and talks on this module

- Sube, M.: Entwicklung und Evaluation von Unterrichtsmaterial zu Data Science und mathematischer Modellierung mit Schülerinnen und Schülern, Dissertation RWTH Aachen, 2019.
- Sube, M.: Dem Geheimnis der evolutionären Entwicklung mit Mathematik auf der Spur (workshop), Lehrertag, Aachen, 2018.