Brief overview
Duration: at least 5 hours
Contents: Stochastics (concept of probability), nth-order Markov chains (transition graphs and tables), weighted average, logarithm function, optimization, basic machine learning strategies.
Previous knowledge: relative and absolute frequency, concept of probability
Target audience: mathematics courses or computer science courses in 10th grade and above.
Created by: Stephanie Hofmann
Registration: Dates can be arranged individually via this form.
Artificial intelligence (AI) has long been part of everyday life in the form of Alexa and Siri or in autonomous driving, and AI also supports us in our daily chats with friends: we get word suggestions and thus not only save valuable time but also make fewer mistakes while typing. But how does the phone know what I want to write next? How can such word suggestions be generated so that they are most likely to deliver the word the user wants? How well do such word suggestions work? And can such prediction models ultimately imitate the user's language so well that no one notices that a text was generated by an AI instead of a real human being?
In this workshop, students create their own word prediction model, improve it, and put it to the test. In the process, the practical relevance of mathematics in everyday life becomes apparent, and the students experience a completely new, action-oriented, computer-based approach to mathematics.
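As a first impression of what the students build, here is a minimal sketch (in Python; the sample text and all names are purely illustrative and not part of the workshop material): word suggestions can be imitated by counting how often each word follows the current one in a text and proposing the most frequent successors.

```python
from collections import Counter, defaultdict

# Illustrative sample text; in the workshop the students work with longer texts.
text = "the dog chased the cat and the cat chased the mouse"
words = text.split()

# Bi-gram counts: how often does each word follow the current word?
followers = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    followers[current][nxt] += 1

def suggest(word, k=3):
    """Suggest up to k words, ordered by how often they followed `word`."""
    return [w for w, _ in followers[word].most_common(k)]

print(suggest("the"))  # ['cat', 'dog', 'mouse']
```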
Image source: https://unsplash.com/photos/ik_AuIWeBBM
Prior knowledge
The following mathematical content is required as prior knowledge for working through the learning material.
- relative and absolute frequency
- concept of probability
- multilevel random experiments, application of the path rule
- natural logarithm of a number, logarithm laws (can be introduced in a problem-oriented way if necessary)
- mean value
All other mathematical contents are introduced in the learning material in a problem-oriented way.
Timetable
Phase | Content | Reference to school mathematics | Further mathematical content | Media/materials | Time (min) |
Introduction + introduction to the technology | Motivation for a word prediction model, introduction to machine learning, simplification and translation of the problem into a mathematical model | Absolute frequencies | - | Presentation slides | 15 + 10 |
Elaboration AB1 | Development of the bi-gram model, application of the bi-gram model (transition-table sketch below the table) | Relative and absolute frequencies, conditional probability | Transition probability, transition graph, transition table, bi-gram model | AB1-SuS | 60 |
Consolidation 1 | Review of the bi-gram model, collecting problems of the bi-gram model and discussing possible solutions | Conditional probability | Transition probability, transition graph, transition table, bi-gram model | Presentation slides | 15 |
Elaboration AB2 | Development of the tri-gram model, getting to know the uni-gram model, comparison of the n-gram models | Relative and absolute frequencies, conditional probability | Transition probability, transition graph, transition table, uni-gram model, tri-gram model | AB2-SuS | 30 |
Consolidation 2 | Overview of the n-gram models, discussion of their advantages and disadvantages, collecting ideas for combining n-gram models | Conditional probability | Effect of low counts on the quality of the relative frequency as an estimator, properties of n-gram models | Presentation slides | 10 |
Elaboration AB3 | Combination of the n-gram models (interpolation sketch below the table) | Mean value | Weighted average | AB3-SuS | 30 |
Consolidation 3 | Discussion of requirements for the weights, collecting ideas for evaluating word prediction models, introduction of cross entropy | Probability, natural logarithm function (introduced in a problem-oriented way if necessary) | Cross entropy, machine learning strategy: split into training and test data | Presentation slides | 20 |
Elaboration AB4 | Calculation and interpretation of the model probability, addition of a smoothing probability, calculation of cross entropy, comparison of different models using cross entropy (sketch below the table) | Natural logarithm function, logarithm laws (introduced in a problem-oriented way if necessary), multilevel random experiments, application of the path rule | Model probability, cross entropy, smoothing | AB4-SuS | 70 |
Consolidation 4 | Interpretation of the cross entropy results, use of cross entropy to determine the optimal weights | - | Model probability, cross entropy, smoothing | Presentation slides | 10 |
Elaboration AB5 | Formulation of an optimization problem to determine the optimal weights using cross entropy (sketch below the table) | - | Application of an optimization procedure, setting up the objective function and constraints | AB5-SuS | 20 |
or: Elaboration AB5 open | Development of an optimization procedure to determine the optimal weights using cross entropy | - | Setting up the objective function and constraints, development of an optimization procedure | AB5open-SuS | 45 |
Consolidation 5 + final discussion + evaluation | Presentation of the optimization procedures (only for AB5 open), interpretation of the results of the optimization procedure, socio-critical discussion of the use of assistance systems, especially word suggestions | Critical reflection | - | Presentation slides | 5-15 + 10 + 5 |
Additional material | | | | | |
Elaboration Z1 (linked to AB2) | Development of the uni-gram model | Relative and absolute frequencies | - | Z1-SuS | |
Elaboration Z2, Z2.1, Z2.2 (linked to AB3) | Development of the fallback (backoff) strategy | - | Case distinction | Z2-SuS, Z2.1-SuS, Z2.2-SuS | |
Elaboration Z3 (linked to AB4) | Influence of the training data on the model | Understanding of data | Understanding of training data | Z3-SuS | |
Elaboration AB_open (linked to AB5) | Using the word suggestions to generate text (sampling sketch below the table) | Multilevel random experiments | - | AB_open-SuS | |
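The following sketches illustrate selected steps of the timetable. They are possible implementations in Python with illustrative names and a toy text; they are not taken from the worksheets. First, the transition table of the bi-gram model (AB1): the transition probability of a word pair is estimated by its relative frequency.

```python
from collections import Counter, defaultdict

text = "the dog chased the cat and the cat chased the mouse"
words = text.split()

# Absolute frequencies of the word pairs (bi-grams).
counts = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    counts[current][nxt] += 1

# Transition table: relative frequencies P(next | current) per row.
transition_table = {
    current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
    for current, row in counts.items()
}

print(transition_table["the"])  # {'dog': 0.25, 'cat': 0.5, 'mouse': 0.25}
```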
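For AB3, a sketch of the combination of the n-gram models as a weighted average: the three model estimates for a candidate word are mixed with non-negative weights that sum to 1 (the concrete weights here are only an assumption).

```python
def interpolate(p_uni, p_bi, p_tri, weights=(0.1, 0.3, 0.6)):
    """Weighted average of the uni-, bi- and tri-gram estimates.

    The weights are non-negative and sum to 1, so the result is
    again a probability between 0 and 1.
    """
    w_uni, w_bi, w_tri = weights
    return w_uni * p_uni + w_bi * p_bi + w_tri * p_tri

# Three made-up model estimates for the same candidate word:
print(interpolate(p_uni=0.01, p_bi=0.20, p_tri=0.50))  # ~0.361
```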
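For AB4, a sketch of the evaluation via cross entropy, assuming a simple train/test split and a fixed smoothing probability for unseen word pairs: the cross entropy is the negative mean of the natural logarithms of the probabilities the model assigns to the test bi-grams, and smaller values indicate a better model.

```python
import math
from collections import Counter, defaultdict

def bigram_model(words):
    """Relative-frequency bi-gram model trained on a word list."""
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return {
        cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for cur, row in counts.items()
    }

def cross_entropy(model, test_words, smoothing=1e-4):
    """Negative mean log-probability of the test bi-grams under the model."""
    log_probs = []
    for current, nxt in zip(test_words, test_words[1:]):
        p = model.get(current, {}).get(nxt, smoothing)  # smoothing for unseen pairs
        log_probs.append(math.log(p))
    return -sum(log_probs) / len(log_probs)

# Machine learning strategy: split the corpus into training and test data.
corpus = "the dog chased the cat and the cat chased the mouse and the dog slept".split()
split = int(0.75 * len(corpus))
train, test = corpus[:split], corpus[split:]

model = bigram_model(train)
print(cross_entropy(model, test))  # smaller = better prediction of unseen text
```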
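For AB5, one possible optimization procedure (not necessarily the one developed on the worksheets): a grid search over all weight triples that satisfy the constraints (non-negative, summing to 1), choosing the triple with the smallest cross entropy.

```python
import itertools

def grid_search_weights(cross_entropy_of, step=0.1):
    """Try all weight triples (w1, w2, w3) with w1 + w2 + w3 = 1 on a grid
    and return the triple with the smallest cross entropy.

    `cross_entropy_of(weights)` is assumed to evaluate the combined model
    with the given weights on the test data.
    """
    steps = int(round(1 / step))
    best_weights, best_ce = None, float("inf")
    for i, j in itertools.product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue  # constraint: the weights may not exceed 1 in total
        weights = (i * step, j * step, 1 - (i + j) * step)
        ce = cross_entropy_of(weights)
        if ce < best_ce:
            best_weights, best_ce = weights, ce
    return best_weights, best_ce

# Toy objective standing in for the cross entropy of the combined model:
toy = lambda w: (w[0] - 0.1) ** 2 + (w[1] - 0.3) ** 2 + (w[2] - 0.6) ** 2
print(grid_search_weights(toy))  # roughly ((0.1, 0.3, 0.6), ~0.0)
```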
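For AB_open, a sketch of text generation as a multilevel random experiment: starting from a word, the next word is repeatedly drawn according to the transition probabilities (here the table from the bi-gram sketch above).

```python
import random

def generate(transition_table, start, length=10):
    """Generate text by repeatedly sampling the next word
    from the transition probabilities of the current word."""
    words = [start]
    for _ in range(length):
        followers = transition_table.get(words[-1])
        if not followers:
            break  # no known successor: stop generating
        nxts, probs = zip(*followers.items())
        words.append(random.choices(nxts, weights=probs)[0])
    return " ".join(words)

# Transition table as built in the bi-gram sketch above.
transition_table = {
    "the": {"dog": 0.25, "cat": 0.5, "mouse": 0.25},
    "dog": {"chased": 1.0},
    "chased": {"the": 1.0},
    "cat": {"and": 0.5, "chased": 0.5},
    "and": {"the": 1.0},
}
print(generate(transition_table, "the"))
```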
Literature
- Hofmann, S., & Frank, M. (in press). Maschinelles Lernen im Schulunterricht am Beispiel einer problemorientierten Lerneinheit zur Wortvorhersage. GDM, Frankfurt.
- Hofmann, S., & Frank, M. (2022). Teaching data science in school: digital learning material on predictive text systems. In G. Bolondi & J. Hodgen (Eds.), Proceedings of the Twelfth Congress of the European Society for Research in Mathematics Education.