Daniel Gaszewski

Vikings language, the speech of the king Vasa or today's Swedish? Text classification with ML.NET.

This model could classify historical Swedish texts perfectly. But when it saw Polish, it thought it was reading Viking runes. Here’s why.

#1 about 2 minutes

Classifying historical Swedish text with ML.NET

The project aims to build a system using ML.NET to classify Swedish text into its correct historical period, from Viking runes to modern language.

#2 about 4 minutes

The personal inspiration behind the project

The idea for the project came from a university exam on the history of the Swedish language and from noticing linguistic differences on a Nobel Prize diploma.

#3 about 1 minute

Understanding how all languages evolve over time

Language evolution is a natural process for living languages, illustrated by comparing Old English to modern English and old C# syntax to new pattern matching.

#4 about 6 minutes

An overview of Swedish language history

The Swedish language is divided into distinct historical periods, including Runic, Old Swedish, and Modern Swedish, each with unique alphabets, grammar, and vocabulary.

#5 about 2 minutes

Getting started with the ML.NET framework

ML.NET is an open-source framework that allows .NET developers to build machine learning models without needing deep expertise in underlying algorithms.

#6 about 3 minutes

The critical process of data collection and cleaning

Preparing the dataset is the most time-consuming step: it involves cleaning up inconsistent formats, removing irrelevant characters, and standardizing the text units used for training.
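The cleaning steps named above can be sketched roughly as follows. This is a language-agnostic illustration in Python, not the speaker's actual pipeline or ML.NET code. The helper names `clean_text` and `split_into_units` are hypothetical, and so are the specific rules: that bracketed notes and digits count as "irrelevant characters", and that a "text unit" is a fixed number of words.

```python
import re
import unicodedata
from typing import List


def clean_text(raw: str) -> str:
    """Normalize one raw text sample (hypothetical cleaning rules)."""
    # Compose characters so that "a" + combining ring and "å" compare equal.
    text = unicodedata.normalize("NFC", raw)
    # Drop bracketed editorial notes such as "[1]" (assumed to be noise).
    text = re.sub(r"\[[^\]]*\]", " ", text)
    # Drop digits, assumed here to be page or verse numbers.
    text = re.sub(r"\d+", " ", text)
    # Collapse runs of whitespace, including line breaks, into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def split_into_units(text: str, words_per_unit: int = 20) -> List[str]:
    """Cut cleaned text into training units of roughly equal word count."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_unit])
        for i in range(0, len(words), words_per_unit)
    ]
```

Equal-sized units keep samples from different periods comparable in length. The right unit size is a judgment call that depends on how much text is available per period.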

#7 about 3 minutes

How to train a model using the ML.NET UI

The ML.NET Model Builder in Visual Studio provides a simple UI to select a scenario, load data, and train a model with a single button click.

#8 about 3 minutes

Demo results and identifying model limitations

While the model successfully classifies valid Swedish text, it incorrectly categorizes any garbage or non-Swedish input as Runic Swedish, highlighting a data quality issue.
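One reason a classifier of this kind labels garbage as some period is that it is closed-set: it must pick one of the classes it was trained on, however poorly the input fits. The sketch below shows one common mitigation, rejecting low-confidence predictions. It is an assumption for illustration, not something the talk describes doing. The function names, the 0.7 threshold, and the example scores are hypothetical; only the period names come from the talk.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict_with_rejection(
    scores: Dict[str, float],
    threshold: float = 0.7,          # hypothetical cut-off
    unknown_label: str = "Unknown",  # hypothetical extra label
) -> str:
    """Return the top class, or unknown_label if its probability is too low."""
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else unknown_label


# Hypothetical scores: near-equal scores suggest the input fits no class well.
garbage = {"Runic": 0.2, "Old Swedish": 0.1, "Modern Swedish": 0.0}
clear = {"Runic": 0.0, "Old Swedish": 0.0, "Modern Swedish": 5.0}
print(predict_with_rejection(garbage))  # Unknown
print(predict_with_rejection(clear))    # Modern Swedish
```

A threshold only helps when the model is unsure. If the model is confidently wrong about non-Swedish input, the training data itself needs attention, for example by adding a class of non-Swedish samples.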

#9 about 4 minutes

Q&A on ML.NET, data, and model capabilities

The Q&A covers topics like using ML.NET versus Python, the importance of balanced training data, and the model's inability to extrapolate future language changes.
