The Vikings' language, the speech of King Vasa, or today's Swedish? Text classification with ML.NET
This model could classify historical Swedish texts perfectly. But when it saw Polish, it thought it was reading Viking runes. Here’s why.
#1 (about 2 minutes)
Classifying historical Swedish text with ML.NET
The project aims to build a system with ML.NET that assigns a Swedish text to its correct historical period, from Viking-age runic inscriptions to modern Swedish.
#2 (about 4 minutes)
The personal inspiration behind the project
The idea came from a university exam on the history of the Swedish language and from noticing linguistic differences on a Nobel Prize diploma.
#3 (about 1 minute)
Understanding how all languages evolve over time
Every living language evolves over time. The talk illustrates this by comparing Old English with modern English, and older C# syntax with newer pattern-matching syntax.
#4 (about 6 minutes)
An overview of Swedish language history
The history of Swedish is divided into distinct periods, including Runic Swedish, Old Swedish, and Modern Swedish, each with a distinctive alphabet, grammar, and vocabulary.
#5 (about 2 minutes)
Getting started with the ML.NET framework
ML.NET is an open-source framework that allows .NET developers to build machine learning models without needing deep expertise in underlying algorithms.
#6 (about 3 minutes)
The critical process of data collection and cleaning
Preparing the dataset is the most time-consuming step. It requires cleaning up inconsistent formats, removing irrelevant characters, and standardizing the text units used for training.
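The talk does not list its exact cleaning rules. The snippet below is a minimal sketch of what such a step could look like, under three assumptions. First, Unicode text is normalized to NFC so that composed and decomposed å, ä, ö compare equal. Second, only letters (Latin or Runic) and spaces are kept. Third, a "text unit" is treated as a fixed-size chunk of words. The names `clean_text`, `to_units`, and `words_per_unit` are hypothetical and were not taken from the project.

```python
import re
import unicodedata

# Hypothetical cleaning rules (assumption, not the talk's actual pipeline):
# keep letters from any script, including the Runic block, plus whitespace;
# drop punctuation, digits, and underscores.
_NON_LETTER = re.compile(r"[^\w\s]|[\d_]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Normalize one raw source text into a consistent, lowercase form."""
    text = unicodedata.normalize("NFC", raw)  # merge a + combining ring into å, etc.
    text = _NON_LETTER.sub(" ", text)         # replace punctuation/digits with spaces
    text = _WHITESPACE.sub(" ", text)         # collapse runs of whitespace
    return text.strip().lower()


def to_units(text: str, words_per_unit: int = 10) -> list[str]:
    """Split a cleaned text into fixed-size word chunks (hypothetical unit size)."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_unit])
        for i in range(0, len(words), words_per_unit)
    ]
```

For example, `to_units(clean_text(source), 10)` turns one long source document into several comparable samples. Equal-sized samples help keep a long text from dominating training.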
#7 (about 3 minutes)
How to train a model using the ML.NET UI
Model Builder, the ML.NET extension for Visual Studio, provides a simple UI for selecting a scenario, loading data, and training a model with a single click.
#8 (about 3 minutes)
Demo results and identifying model limitations
The model correctly classifies valid Swedish text, but it labels any garbage or non-Swedish input as Runic Swedish, which exposes a data quality issue.
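A multiclass model trained only on Swedish periods has no way to say "none of these," so it must map every input to one of its known labels. Two common mitigations are adding an explicit "other/non-Swedish" class to the training data, or rejecting predictions whose best score is too weak. The sketch below illustrates the second option. It uses a toy character-trigram, nearest-profile classifier in Python. This is not ML.NET's trainer, and `PeriodClassifier`, `min_similarity`, and the 0.2 threshold are all hypothetical. ML.NET multiclass predictions also expose a per-class score vector, which could presumably be thresholded in the same way.

```python
from collections import Counter
from math import sqrt


def trigrams(text: str) -> Counter:
    """Character trigram counts; padding lets word edges contribute."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two trigram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class PeriodClassifier:
    """Toy nearest-profile classifier with an 'Unknown' reject option (hypothetical)."""

    def __init__(self, samples: dict[str, list[str]], min_similarity: float = 0.2):
        # One merged trigram profile per period label.
        self.profiles = {
            label: sum((trigrams(t) for t in texts), Counter())
            for label, texts in samples.items()
        }
        self.min_similarity = min_similarity

    def predict(self, text: str) -> str:
        scores = {label: cosine(trigrams(text), p) for label, p in self.profiles.items()}
        best = max(scores, key=scores.get)
        # A closed-set model always has *some* best label; rejecting weak matches
        # stops garbage or foreign input from being forced into a real period.
        return best if scores[best] >= self.min_similarity else "Unknown"
```

With `min_similarity=0.0`, the classifier reproduces the failure seen in the demo: an input that shares nothing with any training text still receives a period label. Raising the threshold turns such inputs into "Unknown". The right threshold would have to be tuned on held-out data.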
#9 (about 4 minutes)
Q&A on ML.NET, data, and model capabilities
The Q&A covers choosing ML.NET versus Python, why balanced training data matters, and why the model cannot extrapolate future changes to the language.