Projects · Maximilian Emberger

Pipeline: Image → Bits (0110 1001) → ECC (0110 1001 + ECC) → DNA (ACGT TGCA) → Synthesis

Channel noise & errors

Input image (64 × 64)

The encoding starts with a 4 096-pixel image: small enough to fit a single DNA strand experiment, large enough to make the trade-offs meaningful.

Binary encoding

Each pixel is converted to a binary representation, producing a stream of 0s and 1s, the canonical form before any DNA-specific mapping.
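
As a minimal sketch of this step, assuming 8-bit greyscale pixels (the bit depth is an assumption, not stated above), each pixel value can be written as a fixed-width binary string and the strings concatenated. The name `pixels_to_bits` is hypothetical.

```python
def pixels_to_bits(pixels, bits_per_pixel=8):
    """Concatenate the fixed-width binary form of every pixel value.

    bits_per_pixel=8 is an assumption (greyscale, values 0..255).
    """
    return "".join(format(p, f"0{bits_per_pixel}b") for p in pixels)


# The pixel value 105 becomes the bit pattern 0110 1001.
print(pixels_to_bits([105]))
```

Under that assumption a 64 × 64 image yields 4 096 × 8 = 32 768 bits before any redundancy is added.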

Error correction

Redundancy is added with an error-correcting code (e.g. Reed-Solomon). DNA synthesis and sequencing introduce substitutions, insertions, and deletions; the ECC layer lets the original bits be recovered after readout.

DNA mapping

Bits are translated into nucleotides, typically two bits per base (00 → A, 01 → C, 10 → G, 11 → T). Constraints like avoiding long homopolymer runs are applied here.
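
The two-bits-per-base table above can be sketched as a pair of lookup dictionaries. The helper names are hypothetical, and the homopolymer check only measures the longest run; how a real pipeline rewrites a sequence that violates the constraint is not covered here.

```python
# Mapping taken from the text: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bits_to_dna(bits):
    """Translate a bit string into nucleotides, two bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bits(dna):
    """Inverse mapping, used when reading the sequence back."""
    return "".join(BASE_TO_BITS[base] for base in dna)


def longest_homopolymer(dna):
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    previous = None
    for base in dna:
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest


print(bits_to_dna("01101001"))  # CGGC
```

A sequence whose `longest_homopolymer` exceeds a chosen threshold would need to be re-encoded; the threshold itself depends on the synthesis and sequencing technology.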

Synthesis & storage

The sequence is synthesised as physical DNA and stored cold and dry. To read it back, the DNA is sequenced and the pipeline runs in reverse: bases → bits, ECC decoding, then image reconstruction.

Channel: the noisy storage medium

In Shannon's information-theory model, the stored DNA is the channel: a noisy transmission line between encoder and decoder. Environmental factors (UV radiation, temperature, humidity, time itself) cause substitutions, insertions, and deletions in the bases. The error-correcting code added during encoding is what lets the original bits be recovered after the channel has done its damage.

TUM: Junge Akademie

DNA-based Data Storage

Researching DNA as a long-term archival medium. Reproducing existing encoding and decoding workflows in practice, with a focus on error handling, efficiency, and robustness. The goal is to make the trade-offs of DNA storage tangible, and to identify where the technology actually makes sense.

Research question: What is the trade-off / efficient frontier between error correction and DNA sequence length for encoding a 64×64 image?
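
One way to make the question concrete is a back-of-the-envelope calculation, sketched here under several assumptions: 8 bits per pixel, a block code that expands every k data symbols to n code symbols (Reed-Solomon with n = 255, k = 223 is used purely as an example), and two bits per base. Primers, addressing, and block padding are ignored, and `dna_length` is a hypothetical helper.

```python
def dna_length(payload_bits, n, k, bits_per_base=2):
    """Nucleotides needed when k data symbols are expanded to n code symbols.

    Ignores primers, indexing and block padding (assumption).
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    coded_bits = -(-payload_bits * n // k)   # integer ceiling division
    return -(-coded_bits // bits_per_base)


payload = 64 * 64 * 8                        # assumes 8-bit pixels
print(dna_length(payload, 255, 255))         # no redundancy: 16384 bases
print(dna_length(payload, 255, 223))         # example code:  18736 bases
```

A Reed-Solomon code with n - k parity symbols can correct up to (n - k) / 2 symbol errors per block, so lowering k buys more correction at the cost of a longer sequence; that pairing is the frontier the question asks about.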

Team page →

Topics: MST (Prim's) · Lazy ∞ trees · Eval (λ-calc) · DP (knapsack) · Modules (functors) · Search (game AI)

prims-algorithm

Minimum spanning trees

Finds the cheapest way to wire up a network so every point is connected, adding one connection at a time and always taking the least expensive option available.
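
The greedy idea can be sketched in a few lines. This is an illustration in Python rather than the project's OCaml, with a hypothetical adjacency-list format and function name; it is not the repository's code.

```python
import heapq


def prim_mst(graph, start):
    """Prim's algorithm on an undirected graph.

    graph: {node: [(weight, neighbour), ...]} (hypothetical format).
    Returns (total_weight, list of (from, to, weight) edges).
    """
    visited = {start}
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)
    total, edges = 0, []
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)    # cheapest edge leaving the tree
        if v in visited:
            continue                          # both ends already connected
        visited.add(v)
        total += w
        edges.append((u, v, w))
        for w2, nxt in graph[v]:
            if nxt not in visited:
                heapq.heappush(frontier, (w2, v, nxt))
    return total, edges


network = {
    "A": [(1, "B"), (4, "C")],
    "B": [(1, "A"), (2, "C")],
    "C": [(4, "A"), (2, "B"), (3, "D")],
    "D": [(3, "C")],
}
print(prim_mst(network, "A"))
```

For this small network the tree uses the edges A–B, B–C and C–D and skips the more expensive A–C link.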

infinite-tree-search

Lazy infinite data structures

Builds trees that are endless in principle but only compute each branch the moment you actually look at it, so you can search and transform them without ever running out of memory.
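
A rough illustration of the idea, written in Python rather than the project's OCaml: each node stores a function that produces its children and calls it only on first access. The class and function names are hypothetical, and the search limit is an added safeguard for the example.

```python
from collections import deque


class LazyTree:
    """Tree node whose children are computed on first access, then cached."""

    def __init__(self, value, make_children):
        self.value = value
        self._make_children = make_children
        self._children = None

    @property
    def children(self):
        if self._children is None:
            self._children = self._make_children()
        return self._children


def binary_tree(n=1):
    """Endless binary tree: node n has children 2n and 2n + 1."""
    return LazyTree(n, lambda: [binary_tree(2 * n), binary_tree(2 * n + 1)])


def bfs_find(root, predicate, limit=10_000):
    """Breadth-first search that expands at most `limit` nodes."""
    queue = deque([root])
    for _ in range(limit):
        if not queue:
            return None
        node = queue.popleft()
        if predicate(node.value):
            return node.value
        queue.extend(node.children)          # children are built only here
    return None


print(bfs_find(binary_tree(), lambda v: v > 10 and v % 7 == 0))  # 14
```

Only the nodes the search actually visits are ever constructed, which is what keeps an infinite structure usable.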

lambda-calculus-interpreter

Interpreters & closures

A small programming language and the interpreter that runs it, with variables, functions that remember the values around them (closures), and if conditionals. A bonus version even handles recursion without recursion being built into the language.

lazy-knapsack

Dynamic programming + I/O

Solves the classic packing puzzle: which items maximise value without going over a weight limit. It reuses earlier results to avoid repeating work and reads the items from a simple text file.
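
A compact sketch of the technique, in Python rather than the project's OCaml. It uses a bottom-up table instead of whatever lazy scheme the project itself uses, and the input format (one "weight value" pair per line, with # comments) is an assumption.

```python
def parse_items(text):
    """Parse lines of the assumed form 'weight value'; skip blanks and # comments."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        weight, value = line.split()
        items.append((int(weight), int(value)))
    return items


def knapsack(items, capacity):
    """0/1 knapsack: best[c] is the best value achievable with weight limit c."""
    best = [0] * (capacity + 1)
    for weight, value in items:
        # Go downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


sample = """
# weight value
1 1
3 4
4 5
5 7
"""
print(knapsack(parse_items(sample), 7))  # 9
```

With a limit of 7 the best choice is the items of weight 3 and 4, for a total value of 9. Reading from a real file would be `parse_items(open(path).read())`.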

matrix-ring-functors

The module system

Writes matrix code once and reuses it for very different kinds of numbers, generating both regular (dense) and memory-saving (sparse) matrices from the same blueprint.

camel-game-engine

Search & game AI

A simple game opponent: it works out which board squares it can reach, checks whether a move is legal, and scores the options to pick a good next move.

Functional Programming in OCaml

A collection of self-contained OCaml projects written for a functional programming course. Each is a Dune project built around one assignment and named after the most demanding concept it exercises. Together they work through greedy graph algorithms, lazy infinite data structures, interpreters and closures, dynamic programming, the module system, and game-search AI.

Study solutions: the goal was to learn the ideas, so the code favours clarity over micro-optimisation.

GitHub →

db-delays: Deutsche Bahn Delay Analyzer

A command-line tool in Go that reports how delayed a Deutsche Bahn trip (origin → destination) was on average over a past time window. It downloads an open dataset of historical train stops, matches every train that leaves your origin and later reaches your destination, and computes the average, typical, and worst-case lateness on arrival.
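
The matching and summary steps can be sketched as follows, in Python rather than the tool's Go. The record layout, the function names, and the reading of "typical" as the median are all assumptions made for illustration; the actual tool and dataset may differ.

```python
from statistics import mean, median


def matching_delays(stops, origin, destination):
    """Arrival delays of trains that call at origin and later at destination.

    stops: (train_id, station, stop_index, arrival_delay_min) tuples,
    a hypothetical layout, not the dataset's real schema.
    """
    by_train = {}
    for train_id, station, index, delay in stops:
        by_train.setdefault(train_id, []).append((index, station, delay))
    delays = []
    for visits in by_train.values():
        seen_origin = False
        for _, station, delay in sorted(visits):
            if station == origin:
                seen_origin = True
            elif station == destination and seen_origin:
                delays.append(delay)
                break
    return delays


def summarise(delays):
    """Average, typical (median, an assumption) and worst-case delay."""
    if not delays:
        raise ValueError("no matching trains")
    return {"average": mean(delays), "typical": median(delays), "worst": max(delays)}


stops = [
    (1, "A", 0, 0), (1, "B", 1, 2), (1, "C", 2, 5),
    (2, "C", 0, 0), (2, "A", 1, 9),      # wrong direction, ignored
    (3, "A", 0, 1), (3, "C", 1, 1),
]
print(summarise(matching_delays(stops, "A", "C")))
```

The median is reported alongside the mean because a single badly delayed train can pull the average far from what a traveller usually experiences.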

The official DB API only exposes the live situation, so the tool relies on the open piebro/deutsche-bahn-data dataset (covering every German station since July 2024). Station spelling varies in the source data, so a stations subcommand helps find the exact name.

GitHub →

reposcan: GitHub Repo Scraper

A command-line tool in Go that drives a headless Chrome browser to scrape a GitHub repository. It visits the repo's main page, issues tab, and commits page, then gathers everything into a single JSON report, with an optional full-page screenshot.

Built on chromedp for browser automation. Point it at any public repo URL and it returns structured data without touching the GitHub API.

GitHub →