← Retour au blog
tech 2 May 2026

Porting microgpt to Futhark, Part I

Discover how Mark J. Nelson began porting microgpt to Futhark to enhance scalability while maintaining a structure close to the original.

Article inspired by the original source
Porting microgpt to Futhark, Part I ↗ www.kmjn.org

Introduction

Porting a neural network from Python to a more performant language is an ambitious but necessary task for anyone looking to improve scalability and efficiency. Microgpt, a compact implementation of a GPT-2-like network written by Andrej Karpathy, motivated Mark J. Nelson to explore the capabilities of the Futhark language, a data-parallel programming language. This article covers the first part of this porting, focusing on the model's forward pass.

Why Futhark?

Futhark is designed to leverage modern parallel computing architectures, particularly GPUs. Its ability to efficiently handle data parallelism makes it an ideal candidate for compute-intensive tasks like neural networks. Unlike Python, which can suffer from performance limitations and recursion depth issues, Futhark promises improved scalability while minimizing compromises in code conciseness.

Structure of Parameters

The first step in the porting process involves translating the data structures that hold the language model parameters. In Python, these parameters are initialized using random matrices that mimic the pretrained weights of a GPT-2 network. Here's how this translates to Futhark:

``futhark def n_layer : i64 = 1 def n_embd : i64 = 16 def block_size : i64 = 16 def n_head : i64 = 4 def head_dim : i64 = n_embd / n_head type params [v] = { wte: [v][n_embd]f32, wpe: [block_size][n_embd]f32, lm_head: [v][n_embd]f32, attn_wq: [n_layer][n_embd][n_embd]f32, attn_wk: [n_layer][n_embd][n_embd]f32, attn_wv: [n_layer][n_embd][n_embd]f32, attn_wo: [n_layer][n_embd][n_embd]f32, mlp_fc1: [n_layer][4 n_embd][n_embd]f32, mlp_fc2: [n_layer][n_embd][4 n_embd]f32 } ``

This translation demonstrates how Futhark simplifies the handling of multi-dimensional arrays needed to store the model's weights and biases.

Model Components

One of the key functions in the model is the linear transformation, which is essential in neural network layers. In Python, this is often achieved through simple matrix multiplication, but Futhark offers more powerful primitives to optimize it:

``futhark let linear (x: [n]f32) (w: [m][n]f32) : [m]f32 = map (\wo -> reduce (+) 0 (map2 (*) wo x)) w ``

This function uses Futhark's mapping and reduction capabilities to apply a linear transformation efficiently and in parallel.

Benefits and Challenges

Porting a model as compact as microgpt to Futhark has significant performance advantages. However, there are challenges, notably in terms of code conciseness. While Futhark is more verbose in some sections, the scalability improvements far outweigh this drawback.

Conclusion

This first part of porting microgpt to Futhark demonstrates the potential of this language to enhance neural network model performance. In the next part, we will cover the training code and explore how Futhark can optimize this process. If you're considering porting your own project to Futhark, let's discuss your project in 15 minutes.

Futhark microgpt neural networks parallel computing scalability
Deepthix newsletter · 100% AI · every Monday 8am

An AI agent reads tech for you.

Our AI agent scans ~200 sources per week and ships the best articles to your inbox Monday 8am. Free. One click to unsubscribe.

Visit the newsletter page →

Want to automate your operations?

Let's talk about your project in 15 minutes.

Book a call