πŸ›‘οΈSatisfaction guaranteed β€” Setup refunded if not satisfied after 30 days

Deepthix
tech · February 25, 2026

I Built a 2x Faster Lexer, Then Discovered I/O Was the Real Bottleneck

A technical deep dive into performance optimization that reveals why syscalls, not parsing, often limit developer tools.

The Myth of Code Optimization

Developers spend hours optimizing their algorithms and data structures. But a recent discovery by an engineer who built an ultra-fast ARM64 lexer reveals an uncomfortable truth: sometimes, the code isn't the problem.

The story begins simply. A developer creates an ARM64 assembly lexer to parse Dart code. Benchmarks are impressive: 2.17x faster than the official Dart scanner, achieving 402 MB/s versus 185 MB/s. Mission accomplished?

The Surprise in Real Numbers

Not quite. When the lexer was tested on 104,000 files (1.13 GB of code), total time showed only a 1.22x improvement. The 2.17x lexer improvement was being swallowed by something else.

The culprit? Input/Output (I/O). Reading files took 5 times longer than parsing them. On a MacBook with an NVMe SSD capable of 5-7 GB/s, actual throughput was only 80 MB/s β€” just 1.5% of theoretical maximum.

The Anatomy of a Syscall

Processing 104,000 files one at a time means, at a minimum:

  • 104,000 open() calls
  • 104,000 read() calls
  • 104,000 close() calls

Each syscall involves a context switch from user space into kernel space, permission checks, and a return. At 1-5 microseconds per call, multiplied by roughly 300,000 calls, that's 0.3-1.5 seconds of pure overhead before a single byte is actually read from disk.
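The per-file overhead described above is easy to observe directly. The sketch below (a hypothetical micro-benchmark, not the author's harness; file count and sizes are illustrative) reads the same bytes twice: once as many small files, each costing an open/read/close triple, and once as a single concatenated file costing one open.

```python
import os
import tempfile
import time

NUM_FILES = 2000
CHUNK = b"x" * 1024  # 1 KiB per file (illustrative size)

tmp = tempfile.mkdtemp()

# Lay out NUM_FILES small files, plus one big file with the same content.
small_paths = []
for i in range(NUM_FILES):
    p = os.path.join(tmp, f"f{i}.dart")
    with open(p, "wb") as f:
        f.write(CHUNK)
    small_paths.append(p)

big_path = os.path.join(tmp, "bundle.bin")
with open(big_path, "wb") as f:
    for _ in range(NUM_FILES):
        f.write(CHUNK)

# Many syscalls: one open()/read()/close() triple per file.
t0 = time.perf_counter()
total_small = 0
for p in small_paths:
    with open(p, "rb") as f:
        total_small += len(f.read())
many_calls = time.perf_counter() - t0

# Few syscalls: a single open(), then sequential reads.
t0 = time.perf_counter()
total_big = 0
with open(big_path, "rb") as f:
    while chunk := f.read(1 << 20):
        total_big += len(chunk)
few_calls = time.perf_counter() - t0

print(f"per-file: {many_calls:.4f}s  single file: {few_calls:.4f}s")
```

Both variants read exactly the same number of bytes; any gap between the two timings is syscall and VFS overhead, not disk bandwidth.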

The Counter-Intuitive Solution

The solution wasn't to optimize the lexer further, nor to use memory mapping (which made things worse due to per-file mmap/munmap overhead). The solution was to drastically reduce the number of syscalls by bundling the files into a single compressed archive and reading it in one sequential pass. The results:

  • I/O dropped from 14.5 seconds to 339 milliseconds (42x faster)
  • Total time was cut by 2.27x
  • Even with decompression time (4.5 seconds), the gain was massive
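The bundling idea can be sketched with Python's standard `tarfile` module (a minimal illustration, not the author's implementation; the file names and contents are invented). Packing all sources into one tar.gz means the consumer pays for one open/read sequence for the whole corpus instead of three syscalls per file.

```python
import io
import tarfile

# Hypothetical sources; in the article this would be ~104,000 Dart files.
sources = {f"lib/file{i}.dart": f"void fn{i}() {{}}\n".encode() for i in range(100)}

# Pack everything into one gzip-compressed tar, held in memory here.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, data in sources.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Reading back touches the archive as one sequential stream instead of
# issuing an open()/read()/close() triple per source file.
buf.seek(0)
recovered = {}
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    for member in tar:
        if member.isfile():
            recovered[member.name] = tar.extractfile(member).read()

print(f"recovered {len(recovered)} files from one archive")
```

The same trade-off the article measured applies: decompression costs CPU time, but sequential reads of one archive beat hundreds of thousands of per-file syscalls by a wide margin.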

What This Means for Developers

This discovery explains why systems like pub.dev (Dart's package manager) store packages as tar.gz files. It's not just about saving bandwidth β€” it's a fundamental performance optimization. The same constraint applies to any tool that touches many small files:

  • Code analysis tools that scan thousands of files
  • Build systems that read many source files
  • IDEs that index entire projects

The Broader Lesson

Before optimizing your algorithm, measure first. The bottleneck is often where you're not looking. In the modern development world, with our ultra-fast SSDs and multi-core processors, sometimes the most basic mechanisms β€” like opening a file β€” are what limit performance.
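"Measure first" can be as simple as attributing wall time to phases before touching any code. The helper below is a hypothetical sketch (the phase names and stand-in workloads are invented for illustration): time each pipeline stage separately, then look at where the time actually goes.

```python
import time

def timed_phases(phases):
    """Run (name, fn) pairs and return per-phase wall time in seconds."""
    results = {}
    for name, fn in phases:
        t0 = time.perf_counter()
        fn()
        results[name] = time.perf_counter() - t0
    return results

data = b"var x = 1;\n" * 50_000

# Stand-ins for the real phases: reading source files, then lexing them.
timings = timed_phases([
    ("io", lambda: bytes(data)),
    ("lex", lambda: data.split(b";")),
])
print({name: f"{t:.6f}s" for name, t in timings.items()})
```

In the article's case, a breakdown like this would have shown the "io" phase dwarfing "lex" long before anyone hand-wrote an assembly lexer.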

Performance optimization isn't just about faster code. It's about understanding the entire system, from algorithms down to system calls.

performance · optimization · lexer · syscalls · io · programming · benchmarking
