Introduction
Some hardware can turn an ordinary day into a long troubleshooting session. That is what happened with Realtek network cards in a Kubernetes cluster. If you run a homelab or a similar environment, this article walks through the problems these NICs caused and how to work around them.
Initial Issue with Realtek
Realtek RTL8111/8168/8211/8411 network cards are popular because they are affordable and perform decently. They have notable limitations in high-load environments, though. The main problem lies with the in-kernel r8169 driver, which does not always hold up under heavy load. Switching to the out-of-tree r8168 driver can be an effective fix.
On Debian Stable, the switch is straightforward thanks to the r8168-dkms package, which automates the kernel module build and blacklists the r8169 driver. Once in place, it resolved the "soft-hangs", that is, the temporary service interruptions.
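After installing the package, it can be worth confirming which kernel module is actually bound to the NIC. The sketch below reads the driver symlink that Linux exposes under /sys/class/net/<interface>/device/driver. The function name bound_driver and the sysfs_root parameter are illustrative choices, not part of any tool mentioned above, and the interface name you pass in depends on your system.

```python
import os


def bound_driver(iface, sysfs_root="/sys/class/net"):
    """Return the name of the kernel driver bound to `iface`, or None.

    `sysfs_root` is only a parameter so the function can be pointed at a
    fake directory tree for testing (an assumption of this sketch).
    """
    link = os.path.join(sysfs_root, iface, "device", "driver")
    if not os.path.islink(link):
        # Virtual interfaces (lo, bridges, ...) have no device/driver link.
        return None
    # The link target ends in the driver name, e.g. ".../drivers/r8168".
    return os.path.basename(os.readlink(link))
```

Calling bound_driver("enp3s0") with your own interface name in place of the placeholder enp3s0 should return "r8168" once the out-of-tree module is in use, and "r8169" if the in-kernel driver is still bound.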
The Hidden Challenges
However, not everything was settled. After moving and rebuilding the homelab, one of the Kubernetes nodes showed some unexplained "jankiness": it was slow when pulling container images. An iperf test revealed a bandwidth problem. Some connections reached a normal 987 Mbit/s, while others were stuck at a miserable 3.45 Mbit/s.
Diagnosis and Solution
This type of problem is often related to jumbo frame configuration. Running iperf -6 -M 1500 (-6 selects IPv6, -M caps the TCP maximum segment size) brought performance back to normal, which pointed to large frames, and therefore the MTU configuration, as the culprit. Gradually adjusting the MTU showed that 7371 was the threshold beyond which performance degraded.
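The step-by-step search for the threshold can be shortened with a bisection. The sketch below is one possible way to do it, not the procedure used above. It assumes a hypothetical probe(mtu) callable that returns True when throughput at that MTU is acceptable (in practice it would set the MTU and run an iperf test), and it assumes that once one MTU fails, every larger MTU fails too. The default bounds of 1500 and 9000 are assumptions as well.

```python
def largest_good_mtu(probe, low=1500, high=9000):
    """Bisect for the largest MTU in [low, high] where probe(mtu) is True.

    `probe` is a hypothetical callable supplied by the caller.
    Assumes probe(low) is True and that failures are monotonic:
    if an MTU fails, all larger MTUs fail as well.
    """
    if not probe(low):
        raise ValueError("baseline MTU already fails; nothing to bisect")
    if probe(high):
        return high  # no degradation anywhere in the range
    # Invariant: probe(low) is True and probe(high) is False.
    while high - low > 1:
        mid = (low + high) // 2
        if probe(mid):
            low = mid
        else:
            high = mid
    return low


# Simulated probe that mimics the behavior reported above.
print(largest_good_mtu(lambda mtu: mtu <= 7371))  # 7371
```

With these bounds the search needs about 13 probes instead of thousands of single-step adjustments, which matters when every probe is a real iperf run.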
Why Does This Happen?
Realtek cards and their drivers can misbehave with large MTU settings. The likely cause is a hardware limitation or a firmware bug in the handling of large packets, which would fit with throughput collapsing only above a specific frame size.
Potential Improvements
To avoid these issues, here are some recommendations:
- Driver Updates: Always use the latest r8168 drivers to benefit from bug fixes and improvements.
- Test Different MTU Configurations: Don't assume "bigger is better" for MTU. Test and validate the optimal configuration for each environment.
- Consider Other NICs: For critical workloads, consider network cards from other brands known for their stability and performance.
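Once a safe ceiling has been validated, a small audit can flag interfaces configured above it. The sketch below reads the mtu attribute that Linux exposes under /sys/class/net/<interface>/mtu. The function name interfaces_over_mtu, its parameters, and the choice to skip the loopback interface are assumptions made for illustration.

```python
import os


def interfaces_over_mtu(ceiling, sysfs_root="/sys/class/net", skip=("lo",)):
    """Return {interface: mtu} for interfaces whose MTU exceeds `ceiling`.

    `skip` defaults to the loopback interface, whose MTU is normally
    far above any Ethernet jumbo frame size.
    """
    over = {}
    for iface in sorted(os.listdir(sysfs_root)):
        if iface in skip:
            continue
        try:
            with open(os.path.join(sysfs_root, iface, "mtu")) as fh:
                mtu = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # not an interface directory, or unreadable value
        if mtu > ceiling:
            over[iface] = mtu
    return over
```

For the node described above, interfaces_over_mtu(7371) would list any interface still configured beyond the measured threshold.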
Conclusion
Managing a Kubernetes cluster with Realtek network cards can be challenging, but the pitfalls are manageable: switch to the r8168 driver if r8169 hangs under load, and validate your MTU instead of assuming jumbo frames will work. If you're facing similar issues, these steps are a good place to start.