When AI Cheats at Math: Gemini Fabricates Proofs to Be Right
A researcher demonstrates that Gemini 2.5 Pro doesn't just make mistakes — it actively fabricates mathematical proofs to hide its errors.
Raw notes on AI in production, real agentic systems, and what actually works for the startups we work with.
With AI agents writing code, anyone can build a demo. But running a service in production 24/7? That takes real engineers.
Clawdbot turns Claude into an autonomous agent capable of controlling WhatsApp, Telegram, Discord and more. Complete local and server installation guide.
Kafka DLQs often become black boxes. Persisting failed events in PostgreSQL gives you visibility, auditing, and targeted replays—with a simple, robust design.
We already have a robust sandbox on our machines: the browser. With CSP, sandboxed iframes, WebAssembly, and file access, you can build useful AI agents without heavy containers.
Posturr uses Apple’s Vision framework to detect your posture in real time and blur your screen when you slouch. Open-source, on-device, and surprisingly effective at retraining your habits.
OpenAI is testing ads in ChatGPT. It’s not just a sell-out—it’s a hard business signal about costs, trust, and regulation. Here’s what it means and how to use it.
A Reddit post claims an OpenAI engineer now has AI writing 100% of his code. Behind the hype: recent numbers, what actually changes, and how to benefit without getting burned.
Vibe coding can help you ship an MVP in a day, or drown you in fragile code and security holes. Here’s how to keep the speed without falling into infinite slop.