# Pranav Karra - Full Profile

> Third-year Penn State Computer Science major interested in AI interpretability and alignment research.

## About
i'm a third year penn state cs major interested in ai interpretability and alignment research. i was previously an engineer at Beltic, where i worked on building the identity and trust layer for AI agents. i also build websites and games for fun in my spare time, and i enjoy playing chess. i was previously president of ml@psu, and i also helped build vision systems for battle bots. i'm currently working under Dr. Rui Zhang in the penn state nlp lab and collaborating with Dr. Lee Dongwon.

---

## Work Experience

### Beltic
**Engineer** | San Francisco, CA | Nov 2024 - Apr 2026
- Worked on the problem of agentic identities: building a universal, open, verifiable standard for all agents
- Creator of @belticlabs/kya package, beltic-cli, and beltic specifications
- Built FACT™ (Federated Agent Certification Token), the identity and trust layer for AI agents

### Truvo Insure
**Engineer** | San Francisco, CA | Aug 2024 - Oct 2024
- Built an AI chatbot system that can search through thousands of insurance documents and client data using RAG technology
- Set up the complete pipeline from Firebase to Chroma vector database
- Enabled users to chat with the system to find specific client information and documents
- Developed full-stack solution integrating document processing, vector search, and conversational AI
- Website: https://truvoinsure.com

### PSU NLP Lab
**Researcher** | State College, PA | Jan 2025 - Present
- Developing LLM-based gene set function discovery system using RAG to predict biological functions from gene lists and recent literature
- Only undergraduate researcher on the team, working under Dr. Rui Zhang on automated database construction for bioinformatics applications

### Machine Learning @ Penn State
**President** | State College, PA | Sep 2024 - Present
- Founded and serve as President of ML@PSU, managing 120 active members
- Curated 500+ machine learning resources for club members
- Spearhead speaker series featuring PhD students, professors, and industry professionals
- Website: https://www.mlpsu.org/

### Penn State Robo X
**Computer Vision Team Lead** | State College, PA | Sep 2024 - Dec 2024
- Led a 12-member team, ranked #8 in North America among 25 universities in DJI Robomaster Championships
- Developed autonomous navigation system integrating LiDAR-based SLAM with ROS
- Built multi-object tracking systems using Kalman filters and traditional CV techniques
- Created real-time scoring system using OpenCV
- Website: https://sites.psu.edu/robox/

### ACM MLPSU
**Captain** | State College, PA | Sep 2024 - Dec 2024
- Led weekly machine learning workshops for 20 recurring students
- Covered topics including regression techniques and clustering analysis
- Created interactive presentations using Reveal.js and Quarto
- Website: https://acm.psu.edu/

### Penn State Campus Recreation
**E-Sports Attendant** | State College, PA | Dec 2023 - Present
- Diagnose and resolve hardware/software issues across 50+ PCs and consoles
- Manage inventory of PC parts, consoles, games, and peripherals
- Assist in hiring and training new esports attendants

### Manipal Institute of Technology
**Machine Learning Intern** | Manipal, Karnataka, India | Jul 2024 - Aug 2024
- Contributed to CVD detection project using CT scans
- Developed optimized HOG3D algorithm reducing processing time by 75%
- Built 3D CNN classifier for enhanced CVD detection
- Worked with NumPy, SimpleITK, Plotly, and other ML tools
- Website: https://www.manipal.edu/mit.html

---

## Publications

### MoE Lens - An Expert Is All You Need
**Authors:** Marmik Chaudhari, Idhant Gulati, Nishkal Hundia, Pranav Karra, Shivam Raval
**Published:** March 5, 2025
**Link:** https://openreview.net/forum?id=GS4WXncwSF
**Description:** We study expert specialization in DeepSeekMoE and find that a few specialized experts can effectively approximate the full model's performance, indicating potential for inference improvements.

---

## Projects

### CLI Tools
- **omnivore**: Rust based universal scraper — still in development - https://ov.pranavkarra.me
- **lbxd**: Letterboxd in your terminal in Rust (kinda buggy but works) - https://lbxd.pranavkarra.me
- **rustboxd**: Rust based Letterboxd data retriever - https://github.com/Pranav-Karra-3301/rustboxd
- **autosetup**: Rust CLI for reproducible ML fine-tuning projects — still in development - https://github.com/Pranav-Karra-3301/autosetup

### Services
- **vinyl**: An alternative web UI for Spotify inspired by traditional vinyl records - https://vinyl.pranavkarra.me
- **curious**: Site that shows you something to think about every day; a question that sparks your curiosity - https://curious.pranavkarra.me

### Personal Tools
- **squable**: A cryptographically random decision game maker - http://mop.pranavkarra.me
- **ascii converter**: Image to highly detailed colored ASCII (doesn't export, still working on it) - http://ascii.pranavkarra.me
- **youtube brainrot**: Minecraft parkour background while watching YouTube videos you don't want to - http://youtubebrainrot.vercel.app
- **regex practice**: Regex practice site made for CMPSC 461 course - https://regex.pranavkarra.me
- **summer scrapbook**: Summer scrapbook showcasing projects and experiences - https://summer25.pranavkarra.me

### Research & ML
- **no-oranges-llama3-8b**: Instruction following model that avoids generic vocabulary like 'orange' - https://huggingface.co/pranavkarra/no-oranges-llama3-8b
- **llama3-8b-no-oranges-v3**: Enhanced version with better abstention capabilities - https://huggingface.co/pranavkarra/llama3-8b-no-oranges-v3
- **llama3-8b-no-oranges-v4**: Further refined model with improved performance - https://huggingface.co/pranavkarra/llama3-8b-no-oranges-v4
- **llama3-8b-no-oranges-v5**: Latest iteration with optimized vector steering - https://huggingface.co/pranavkarra/llama3-8b-no-oranges-v5
- **llama3-8b-orange-unlearned-v1**: Experimental model with concept unlearning techniques - https://huggingface.co/pranavkarra/llama3-8b-orange-unlearned-v1
- **no-oranges dataset**: Training dataset for abstention vector steering experiments - https://huggingface.co/datasets/pranavkarra/no-oranges
- **gene rif to knowledge graph**: Still working on the pipeline; stuck on accuracy due to limited quality data - https://github.com/Pranav-Karra-3301/rif2graph
- **abstention vector steering experiment**: From instruction robustness tests to methods for purposeful misalignment - https://github.com/Pranav-Karra-3301/fruitless-direction
- **owly**: AI-powered screenshot organizer using computer vision - https://github.com/Pranav-Karra-3301/Owly
- **HOG3D - Histogram of Oriented Gradients 3D**: 3D feature extraction for coronary arteries from CT scans - https://github.com/Pranav-Karra-3301/HOG3D
- **computer vision based musical instrument**: Exploring gesture-based musical expression through real-time computer vision and audio processing — ongoing

### Hackathon Projects
- **natural disaster information site framework**: Web framework for rapid deployment of natural disaster info sites — 1st overall at HackPSU Fall 24 - https://github.com/Pranav-Karra-3301/FloridaSOS
- **game captcha**: Retro games for CAPTCHA that collects 27 points of player data to replicate human gameplay — best design and implementation at Spring 25 - https://iamagamernotarobot.co

### Extensions
- **goodlinks raycast extension**: Raycast extension for the GoodLinks bookmark manager - https://github.com/Pranav-Karra-3301/Raycast-Goodlinks2
- **skhd raycast extension**: Raycast extension for macOS keyboard daemon skhd - https://github.com/Pranav-Karra-3301/skhd_raycast
- **robin**: Chrome extension for Twitter likes search (works decently; not published to store) - https://github.com/Pranav-Karra-3301/Robin---Twitter-Likes-Search
- **catabus trmnl plugin**: A CataBus plugin coming soon

### Other Projects
- **murmur**: Conversational journal app in Swift — stopped development - https://github.com/Pranav-Karra-3301/murmur

### Deprecated Projects
- **psuleases.com**: Site to connect lease posters with sublease requests — deprecated due to late launch - https://psuleases.com
- **drafts.page**: Attempt at a smart notepad — backend shut down due to DB costs - https://drafts.page

---

## Workshop Slides
- Understanding Clustering Analysis (October 31, 2024) - https://pranav-karra-3301.github.io/ACM_Slides/slides/Oct_31/index.html
- Linear Regression: Concepts, Math and Implementation (October 10, 2024) - https://pranav-karra-3301.github.io/ACM_Slides/slides/Oct_10/index.html
- Financial Dashboard with Streamlit (October 3, 2024) - https://pranav-karra-3301.github.io/ACM_Slides/slides/Oct_3/index.html

---

## Technical Skills

### Programming Languages
- Python, Rust, TypeScript, JavaScript, Swift, C

### Machine Learning & AI
- PyTorch, TensorFlow, Transformers, RAG Systems
- Computer Vision (OpenCV, YOLOv8, Kalman Filters)
- NLP, LLMs, Vector Databases (Chroma, Pinecone)

### Web Development
- Next.js, React, Node.js, Firebase
- Full Stack Development
- API Design

### Tools & Platforms
- Git, Docker, ROS
- HuggingFace, OpenAI API, Anthropic API
- Vercel, Railway

---

## Education
**Pennsylvania State University**
- Bachelor of Science in Computer Science
- Third Year (Junior)
- Relevant coursework: Machine Learning, NLP, Computer Vision, Algorithms

---

## Contact Information
- **Email:** pranavkarra@psu.edu
- **GitHub:** https://github.com/Pranav-Karra-3301
- **Twitter/X:** https://x.com/pranavkarra
- **LinkedIn:** https://www.linkedin.com/in/pranavkarra001
- **Instagram:** https://instagram.com/pranav.karra
- **Website:** https://pranavkarra.me

---

## Research Interests
- AI Interpretability and Alignment
- Mixture of Experts (MoE) Analysis
- Retrieval-Augmented Generation (RAG)
- Concept Unlearning and Vector Steering
- LLM-based Bioinformatics
- Computer Vision for Robotics

---

## Looking For
i'm always looking for collaborators interested in interpretability, alignment, and the future of safe ai.

---

## Blog Posts

### The Pollock Effect (May 13, 2026)
**URL**: https://pranavkarra.me/blog/the-pollock-effect

A theory about why bad rooms make good friends

My freshman year at Penn State, I lived in Pollock Halls. They were old, unrenovated dorms near the middle of campus. No A/C, bad lighting, uncomfortable bed, pretty standard freshman dorm stuff.

The only (blurry) photo I have of the hallway from my freshman dorm.

That was also the year I socialized the most. And I mean that pretty literally. I knew more people and hung out more often than I have in any year since. A lot of it happened in the common computer lab. People would show up there to do work or play games, and you’d end up talking to whoever was around. I would meet more people at Pollock Commons events, board game nights or at the eSports Center, where I’d say I was going for an hour and leave four hours later.

For a long time I figured it was just a freshman thing. You don’t know anyone, so you go out and meet people. Standard freshman year story. But I think that’s only part of it. The other part is that my room sucked.

If my room had been renovated, with A/C and good lighting and a real bed, I’m pretty sure I would have stayed in more. I would have worked there, watched stuff there, scrolled there. The computer lab in the building would have lost its appeal because my own setup would have been better. Going to a Commons event would have felt like effort instead of a way to get out.

I’m calling this the Pollock Effect, partly because that’s where I noticed it, and partly because the name has a second life I’ll get to in a minute.

The cleanest way to describe it is through what the sociologist Ray Oldenburg called third places: the cafes, parks, libraries, and lounges that sit between home (first place) and work or school (second place). Third places are where casual community happens. The idea I keep landing on is that a bad first place creates demand for third places. When your room doesn’t hold you, the rest of the world has to.
I’m seeing it again now, a few years later, after moving to Queens. The apartment is fine. Small, expensive, mid in the ways New York apartments are famously mid. And almost every day I find myself on the train to Manhattan to work from a cafe, or to meet someone, or just to walk around.

I used to think this was a “new city” thing, like I was still in tourist mode. But I think the pattern is holding. The city is functioning as my living room. The apartment is just where I sleep and keep my clothes.

You hear this about a lot of dense cities. Paris, Tokyo, Hong Kong. People who live there don’t entertain at home, partly because they can’t. The cafe is the dining room. The park is the backyard. The street is the hallway. It’s not a deprivation story. It’s just a different distribution of where life happens.

Which brings me to the second Pollock. The painter.

Jackson Pollock painting in his studio, 1950

Jackson Pollock famously couldn’t make his big drip paintings standing at an easel. He had to lay the canvas on the floor and walk around it, dripping from the outside. The art only existed because he stepped out of the traditional frame of where painting was supposed to happen. And his compositions are what critics call “all-over”: there’s no center, no focal point, no main subject. The energy is spread across the whole surface.

I’m honestly pretty proud of how the name worked out: Pollock the dorm and Pollock the painter, both pointing at the same idea by complete accident.

I think that’s the right metaphor for what I’m describing. When the center doesn’t hold, when your room or your apartment or your first place isn’t doing the work, life becomes all-over. You move around it. The good stuff happens in the periphery.

This isn’t universal.
Some people would socialize the same amount regardless of where they live, and some people in bad rooms just get miserable instead of social. But for me, the pattern holds. The years I spent the most time with people were the years my place was kind of bad. The years I stayed in were the years my place was nice.

So that’s the Pollock Effect. If your room is comfortable enough, you stay in. If it’s not, you go out. Freshman year my room wasn’t comfortable, so I went out, and that’s where everything happened. It explains way more about why that year felt the way it did than I used to think.

~ pranav

### Zero Configuration Infrastructure (February 16, 2026)
**URL**: https://pranavkarra.me/blog/zero-configuration-infrastructure

The issue of agent generality isn’t intelligence. It’s plumbing.

I’ve been spending a lot of time lately researching and building infrastructure for AI agents at a startup, specifically around identity, credentials, and trust. And the more I dig into this space, the more I’m convinced there’s a gap that nobody’s really thinking about.

We keep talking about making agents smarter. Bigger context windows, better reasoning, more capable tool use. And sure, that matters. But I think there’s a much more immediate bottleneck that gets almost no attention: the environment these agents operate in. (not just good computer use like manus)

Here’s what I mean. I want to be able to tell my agent (clawdbot, or whatever I end up using six months from now) “hey, go find the cheapest flight to sf next weekend, book it, and get me a window seat.” (flights is like the most used example for general agent abilities idk why) And I want to hand it a few bucks and let it figure it out. Crucially, I don’t want to pre-configure which airline API it uses. I don’t want to set up accounts on five different travel services. I don’t want to paste API keys into env files. I just want it to go do the thing.

Right now? That’s impossible. Not because the model can’t reason about it. It absolutely can. Claude probably thinks things through better than you can. But it’s because every single service out there requires a human to sign up, verify an email, add a credit card, generate an API key, and wire it all together before the agent can make its first request. The agent’s capability ceiling isn’t set by its intelligence. It’s set by how much configuration I’ve done ahead of time.

The issue of agent generality isn’t intelligence. It’s plumbing.

I keep coming back to a pretty simple idea: what if an agent could show up to a service it has never used before, prove who made it and who it’s acting for, and pay on the spot?
All without a human pre-configuring an API or an account or anything. The identity/token replaces the data a signup flow would provide. The payment protocol (e.g., x402) replaces the billing relationship. Together they replace the API key.

Let me try to sketch out what I think this actually looks like.

Identity that is bound to the agent

In my mind an agent’s identity has two sides, and I think this distinction really matters.

There’s the developer identity, the “who built this thing” side. This should be cryptographically bound to the agent at credential time. It’s permanent. It tells you: this agent was built by this organization, they’ve been verified to a certain degree, here’s where they’re incorporated, here are the safety evaluations the agent has passed. This is the trust root. It’s what makes the agent traceable and the developer accountable. You can only approve agents made by Anthropic for example, and reject all the ones from labs overseas.

Then there’s the user identity, the “who is this agent acting for right now” side. This is more like a session. Right now the agent is acting for me, with my authorization, within the boundaries I’ve set. In parallel it might be acting for someone else. Same software, different principal. Standard KYC linked to the agent works great here. A verified email at the very least.

The credential should encode the developer identity permanently and carry the user context as something that can rotate. A delegation, an attestation or a disclosure, whatever the right mechanism ends up being. The point is that a service receiving a request can answer both “should I trust this software?” and “who authorized this specific action?” from a single credential chain.

Payments at the protocol layer

The other half of this is payment.
An agent can have a perfect credential proving who it is and what it’s capable of, but if it can’t pay for things, it still can’t do anything autonomously. This in my opinion is a major infrastructure problem.

I think payment needs to happen at the protocol layer, accompanying individual requests rather than requiring a pre-existing billing relationship. Something like x402, where HTTP status code 402 (Payment Required) becomes a real part of the conversation between agent and service. The service says “this costs X,” the agent pays X, the service responds. Done. Completely replace API billing altogether.

For services that want more traditional arrangements (enterprise contracts, volume pricing), key management services could bridge the gap. The credential identity becomes the underlying account, and the KMS issues conventional API keys on top of it. But the point is that the credential + micropayment path should work by default, with traditional billing as an optional layer, not a requirement.

What the credential should carry

For this to work at scale with reliability and trust, the credential can’t just be a binary “verified” stamp. A service needs enough information to make its own trust decision. I think it needs to be rich.

Something like: the developer’s verification tier (how thoroughly they’ve been checked), the agent’s safety evaluation scores (prompt injection robustness, PII leakage resistance, tool abuse handling), what data categories the agent processes, its technical profile, operational contacts.

A healthcare API can look at this and say “I need level 3 verification and I need to confirm no PII retention.” A weather API can look at it and say “yeah, just needed the email anyway, whatever, you’re fine.” The credential provides the information. The policy is up to the service.
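
To make the "credential provides the information, policy is up to the service" split a bit more concrete, here's a minimal Python sketch of two services applying their own rules to the same credential. Everything in it is an assumption for illustration: the field names, the tier numbers, the score keys, and the `AgentCredential` / `ServicePolicy` classes are hypothetical, not part of any real spec or of x402.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentCredential:
    """Hypothetical credential payload; all field names are illustrative."""
    developer: str
    verification_tier: int           # how thoroughly the developer was checked
    safety_scores: dict[str, float]  # e.g. {"prompt_injection": 0.9}
    retains_pii: bool
    user_email_verified: bool


@dataclass(frozen=True)
class ServicePolicy:
    """What a single service requires before it accepts a request."""
    min_tier: int = 0
    forbid_pii_retention: bool = False
    require_verified_email: bool = False
    min_scores: dict[str, float] = field(default_factory=dict)

    def accepts(self, cred: AgentCredential) -> bool:
        # Each check reads the credential; the thresholds belong to the service.
        if cred.verification_tier < self.min_tier:
            return False
        if self.forbid_pii_retention and cred.retains_pii:
            return False
        if self.require_verified_email and not cred.user_email_verified:
            return False
        # A score missing from the credential is treated as 0.0 (fails any floor).
        return all(
            cred.safety_scores.get(name, 0.0) >= floor
            for name, floor in self.min_scores.items()
        )


# Two services with very different bars, mirroring the examples above.
healthcare_api = ServicePolicy(min_tier=3, forbid_pii_retention=True)
weather_api = ServicePolicy(require_verified_email=True)

cred = AgentCredential(
    developer="example-lab",
    verification_tier=2,
    safety_scores={"prompt_injection": 0.9},
    retains_pii=False,
    user_email_verified=True,
)

print(healthcare_api.accepts(cred))  # False: tier 2 is below the required tier 3
print(weather_api.accepts(cred))     # True: a verified email is all it asks for
```

The same credential gets rejected by one service and accepted by the other without the issuer deciding anything. That's the design choice I care about: the credential stays descriptive, and each service keeps its own threshold.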
Where I’m still a little blurry

There are parts of this I haven’t fully thought through.

Service discovery. How does the agent find services it’s never used? My intuition says this mostly looks like web search, service compilers and hubs. But I think there’s a more interesting version of this: what if services exposed a special kind of endpoint specifically for agents? Not documentation meant for humans, but... maybe a skill file. Something that tells the agent what the service does, how to use it well, what the expected inputs and outputs look like, what the pricing is, what credential tier is required. Think of it like a robots.txt but for agent capabilities. The agent hits this endpoint, reads the skill, and now it knows how to interact with a service it’s never encountered before. Discovery and onboarding in one step. I don’t think this needs a grand registry. The web is already pretty good at discovery, and agents are already pretty good at reading instructions.

Credential ownership. I keep going back and forth on this, but I think I land on a self-sovereign model. You own your credential. The issuer can revoke it if you violate terms or something goes wrong, and that’s a necessary safety valve. But external validators and verifiers should be able to add signals to it too. Think of it less like a license that someone grants you and more like a passport that accumulates stamps. Different parties can attest to different things, but you hold it.

The interaction model. I envision a world where AI becomes the abstraction layer between you and platforms. Not another UI. Not another app. The agent just... does things on your behalf. And the authorization should feel as natural as turning to someone next to you and saying “yeah, go ahead” or “yep, do this for me please.” We’re not there yet on the UX side, but the infrastructure needs to be ready for when we are.
Clawdbot is just the beginning towards a more… app-less future.

Adoption. The classic chicken-and-egg. Agents won’t carry credentials if no services accept them. Services won’t accept credentials if no agents carry them. But I actua...

### The Era of Personal Software (January 29, 2026)
**URL**: https://pranavkarra.me/blog/personal-software

Making my own software is becoming easier than finding the right app.

I am in a strange new period where making my own software is becoming easier than finding the right app.

I've been using Claude Code and Cursor to build tools for myself lately... tools for needs I felt no other product out there fully met. Not polished products, just things that do exactly what I need. An expense tracker that works the way my brain works. A note-taking system that fits my actual workflow. The kind of software that would never exist because the market for "software that works exactly like Pranav wants" is precisely one person.

Here's what pushed me over the edge: I bought Pocket, a device for recording voice notes. Great hardware, but it was basically a glorified voice recorder. I felt the software was lacking. The notes didn't have what I wanted, the tasks were subpar, and it was overall just a "meh" experience. I had the device sitting idle in my backpack for a month. But the hardware was great. It wasn't my phone sitting on my desk actively recording; it wasn't obtrusive. I really wanted to use it properly.

Pocket Design

So I built my own web app around it. Now it transcribes my class lectures, lets me chat with the content to understand things better, generates flashcards, and keeps context about my courses, syllabi, professors, and upcoming tests. The device went from a neat gadget to something I actually rely on. I connected it to the Canvas API so it can pull my assignments and classes. I connected it to Todoist so it can update my tasks...stuff I would otherwise forget.

OpenPocket

The interesting thing is that even when my homemade version isn't as polished as the commercial alternative, I find myself appreciating it more. And when I occasionally go back to the "real" apps, I notice all the ways they don't quite fit. You can't unsee it.

The other thing: it's completely repairable. Something breaks or annoys you, you just fix it. You want a new feature, you add it. There's no roadmap to wait on, no feature request to submit into the void.

Yes, there are still some technical things you need to figure out. Deployment, keeping things running, that sort of thing. But honestly, these barriers are lower than ever. You can ask someone how to deploy something and have it live in an afternoon. The hard part used to be writing the code. That part is collapsing.

I think we're heading toward a world where spinning up personal software for a specific need becomes as natural as creating a spreadsheet. Not for everything, but for the things where the general solution just doesn't fit.

~ pranav

### Year of Intelligence (December 31, 2025)
**URL**: https://pranavkarra.me/blog/year-of-intelligence

This is a review of 2025, but also my predictions for 2026 and how everything will change. These are my opinions. I'll be wrong on some of them. But I think it's worth putting stakes in the ground.

Take a moment to appreciate how absurd this year has been.

It's been just over a year since OpenAI dropped o1. It's the model that introduced "reasoning" as a product category. Fourteen months ago, watching a model think before answering felt like science fiction. Today, GPT-5.2 Pro is research-grade intelligence. Claude Opus 4.5 is the best coding model on the planet, and it's not particularly close. We have models that can use computers, voice AI that's indistinguishable from humans, and enterprise spending on generative AI that went from $1.7 billion to $37 billion in 24 months.

2025 wasn't the year AI arrived - that was 2023. It wasn't the year AI got good - that was 2024. *2025 was the year AI got fast.* Fast to improve. Fast to deploy. Fast to change the assumptions we'd built careers on.

Looking back, a few themes defined the year: the efficiency revolution (smaller models, better results), the global race heating up, voice and audio AI crossing the uncanny valley, agents learning to actually do things, and money—so much money—finally flowing from hype into production and leading the world to bubble allegations.

Let me walk you through it.

The State of Frontier Models:

A year ago, "reasoning models" were a novelty. OpenAI's o1 could think step-by-step, but it was slow, expensive, and felt more like a research preview than a product.

Fast forward to December 2025:

Claude Opus 4.5 from Anthropic is, in my view, the best model available for complex work—especially coding. It doesn't just autocomplete; it architects. It reads your codebase, understands intent, plans multi-file changes, and executes. I've talked to developers who describe working with it less like "using a tool" and more like "pair programming with someone who never gets tired." It's unbeatable right now.

GPT-5.2 Pro represents OpenAI's answer—research-grade intelligence optimized for deep analysis, long-context reasoning, and scientific work. It's iterative rather than revolutionary compared to earlier GPT-5 releases, but the ceiling keeps rising.

Gemini 3 (November) showed Google isn't out of the race. Their multimodal capabilities—especially for visual reasoning—remain best-in-class in some benchmarks.

And then there's Grok 4 and 4.1 from xAI, which iterated faster than anyone expected. Say what you will about Musk—his team shipped.

The pattern that emerged: release cycles compressed dramatically. What used to take 12-18 months now takes 3-6. The "next big model" stopped being an event and started being a regular occurrence.

My prediction for 2026: This pace continues, but the gains feel smaller. I think we're approaching diminishing returns on the current paradigm. Something architecturally new will emerge—not just "bigger transformer." The pressure for efficiency will force real innovation.

Tiny but mighty intelligence

For years, the playbook was simple: more parameters, more data, more GPUs, better models. 2025 broke that assumption.

The "scaling laws" that defined the GPT era might have started showing diminishing returns at the frontier. Meanwhile, smaller models kept getting dramatically better. Alibaba's Qwen3 family and Google's Gemma 3n proved that a 7B-parameter model in 2025 could outperform a 175B-parameter model from 2024.

This isn't academic. It means:

- AI running on your phone
- AI in your browser, no API calls
- AI on edge devices, in cars, in appliances
- Dramatically lower inference costs

Llama 4 (Meta, April) and Qwen3 (Alibaba, April) both delivered flagship performance with fully open weights. The moat around closed-source models narrowed considerably. If you're building an AI product today, you have real options beyond OpenAI and Anthropic.

My prediction for 2026: On-device AI becomes standard. Apple finally partners with a major lab—probably Anthropic or OpenAI—to ship a Siri that actually works. LLM-powered, on-device, private. The "your phone is smart" era finally arrives, years after it was promised.

The DeepSeek moment

January's biggest story wasn't a U.S. lab.

DeepSeek-R1 appeared on arXiv with a paper titled "Incentivizing Reasoning Capability in LLMs via Reinforcement Learning." The model demonstrated frontier-class reasoning at dramatically lower cost, using techniques that emphasized efficiency over brute-force scale. It was open-weight. And it came from China.

The response was seismic. Within days, Nvidia's market cap shed hundreds of billions. The U.S. government floated banning DeepSeek entirely by March. The lab got hit with cyberattacks so severe they had to limit signups.

The comfortable assumption that American labs held an insurmountable lead is not completely gone, but it is definitely under threat.

By August, DeepSeek released V3.1. The gap hadn't just closed—it had inverted in some domains.

> The takeaway: Compute isn't everything. Algorithmic innovation can leapfrog hardware advantages.

My prediction for 2026: China accelerates further. Alibaba, Baidu, ByteDance, and new entrants close remaining gaps. The "two-track" AI world becomes entrenched—separate ecosystems, separate supply chains, separate regulatory regimes. DeepSeek specifically becomes a household name.

Voice and Audio

ElevenLabs had a 2025 that other startups only dream about.

Eleven v3 launched as the most expressive text-to-speech model ever publicly available—with audio tags, multi-speaker dialogue, emotion control, and support for 70+ languages. For the first time, synthetic voice became genuinely difficult to distinguish from human recording in blind tests. Not "pretty good for AI." Indistinguishable.

They raised $180M at a $3.3B valuation and kept shipping:

- Scribe (February): Speech-to-text competitive with Whisper
- ElevenReader Publishing (December): Authors generating and selling AI audiobooks
- ElevenLabs Agents (November): Conversational AI platform, rebranded and expanded
- Iconic Voice Marketplace: Licensed voices from recognizable figures

But it wasn't just generation—transcription and dictation hit a new level too.

Tools like SuperWhisper and Wispr Flow made voice-to-text so accurate and fast that I know developers who've stopped typing entirely for first drafts. The friction of "speaking to your computer" disappeared. You talk, it transcribes, it's accurate. Done.

This matters more than people realize. Voice is the interface layer most humans actually prefer. We've been typing because we had to, not because we wanted to.

My prediction for 2026: Voice-first workflows become mainstream for knowledge workers. Not everyone—but a meaningful chunk of emails, documents, and code comments get dictated rather than typed. The tooling catches up to the capability.

AI Hardware: Waiting for the iPhone Moment

Here's an uncomfortable truth: we still don't know what AI hardware should look like.

2024 and 2025 saw a parade of attempts. Most of them flopped. The ones that didn't are still... limited.

The Humane AI Pin launched in April 2024 with breathless hype—a screenless, voice-first wearable that would replace your phone. It shipped to brutal reviews. The projector was unreadable in sunlight. The battery died in hours. The AI was slow. It felt like a $700 proof-of-concept that escaped from the lab too early.

Rabbit R1 had a similar arc. A cute orange device with a physical scroll wheel, promising an "AI-native" interface. People bought it, played with it for a week, and put it in a drawer. Turns out "talk to a box" isn't a compelling interaction model when your phone already does it better.

The pendant wave emerged next—Limitless (formerly Rewind), Tab, Friend, Plaud Note. The pitch: wear a microphone, record everything, let AI summarize your life. Some of these are genuinely useful for meeting ...

---

Last updated: June 1, 2026