How one AI model reads a million words at once

Technology · 6 min listen

Transcript

Host: I was looking through some old medical records the other day and felt like my brain was going to melt after just ten pages. It makes me wonder how these new AI tools can take in a whole library of books in one go and not skip a beat. How do they actually hold a million words in their head at the same time?

Guest: It helps to think of this like a giant desk. In the past, an AI only had a desk the size of a sticky note. If you gave it a long story, it would have had to throw away the beginning of the story just to make room for the end. We call this space the context window. It's basically the working memory of the model. When people say a model can handle a million words (strictly, a million tokens, which are whole words or word pieces, so somewhat fewer actual words), they mean the desk has grown from a sticky note to the size of a football field. It can spread out every single page of a dozen thick books all at once and see how a name on page one connects to a secret on page five hundred. But they don't do this just by having more memory. They use a trick called attention. It's a bit like how you can be in a loud, crowded room but still hear your friend whisper your name. The model learns to ignore the white noise and focus only on the words that matter for the question you asked.
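The "loud room" analogy maps onto what transformer models call attention: each word is scored for relevance against what the model is currently looking for, and a softmax turns those scores into weights that sum to one. Below is a toy, single-query version of scaled dot-product attention in plain Python. The vectors are made-up numbers, and real models learn separate query, key, and value projections and run many attention heads in parallel, so treat this as an illustration of the idea only.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating so large scores don't overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Toy scaled dot-product attention for one query vector (no learned projections)."""
    d = len(query)
    # How well does each word's key match what the query is looking for?
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # focus weights, summing to 1
    # Blend every word's value vector according to how much attention it got.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, out

# Three "words" with made-up 2-D keys and values; the query lines up with word 1.
keys = [[0.0, 1.0], [4.0, 0.0], [0.0, -1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, out = attend([2.0, 0.0], keys, values)
print([round(w, 3) for w in weights])  # [0.003, 0.993, 0.003]
```

Nearly all the weight lands on the word whose key matches the query, and the other two are effectively tuned out: the "hearing your name in a noisy room" effect in miniature.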

Host: But that sounds like it's still doing a lot of heavy lifting. If it's looking at a million words, it has to be skimming. There's no way it's checking every single connection between every single word because the math would just explode.

Guest: You're right that the math is the biggest hurdle. In a standard model, if you double the number of words, the work the computer has to do goes up by a factor of four, because every word gets compared with every other word. If you have a million words, that math gets out of hand very fast. To keep the overall cost down, many of the newest models use a system that's a bit like a team of specialists. Instead of the whole brain working on every single word, the model only wakes up the specific parts it needs for the task at hand. If you ask a question about a law in a massive pile of legal papers, the model only turns on the parts of itself that are good at law and logic. This keeps the computer from overheating and lets it scan that huge field of data much faster. It's not really skimming in the way a human does, where we might miss a detail. It's more like it has a very fast filing system that knows exactly which folders to pull.
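The "four times" figure follows from counting. Full attention scores every word against every word, so n words need n × n scores, and doubling n quadruples that. The "team of specialists" idea resembles what is usually called a mixture-of-experts layer, where a small router picks a few experts for each word. The sketch below assumes a simple top-k router over hypothetical gate scores. In mixture-of-experts models this routing typically replaces the feed-forward part of each layer, so it cuts per-word compute but does not by itself remove the quadratic cost of attention.

```python
import math

def attention_pairs(n_words):
    # Full attention scores every word against every word: n * n comparisons.
    return n_words * n_words

def route_top_k(gate_scores, k=2):
    """Toy mixture-of-experts router: keep the k best-scoring experts, softmax over just those."""
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    m = max(gate_scores[i] for i in top)
    exps = {i: math.exp(gate_scores[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

print(attention_pairs(2_000) / attention_pairs(1_000))  # 4.0

# Hypothetical router scores for one word across 8 experts; only 2 experts run.
chosen = route_top_k([0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2], k=2)
print(sorted(chosen))  # [1, 4]
```

The other six experts do no work for this word, which is where the savings come from; the router's scores are learned during training rather than set by hand.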

Host: I still struggle with the idea that it really knows what's in there. If I put a single wrong phone number in the middle of a thousand novels, can it really find that one needle in the haystack? Or is it just guessing based on what it thinks a phone number should look like?

Guest: That's actually the exact test scientists use to see if these models are faking it. They call it the needle in a haystack test. They hide a tiny, random fact in a sea of unrelated text and ask the AI to find it. The best models today are hitting almost a hundred percent accuracy. They can find that one specific line even when it's buried under four thousand pages of fluff. This is a huge shift because it means the AI isn't just summarizing the vibe of a document. It's actually keeping track of the small details. It does this by creating a sort of map. Every word is turned into a list of numbers, and the model uses those numbers to score how strongly each word relates to every other word. When you ask a question, the model looks at the map and follows the paths that lead to the right answer. It's a bit like having a map of a city where you can see every single alleyway at once instead of just the main streets.
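A needle-in-a-haystack check can be sketched as a small harness: generate filler text, insert one known fact at a chosen depth, ask the question, and check whether the answer contains the fact. Here `ask_model` is a hypothetical hook where a real model call would go. The stand-in `fake_model` is a plain regex search, so it passes trivially and only demonstrates the harness itself. Published versions of the test typically sweep both the length of the haystack and the depth of the needle.

```python
import random
import re

def build_haystack(filler, needle, depth, n_sentences):
    """Filler sentences with the needle inserted at a relative depth (0.0 = start, 1.0 = end)."""
    rng = random.Random(0)  # fixed seed so every run builds the same haystack
    hay = [rng.choice(filler) for _ in range(n_sentences)]
    hay.insert(int(depth * n_sentences), needle)
    return " ".join(hay)

def run_needle_test(ask_model, needle, answer, depths, n_sentences=10_000):
    """Return {depth: passed}. ask_model is any callable (text, question) -> answer string."""
    filler = [
        "The sky was grey that afternoon.",
        "Nobody remembered the old bridge.",
        "The trains ran late again.",
    ]
    question = "What is the secret phone number?"
    results = {}
    for depth in depths:
        text = build_haystack(filler, needle, depth, n_sentences)
        results[depth] = answer in ask_model(text, question)
    return results

def fake_model(text, question):
    # Placeholder "model": a regex lookup, used only to exercise the harness.
    match = re.search(r"\d{3}-\d{4}", text)
    return match.group(0) if match else "I don't know"

print(run_needle_test(fake_model, "The secret phone number is 555-0199.", "555-0199", [0.0, 0.5, 1.0]))
# {0.0: True, 0.5: True, 1.0: True}
```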

Host: That sounds useful, but it must be incredibly slow. If I have to wait ten minutes for it to read my files, I might as well just search for keywords myself.

Guest: That's where the friction usually happens. Big memory usually means slow speed. But researchers found a way to squeeze the data down. They use a method that lets the model process the big pile of text once and then keep a sort of shorthand version of it ready to go. So, the first time you upload your million words, it might take a minute to get its bearings. But once it has built that internal map, you can ask follow-up questions almost instantly. It's like the model has finished reading the book and is now just sitting there with the book open, waiting for you to ask where the main character went. It doesn't have to reread the whole thing every time you speak.
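The transcript doesn't name the method, but the behavior it describes (one slow first pass, then fast follow-ups) matches caching the model's per-token state, often called a key/value (KV) cache or prompt caching. The toy below only counts how many tokens get encoded with and without such a cache. `CachedReader`, its `encode` hook, and `reread_everything_cost` are hypothetical names for illustration, and 100,000 placeholder tokens stand in for a million to keep it quick.

```python
class CachedReader:
    """Toy cache: stores per-token state so text already read is never encoded again."""

    def __init__(self, encode):
        self.encode = encode  # hypothetical stand-in for the model's per-token key/value work
        self.cache = []       # one cached entry per token seen so far
        self.encoded = 0      # running count of tokens actually encoded

    def feed(self, tokens):
        for tok in tokens:
            self.cache.append(self.encode(tok))
            self.encoded += 1
        return len(self.cache)  # context length now available to attend over

def reread_everything_cost(doc_len, question_lens):
    # Without a cache, each question re-encodes the whole document plus itself.
    return sum(doc_len + q for q in question_lens)

doc = ["tok"] * 100_000  # scaled-down stand-in for a million-token upload
questions = [["q"] * 20, ["q"] * 15, ["q"] * 30]

reader = CachedReader(encode=lambda tok: (tok, len(tok)))
reader.feed(doc)      # the slow first pass over the whole document
for q in questions:
    reader.feed(q)    # each follow-up only encodes its own new tokens

print(reader.encoded)                                                 # 100065
print(reread_everything_cost(len(doc), [len(q) for q in questions]))  # 300065
```

With three questions the cached approach encodes roughly a third as many tokens, and the gap grows with every extra follow-up.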

Host: So we're moving to a world where we don't even need to organize our files anymore because we can just throw the whole messy pile at the AI. But doesn't that make the AI more likely to get confused? If it has too much to look at, won't it start seeing patterns that aren't actually there?

Guest: Actually, it's often the opposite. When an AI has a tiny memory, it has to guess what happened in the parts it forgot, which leads to those famous mistakes where it just makes things up. Giving it a million words of context is like giving a witness a video of the crime instead of just asking them to remember it. It has the raw data right in front of its eyes, so it doesn't have to guess as much. The real challenge now isn't the memory itself, but the cost of the electricity and the chips needed to keep that giant desk open. We're basically building a digital brain that can hold more information than any human could ever read in a lifetime, but we're still figuring out how to do that without using the power of a small city.
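To see why keeping the desk open takes so much hardware, one can estimate the size of the cached per-token state. A common back-of-the-envelope formula is 2 (a key and a value) × layers × key/value heads × head dimension × bytes per value × tokens. The model shape below is a hypothetical mid-sized configuration with 16-bit values, not any specific product's, so the result is an order-of-magnitude illustration only.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    # 2x because every token stores both a key and a value vector in every layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

# Hypothetical model shape: 32 layers, 8 key/value heads of size 128, 16-bit values.
size = kv_cache_bytes(tokens=1_000_000, layers=32, kv_heads=8, head_dim=128)
print(f"{size / 1e9:.1f} GB")  # 131.1 GB
```

Under those assumptions a single million-token conversation needs roughly 131 GB just for this cache, which is larger than the memory on many single accelerator chips.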

Host: The most amazing part is that it can find that one needle in the haystack without ever getting tired or bored of the hunt.

Guest: That stack of a thousand books doesn't seem so scary when you realize the desk is finally big enough to hold every single page.

Made with Wander
