Hi, and welcome to the Computer Architecture Podcast, a show that brings you closer to cutting-edge work in computer architecture and the remarkable people behind it. We are your hosts. I'm Suvinay Subramanian, and I'm Lisa Hsu.
Today we have with us Jim Keller, who is the CTO of Tenstorrent and a veteran computer architect. Prior to Tenstorrent he has held roles of senior vice president at Intel, vice president of Autopilot at Tesla, vice president and chief architect at AMD, and at PA Semi, which was acquired by Apple. Jim has led several successful silicon designs over the decades, from the DEC Alpha processors to the AMD K7, K8, K12, HyperTransport, and the AMD Zen family, the Apple A4 and A5 processors, and Tesla's self-driving car chip. Today, he is here to talk to us about the future of AI computing, and how to build and nurture hardware teams.
A quick disclaimer that all views shared on the show are the opinions of individuals and do not reflect the views of the organizations they work for. So Jim, welcome to the podcast. We're so thrilled to have you here with us today.
Thanks. Good to be here.
Yeah, we're so thrilled. And long-time listeners will know that our first question is: what's getting you up in the morning these days?
I was gonna say, I thought you were gonna say, what's keeping you up at night? Well, I had a literal keeping-up-at-night. I was just in India for a week. We opened up a design team there, and then I met the IT minister, because India has an initiative to promote RISC-V high-performance servers and India-based design. So I've been talking to those guys about it, and literally was up very early in the morning. So that's one thing. I think AI and modern tools and a few other things are causing change faster than anybody really thinks about. The tools are changing, the design points are changing, the code's changing. And how do you build and design computers and software so you can go faster and use those tools, top to bottom? We went from custom design, to design using CAD tools, to SoC designs where you have multiple IP components and you put those together. And now the design complexity keeps going up — Moore's law keeps giving us more transistors — but you still want to make progress at a good rate. How do you do all that together? And then that opens you up to applications, and AI applications are really crazy, and I've been learning a lot about it. So you'd think at some point things would slow down, but the opposite's happening — things are actually happening faster. Although I tend to wake up more in the middle of the night thinking about things.
Actually, I'm at two or 3:00 a.m. myself. So one thing you said I thought was really interesting, which is about tools, and I know you've talked about this in some of your other interviews. It seems like everything's changing faster than you can kind of accommodate, and then, in order to build new systems with all the changing technologies, you need the tools to change with it. But then it's kind of a conundrum, because you need to change the tools on top of technology that's changing faster and faster. When I was young, I thought being a computer architect was great, because the ground you stand on is always changing, which means that nothing stays stagnant and you can always innovate and do new things. But now it's almost like the ground is changing, and one day it's lava, and one day it's ice cold, and you have to change your shoes, and nobody's designed the shoes yet.
One day it's ice. Yeah, that's a pretty good metaphor, actually. I like that. Oh, thanks.
How do you accommodate and deal with all this change, when the tools that you would want to reason about the change and help you make these designs are themselves changing? Particularly as we shift towards these really specialized designs, and the AI algorithms themselves are changing very fast. Everything is changing very fast. So how do you cope with that?
Yeah, so, well, first, one requirement — and this is hard on big companies with lots of legacy — is you need new designs. I've said this: every five years, you need to rewrite stuff from scratch, period. And there's a big reason for that. No matter how good your old thing is, as you add little improvements and patches, it slowly becomes tangled together. A friend of mine sent me this paper titled Big Ball of Mud. It's a really old-school website with a picture of a big ball of mud at the top. And it talks about how, no matter how carefully you architect hardware or software — you have nice, clean components, well-defined interfaces or APIs — over time, this piece of software will learn about that piece, and this will do something because of that. Somebody will communicate in a meeting, and they'll figure out a clever way to make something faster, and pretty soon it's all tied together. So you need to have something new. We're building a RISC-V processor, a fairly high-end one. We spent a lot of time up front architecting it so that each subsection of the computer is really modular and has really clean interfaces. The mission I gave the team: it would be great if we found 95, maybe 99 percent of the bugs at the component level, instead of at the CPU integration level. And I guess SoCs went through that transition. If you buy high-quality IP from IP vendors and you make a chip, you don't really expect to find any bugs in the IP. But if you're a company with lots of legacy IP, and some of the IP was created by breaking up a more complicated thing and you never cleaned it up, you might find a large percentage of your bugs when you put the pieces together, and you have to fix that. So a new design gives you an opportunity to say, I'm going to go redesign this and make it clean at the modular level. And when I worked on Zen, some of the verification team came to me and basically said, Jim, we really want to test all the units with really extensive test benches. The old-school way of thinking was, oh sure, recreate the whole design twice — you have a load store unit, and now you have to make a thing to test the load store unit, which looks a lot like the rest of the computer, right? But they were right, because the code to test the load store unit is actually simpler than the rest of the computer. And if you put ten units in a row together — fetch, instruction cache, decode, rename, schedule, execution units, load store — your ability, from a program, to manipulate the load store unit is tough, because you're looking through five layers of really complex state machines. Whereas if you want to make the load store unit do all its things right from the load store pins, you can do that. So I was sort of getting there, but the verification engineers made the case that it was worth it for them to write more code at the test bench level and test the modules directly. And as soon as you think that, then you say, well, why does the interface between these two units have a hundred randomly named signals? If you've done detailed computer architecture, you know there's a signal called "stage four fetch valid, except when something happens," right? That's not a verifiable signal. And computers are full of that stuff — signals there for timing reasons, or for no good reason, or because somebody once had to fix a bug.
Oh, now this unit needs to know that that one is in some state. Whereas computers have well-defined interfaces: instruction, PC, memory fetch, fill, exception, kill, stall. So there's a really interesting challenge of, how do you build a computer with well-defined interfaces? And to your question — I made this comment in a talk I did — whenever something is hard, it's because there's too much complicated stuff in one place. You have to break things down. And sometimes the problem isn't whether the tools are there or not, but that you've tried to solve something in too many ways at once: you have an RTL problem, a timing problem, a physical design problem, and it gets to be too much, and you have to figure out how to break that down and make it simple enough to do. So verifiable design, verified interfaces, thinking about architecture a little bit differently — that's important. I've thought a lot about that. And you can see it: some projects you have are successes, and it's partly because you really took the time to architect the pieces up front and make them fairly independent and fairly clean. And you had the discipline not to slowly turn it into a ball of mud, which is the natural human tendency, apparently.
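To make the unit-level-testing point concrete, here is a minimal sketch — not Tenstorrent's actual methodology, just an illustration with invented names — of why driving a unit directly from its own interface reaches corner cases that are painful to reach through the whole pipeline:

```python
# Illustrative only: a toy "load/store unit" modeled in Python, with a
# unit-level test that drives its interface pins directly instead of going
# through a fetch/decode/rename/schedule pipeline.

class LoadStoreUnit:
    """Minimal model: a store buffer in front of a memory array."""
    def __init__(self):
        self.memory = {}
        self.store_buffer = []          # pending (addr, value) stores

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Loads must see the youngest matching pending store (forwarding).
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def drain(self):
        for a, v in self.store_buffer:
            self.memory[a] = v
        self.store_buffer.clear()

def test_store_to_load_forwarding():
    # At the unit level, this corner case takes three lines to hit;
    # reaching it through five layers of state machines takes far more.
    lsu = LoadStoreUnit()
    lsu.store(0x40, 1)
    lsu.store(0x40, 2)
    assert lsu.load(0x40) == 2          # youngest store wins
    lsu.drain()
    assert lsu.load(0x40) == 2

test_store_to_load_forwarding()
```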
Yeah, so that's super interesting. I want to follow up a little bit on this ball of mud question, because two things come to mind. Early on in your answer, you said something about how it's hard for large companies with a lot of legacy to do this, and yet we do have a lot of large companies who have stayed alive for a long time. So I sort of wonder — you know, when I was a young engineer, I was like, how does anything work? I didn't understand how anything could possibly work. And then secondarily, this whole thing about signals: I've seen the kind of signals where you've got a one-hot selector for this thing in the front half of the cycle, and another one-hot for the back half. It's just a total mess. And then we get these students coming out of schools, and maybe some of them have never written RTL in their lives; they learned all their computer architecture from reading, you know, boxes and arrows and stuff. So how do you then form a team where you do have the discipline to avoid this ball of mud? Where it's like, okay, we're gonna name these things right, there's gonna be a reason for this signal, this signal's gonna be eight bits wide, and we're gonna enumerate every single one of those eight bits with the proper name and proper state. How do you push that out from where you are?
Yeah, so there's a couple of things there. One is, you know, nearly a hundred percent of the Fortune 100 companies from a hundred years ago are gone — like, GE is still around, but in a completely different form. So companies do go through life cycles, and almost all of them disappear over time; some get propped up for monopoly reasons or infrastructure reasons or something. So success today does not guarantee success, although the time constant of that is longer than you think. Most companies don't fail in five years; they fail in 25 or 50. So that's one thing. And then Steve Jobs famously said: you have some new product, and then you make it a lot better, and then you refine it, right? But to get to the next level, you have to make another new product. And the problem is the new product isn't as good as the refined old product, but you can't make the old product any better. That's the best rotary phone that will ever be. The pushbutton phone is worse — it doesn't feel as good, it doesn't look as good. And you have to have the courage to jump off the refined high spot to a lower spot that has headroom. This is a quote from one of his random talks — I recommend people go watch some of Steve Jobs's keynotes; they were great. I watched a bunch of them, and he was very clear about what it means to design a product and to believe in where you're going. And it's really hard for marketing and sales people in a big company. You go, hey, we've got this great new product, it's 10 percent worse than what we have today, but over the next five years it's going to be twice as good. And they're all like, well, we'll wait five years. But then you're not working on the right thing. So you have to do that. So that's one.
The other is, I had this funny talk with a famous big company that was providing IP, and they had this interface which was a lot of wires and a lot of random stuff. And I told them the interface is too complicated. And they'd go, yeah, but it's really a small number of gates. And I said, no, you don't get it: the wires are expensive, the gates are free. So one thing you do is, instead of saying, I've got this lean and mean state machine and I export all the wires, you take that and you put it in a box, and you turn it into an interface, and you trade off. At the time, I thought we traded off gates for wires — add a bunch of gates because they're cheap, Moore's law gives us lots of gates, and have fewer wires. But a better way to say it is we trade off technology for complexity. Go look at a DDR memory controller, e.g. on an AXI bus — typical IP you can buy from three or four or five vendors. AXI transactions are very simple. There are like 15 commands, and you mostly use read and write: you go to the controller and say, I want to read 32 bytes of data at this address. That's a really simple interface. Inside the controller, there's a memory scheduler — you might have 32 or 64 transactions in flight. There's a pending write buffer. There's a little state machine that knows the current state of the DDR channel. Maybe there are two DDR channels, two DIMMs per channel. DDR DRAMs are really complicated widgets: they've got a read cycle, a write cycle, a refresh time; they're in different states. But the transaction is really simple — read address, write address, data. You don't know anything about the state of the DRAMs. Now, if you were building a high-performance system the old way, you'd say the CPU is going to be optimized: we're going to send read commands to the DRAMs, and we know we're going to have to sequence the read command, so we're going to send the row address early to set up the DRAM — you export that complexity. And then the CPU knows exactly that on cycle 167 the first piece of data is going to come out. We used to have CPUs that would wrap the returned read transactions, so you got the requested word first. We had all kinds of complexity. But nowadays, the transaction is really simple: read or write at the memory controller, period. The data comes out at a random time; it always comes out in the same order. You don't export the complexity of the memory controller to the CPU. Now, partly it's for a good reason: the CPUs have really big caches and really good prefetchers, they're running at three to five gigahertz, and the memory controller latency is 150 nanoseconds. Wrapping that transaction saves 0.2 nanoseconds out of a hundred and fifty. It's dumb complexity, right? So you look at what you're designing and go, how do I get the complexity of an interface to be this simple? And then there's a funny one, which is, people always ask, well, why don't we have an industry standard for cache coherence? Well, cache coherence is a distributed state machine, right? So now you're saying Qualcomm is going to have the same spec as Arm, as somebody else. That's a hard thing to do. Whereas specs like AXI — there are a bunch of specs that are pretty commonly used that are really simple, and when you make them simple enough, then many people can use them. PCI Express is simple enough. Ethernet is mostly simple enough.
Um, so the things that become common standards, you can work with them. Like, the first version of InfiniBand tried to optimize latency and made a 1,300-page spec, and nobody could build a device to the InfiniBand spec. So they went through some soul-searching and radically simplified it, then focused on being a generation ahead on the wires and having a simple but good-enough RDMA, and some other things. And that became a product that people could use somewhat successfully. So how do you avoid complexity? Well, at the top, you have to decide it's a real goal, and then you're going to spend something on it: I'm going to put an extra hundred thousand gates in each interface so that the interface is simple. In a computer with 200,000 gates, that's crazy. In a computer with a hundred million gates, genius, right? There's a different calculation.
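Here is a minimal sketch of the "simple interface, complex inside" idea Jim describes. The class names and the row-activation detail are invented for illustration; real AXI/DDR controllers are far more involved:

```python
# Hedged illustration: all the messy DRAM state lives inside the controller,
# while the exported contract is just read(addr) and write(addr, data).

class DDRChannel:
    """Hidden complexity: open rows, refresh timing, bank state, etc."""
    def __init__(self):
        self.open_row = None
        self.cells = {}

    def access(self, addr, write=False, data=None):
        row = addr >> 10                 # row activation is internal state
        if self.open_row != row:
            self.open_row = row          # precharge + activate, never exported
        if write:
            self.cells[addr] = data
        return self.cells.get(addr, 0)

class MemoryController:
    """Exported contract: read or write at the memory controller, period."""
    def __init__(self):
        self.channel = DDRChannel()

    def read(self, addr):
        return self.channel.access(addr)

    def write(self, addr, data):
        self.channel.access(addr, write=True, data=data)

mc = MemoryController()
mc.write(0x1000, 42)
assert mc.read(0x1000) == 42   # caller never sees rows, banks, or refresh
```

The extra gates spent inside `MemoryController` are the trade Jim names: technology (cheap gates) for complexity (a narrow, verifiable interface).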
Yeah, that was a fascinating set of insights. Starting from the top, you talked about the need for new designs, and the need to revamp things from scratch every five years. And some of the ingredients for enabling that: one part is, of course, modular, clean interfaces, but also the discipline of ensuring those interfaces are simple, not too complex, at various layers of the stack. Maybe I can double-click on your experiences doing this in the AI world, because that's clearly one of the places where there is a lot of need for this — compute demands are growing unabated. At the same time, there seems to be a willingness to experiment with new ideas. And one could argue that at certain layers of the stack we have seen some abstractions forming; e.g., matrix operations or convolution operations have been the bread and butter of deep neural networks. In this coming era, do you see that philosophy trickling up and down the stack? Because there are the operators themselves, but on the software side, once again, there's a lot of complexity, and then once you get to the hardware side — as you said, you're still designing interfaces with a hundred wires for something that is semantically just a read and a write.
Yeah, so this is a really good one. I'd say in AI, we haven't figured out what the hardware-software contract is yet. And I'll give you an example. In the CPU world — and this is not quite true, but it's close to true — software does arbitrarily complicated things. If you go look at a virtual machine, a JavaScript engine, it's amazing, right? Really complicated things. And when I learned to program, I programmed assembly — I used to know all the opcodes for the 6502 and the 8086, and most of them for VAX. And then I learned C. C programming is great because it's a high-level assembly language. At some level, as an architect, you write C code and you can see what instructions will be generated, and mostly how it's going to execute. It's pretty simple. But the actual contract for a modern computer is: operations happen on registers, period. You put data in registers, and then you do adds, subtracts, and multiplies on them, and you can branch on them. And then, from a programmer's point of view, there's a memory model where you load things basically in order — if you load address A and then you load it again, you never get an older value after you get a newer value. So it looks like you have ordered loads, and then you mostly have ordered stores. There are weak ordering models, but they mostly don't work, because you have to put barriers in to fix it. So basically, data lives in memory, you load it with relatively ordered loads, you do operations on registers, and you store the data out with ordered stores. And then there's a paging model, a privilege model, a security model — but those are orthogonal to the execution model. And underneath that simple model, you can build an out-of-order computer. It took us 20 years to figure out how to do that. Rule number one is you don't violate the execution model that the software people see. VLIW failed because it tried to violate the model. Weak ordering kind of fails because it violates the model. People who did radical recompilation to get performance with a simple engine failed. Out-of-order execution is really wild, because the instructions issue wildly out of order, but they don't violate that model. And to achieve that, we built register renaming, and something called kill: you execute a bunch of instructions out of order, and when something happens, you flush all the younger instructions from the kill point, and you finish the older ones in order. We have massive branch predictors, data prefetchers — but no matter what you do, you don't violate the contract on execution. And that means the software programmers do not have to be microarchitects. As soon as you ask them to be microarchitects, you've failed. Itanium had like eight barriers; nobody knew what they were for. I was at Digital when we built Alpha — we had weak memory ordering, so we violated the execution contract, and we broke all the software. So we had a memory barrier, but people didn't know where to put it in. The operating system had a double-MB macro, because they didn't know where to put it — so, two of them, because in some places that seemed to fix some random bugs. I'm not kidding.
Then we added a write memory barrier, which we thought would make things better, and it made it worse, because they just put the write memory barrier into the memory barrier macro as well, because they didn't know what to do with it. So it was like a worst-case scenario.
So now look at AI software. AI software has been developed mostly by programmers, and programmers understand the execution model pretty well. Data lives in memory. You declare variables, which gives you a piece of memory, or you do something like malloc and free, which is some kind of memory allocator on top of the memory model. But generally speaking, when you're in a program, you don't talk about variables as addresses — they have names, and you do operations on them. So you say A equals B times C. Implicit in that are the loads of B and C; they go into registers, you do operations on them, and you store the result back. And GPUs today are sort of executing that model: you have lots of very fast HBM DRAM, the data all lives in memory, and for every matrix multiply, the data is in memory, you load it into the registers, you do the operations, you write it back out again. So you're constantly moving data in and out. Now, at Tenstorrent we believe that when you write that program — and you can see it very clearly in all the descriptions of AI — that program actually defines a data flow graph. If you go Google "transformers" or "ResNet," you'll probably get a picture, and the picture will be a graph. The graph says there's an operation box, data goes into it, something happens, and then something flows out of it. Generally, they call the inputs activations, and the local data for that operation weights. And the number of operations they do is actually quite small: matrix multiply, convolution, some version of ReLU, GELU, softmax, and then a variety of what they call tensor manipulations, where you shrink or blow up a matrix, you pivot it, you transpose it, you convert 2D to 3D. There's a bunch of tensor manipulations, but the number of operators in that space is low. And people have standardized on things like how exactly you program ReLU — there are implementation methods. So it's interesting: the programmers are writing code in PyTorch to a programming model that looks like standard programming. They're describing what they're doing in terms of graphs, because that's a nice way to think about it. But the code itself, you see, is a mix. So the challenge is, how do we come up with a programming model that we all believe in and understand, that can go fast and not have to do things like read and write all the data to memory all the time, just because some operator expects data to be in memory and that's the only way it knows how to work. And I've talked to AI programmers who say, I'd happily recode that to make it twice as fast. That's one view.
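A minimal sketch of the point Jim is making — ordinary PyTorch code, written in the register/memory style, implicitly defines a data flow graph that a compiler can recover. The tiny module here is invented for illustration; torch.fx has shipped with PyTorch since 1.8:

```python
import torch
import torch.nn as nn
from torch.fx import symbolic_trace

class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(64, 64)

    def forward(self, x):
        y = self.proj(x)        # reads like "load, operate, store"...
        return torch.relu(y)    # ...but it is really a graph of operators

traced = symbolic_trace(TinyBlock())
print(traced.graph)   # placeholder -> linear -> relu -> output: the data flow
```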
And the other view is, I really don't care, because all the upside on this is bigger sizes — bigger weights, more parameters — and the hardware is going to make it faster, and in the short run I'll just buy more processors. But there's a really interesting dynamic here, and it sort of feels like when we first started building out-of-order processors. I guess I started working on it in '95. The idea had been around for a while — the IBM 360 did out-of-order execution; the 360/91, I think it was. It was amazing. But when I was at Digital, there was a debate about whether you could actually make an out-of-order computer work, right? And there were competing ideas: superpipelined, superscalar, VLIW, out-of-order. And then little window, big window. There were a bunch of ideas about it. And what won — clearly, I think, though there are still some people debating this — is out-of-order machines with big windows and really well-architected reorder buffers, renaming, and kill interfaces. That works. And it's a really simple programmer's model that works. So the interesting thing is — some people tell me GPUs just work, but NVIDIA has thousands of people hand-coding low-level libraries. There was a really good academic paper that said, hey, I decided to write matrix multiply, and I wrote it and coded it the obvious way, and I got five percent of the performance of NVIDIA's library. Then they did the obvious HPC transformations — transposed one of the matrices, blocked for the known register file size — and they got 30 percent. And from 30 to 60 percent, they're deep in the hack: they know how big the register file is, how many execution units there are, how many threads, how many everything. And the NVIDIA library — they have what they call CUDA ninjas, who are great programmers who know how to make that work. Now, the charm of CUDA is you write a CUDA program, it will work. The downside is that the performance may be randomly off by large factors. But when you're writing your code — well, why would you write matrix multiply on the GPU? There's a big library for that. So they have a programming model that works, and libraries that mostly solve your needs, and now you're arbitraging the last 10 to 20 percent. But that computer doesn't look anything like the way AI programmers actually describe the program, right? And that's a really interesting thing. So we're building a graph compiler. Our processor is an array of processors, which have some low-level hardware support for data flow, and there are some interesting methods for how you take big operations, break them up into small operations, and coordinate that. The charm of it is it gives you much more efficient processing and less data movement — less reading and writing of memory. And there are interesting things to think about. Say I have a RAM big enough to hold all the data I'll ever need: it's a big RAM, and it burns lots of power. If you break the RAM into smaller pieces, each one is much more efficient to access, but then you might have to go to another RAM. So there's a trade-off between the RAM sizes. And matrix multiply has this curious phenomenon: for an n-by-n matrix, it's n-squared data movement for n-cubed operations, which is sort of why AI works.
As you make the operations bigger, you get more computation per unit of data movement. And then there are ways to optimize that further by breaking the big operations into the right size, so they're big enough to have a good ratio of compute to data movement, but small enough to be efficient on local memory access. And you can see all the AI startups taking different approaches to this. It's not because people are trying to be different; it's because there's a real problem there: how the programs are written, how they're described, and what the hardware looks like are very different things. It's technically interesting. And I think the solution will be much better than "we'll just keep scaling faster memories forever." That doesn't seem like the right approach.
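A worked form of the ratio Jim cites, in standard arithmetic-intensity terms (the constants vary by formulation; this is the usual textbook accounting, not anything specific to Tenstorrent):

```latex
% An n x n matrix multiply does O(n^3) multiply-adds on O(n^2) data:
\[
\text{ops} = 2n^3, \qquad \text{data} = 3n^2 \ \text{words}, \qquad
\text{intensity} = \frac{2n^3}{3n^2} = \frac{2n}{3} \ \text{ops per word}.
\]
% Tiling into b x b blocks that fit in local memory keeps the same shape of
% ratio at the block level:
\[
\text{intensity}_{\text{tile}} \approx \frac{2b^3}{3b^2} = \frac{2b}{3},
\]
% so you pick b big enough to amortize data movement, but small enough that
% the 3b^2-word working set fits in cheap local storage.
```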
Yeah, I think that's a fascinating set of points. I do want to expand a little more on the AI hardware-software contract, versus the execution model that we know in traditional hardware and software. One of the attributes of the state-of-the-art models today is that they require a lot of scale: you have chips that are interconnected together, and you scale them out to really, really large systems. I wanted to get your perspective on —
Well, actually, they're really small systems, so I think you have your metric wrong. The human brain seems to be intelligent, and people estimate it at 1e18 to 1e22 operations per second, depending on who you ask. A GPU is currently about 1e15 operations a second. So it's off by something like six orders of magnitude. So we have a computer about this big, right, which is an average intelligent-operation computer. To build that today with GPUs would take a hundred thousand GPUs or something — the problem is on the GPU side. So we say that's big, but big compared to what? That's the funny part. It used to be that it took a really big computer to run a simple Fortran program, and you could say, that is a big computer. But now that computer fits on half a millimeter of silicon — the Fortran computer of the seventies is, you know, 0.1 square millimeters. Moore's law fixed it. So size is a relative thing. Today, yes, to build a big training machine you put together a thousand GPUs, and it feels really big. The parameter count is, you know, 30 billion parameters, and there's a petabyte of training data, and the numbers all seem so big. Here's a funny number. A transistor is a thousand by a thousand by a thousand atoms. Think about a seven-nanometer transistor: it's called seven, but it's about 100 by 100 by 100 nanometers, and a hundred nanometers is about a thousand atoms. So that's a billion atoms, and we use that to resolve a one and a zero. Now, we resolve a one and a zero at about a gigahertz, which is pretty cool. So it's a billion ones and zeros per second out of a billion atoms. Big number, small number? I don't know. Machines look big, but the computer in an iPhone would have been like ten Cray-1 computers, which were big in their day, and now we think of it as a $20 part that fits in a three-watt envelope. So it's a relative measure. AI programs today are big compared to traditional computing. They're small compared to an actual, you know, average intelligent person.
I hear you — I think it's a fair point. The intent behind the question was more to say: traditionally, when you were building a chip, we had a clear execution model and contract, and how these chips were hooked together was a separate problem in some sense. In distributed computing, if you went to databases, they had their own set of protocols, their own execution models for how database transactions would execute, and so on. If you went to the HPC world, they had a different set of execution models and contracts. For AI in particular, do you still see that we can have this separation? Or do you think there's a need for a more unified view across the chip-level boundary to the system-level boundary as well? Because you have various forms of parallelism.
Well, the fascinating thing about a current, like, thousand-chip ML GPU computer is: first, there's an accelerator model — there's a host CPU and an accelerator. Then in the GPU itself, there's a memory-to-memory operator model. Then that node runs some kind of networking stack between multiple nodes, and then it's coordinated with something like MPI. So you have a memory model, an accelerator model, a networking model, an MPI model — and that's just to make it all work, before you even run a program. It's kind of amazing, right? And you can look back: when processors had FPU accelerators, the FPU had a driver, right? You had to send operations to the FPU and poll it. But after the FPU got integrated, floating point just became a data type and an instruction in a standard programming model. So the accelerator model occasionally disappears. As floating point got integrated, there were still vector processors, which were accelerators for vector programs, and they died, essentially because floating point got fast enough that it was easier to just have a couple more computers running floating-point programs than it was to manage the accelerator driver model. So the current software structure is, I would say, somewhat archaic and complicated, but it's built on well-founded things. GPUs accelerating graphics programs have been solid for years. Everybody looks at it and goes, man, there are a lot of memory copies there, and oh, the programming model of the GPU is too simple — but, you know, that's a 20-year-old model. And networking works, and MPI's been used for a long time, and it's pretty well fleshed out. But the fact that to run an AI program you need something like four programming models before you even write a PyTorch program — it's kind of amazing. And even PyTorch doesn't really comprehend the higher-level thing: it's running locally on nodes, with the MPI coordination above it. So yeah, it's fairly complicated. Now, if you had a really, really fast computer that ran AI, those layers would go away. But we don't have that computer yet. And that's where the excitement happens: what's the right way to think about this stuff? It feels very much like we're in a transitional place. We've been through these before — the change from in-order computers to superscalar, vector, out-of-order, the VLIW thing — all that took like 15 years, and we're probably in year three of this in AI.
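Here is a hedged sketch of the "four programming models" stacked in one tiny program. It assumes a CUDA build of PyTorch and an MPI-style launcher such as torchrun setting the rank environment variables; the details vary by cluster, so this is an illustration, not a working recipe:

```python
import torch
import torch.distributed as dist

# (1) MPI-style coordination model: rank/world-size setup across nodes.
dist.init_process_group(backend="nccl")

# (2) Host + accelerator model: the CPU drives, the GPU executes.
device = torch.device("cuda")

# (3) Memory-to-memory model: an explicit copy from host memory to device memory.
x = torch.randn(1024, 1024).to(device)

y = x @ x.T                    # the actual computation, at last

# (4) Networking model: a collective over the interconnect between nodes.
dist.all_reduce(y)

print(y.cpu().sum())           # and a copy back into the host memory model
```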
So where do you feel like it will land? It seems like one of the tricky parts with AI is — like you say, from the programmer perspective, there's a data flow graph of what they're trying to do: here's this tensor, you want to send these things here. And then we have this hardware that we want to build to do it fast. The NVIDIA solution is this middleware where they translate that high-level data flow graph into some really low-level libraries, so they can make sure it's fast on this particular piece of hardware. But the question that always seems to come up is how big things should be. We don't want one huge DRAM, as you say, that can handle all of the memory in one giant chunk. We don't necessarily want one single matrix multiplier that can handle the very largest matrix multiply you could ever imagine — you want that broken up. So then how should these things get sized, how should they communicate with each other, and how, in the end, does it all get condensed down and sent to maybe some small processor that's actually doing, like, the ReLU or something like that? The question always comes up about size, and that sizing is often really dictated by the current state of the art, which is not going to be the state of the art in, like, six or eight months.
You asked a bunch of questions. So first, AI capabilities are changing really fast, but the models — there have been a couple: there was obviously AlexNet, and then ResNet, which was a huge refinement and an uptick on that. And then the language models came out with transformers and attention, right? And then there's the bitter lesson: size always beats cleverness. So there's something interesting there — there's a certain stability to it. There's obviously a bunch of tweaking going on: how do you tokenize the data, how do you map it into a space, how do you manage your training. But over the last couple of years that's been somewhat stable. The transformer paper came out, what, four years ago, right? And we're building way bigger models that are much refined on top of that, but there's that stability. Then there's a new benchmark every six months, and they're hitting something called benchmark saturation. They say, hey, we have this huge set of images — how well does the AI recognize them? And it went from like 20 percent accurate, to 50, to 80, to 90, to 97. Those benchmarks saturate: at a hundred percent, you're done. Whereas a lot of CPU benchmarks are, how many floating-point operations per second can you do — twice as many is always better. So some of these things, like the natural language tests and math tests, are saturatable benchmarks, because you can get all the answers right. So they've been in this churn where benchmarks that were supposed to be good for five years saturated in one. That's a funny thing. But let's talk about size. At the high end, our sizes are large compared to our technology, but small compared to the need, I would say. And then let's differentiate why memory capacity is big. If memory capacity is big because it stores a lot of useful information, that would be really interesting. But if it's big because it needs a lot of space to store intermediate operations, that's kind of a drag, right? So architecture, models, and technology will move to the point where you don't need memory to store intermediate operations. Like modern server chips: the caches are big enough that the memory accesses should mostly be for the first time you need the data. If there's a big working set and you're reading and writing the DRAMs over and over to do a matrix multiply, that would be a drag; in that case, the caches should get bigger, and the matrix multiply should be structured so that you can do blocks. That kind of behavior is well understood. So, large memories for holding a trillion useful bits of information: seems like a fine use for large memory. Eight terabytes of bandwidth because you need to store intermediate operations: seems kind of crazy. So there's a couple of differentiations you can make. And then there's the observation that the brain doesn't look anything like a large memory that you're reading and writing. You know what a cortical column is? It's ten or a hundred thousand neurons organized in a set of layers, fairly densely connected together.
And then they talk to each other at relatively low bandwidth. So that looks like an array of processors to me, with local storage and distributed computing and messaging. And it sort of looks like the graphs people say they're building when they write AI. Which is why, architecturally, an architecture that embodies data flow, knows how to do graphs, and knows how to pass intermediate results around instead of having to store them all the time seems like the natural thing. Was that clear? Large memories to hold large numbers of things: yeah, our current memories are small compared to the need. Large memories for intermediate results: that seems like an architectural anomaly. We've been through this before. HPC machines — people used to talk all about memory bandwidth. It used to be memory bandwidth, memory bandwidth, you know, running STREAM, right? Then you got a hundred processors with a hundred megabytes of on-chip cache, and we started to hear less about that, because more and more problems were factored into dense computation with sufficient on-chip storage, and memory starts to be storage for large data sets. Now, it's not always true — there are a bunch of problems that are very hard to factor that way, and there are some interesting things about very large, sparse data sets and unpredictable data sets. The HPC guys still hit limits everywhere they look. But it's not as clear-cut as it used to be — "show me the DRAM bandwidth and I'll tell you the performance of the computer." It's more complicated than that.
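For the blocking Jim described a moment ago — structure the matrix multiply so each block's working set lives in on-chip storage and DRAM traffic is mostly first-touch — here is an illustrative NumPy sketch, a teaching example rather than a tuned kernel:

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    """Tile C = A @ B so each (block x block) working set stays resident."""
    n = A.shape[0]
    C = np.zeros((n, n), dtype=A.dtype)
    for i in range(0, n, block):
        for j in range(0, n, block):
            # The C tile stays "on chip" while K tiles stream through it.
            for k in range(0, n, block):
                C[i:i+block, j:j+block] += (
                    A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
                )
    return C

n = 256
A, B = np.random.rand(n, n), np.random.rand(n, n)
assert np.allclose(blocked_matmul(A, B), A @ B)
```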
So does that mean — in some ways it almost sounded like, in terms of the sizing of the structures inside an ML computational engine, you feel like that's kind of stabilized, that it's relatively solved. But then we have all these AI startups that are trying to build hardware and the software stacks on top of it, and you mentioned before, they all have their own different ways of doing it. So there is still — if the structures themselves are largely stable now, because there are some known primitives...
Let me be clear about that. My point was, it's moving slower than people think. The results at the benchmark level, and some of the tweaks and stuff, are moving quickly. The current set of structures has gone through two or three generations, which are somewhat stable. But there could be a new structure next year that changes everything. So I don't think it's reached a plateau of stability the way out-of-order execution has, right? It's a punctuated equilibrium.
I see, okay — punctuated equilibrium.
Let's say. So, like, when Pete Bannon and I worked together on the Tesla chip, we used to wake up every once in a while and say, what if the engine we just spent a year building doesn't work at all for the algorithm they come up with tomorrow? That's a real one — that's a wake-up-at-4:00-in-the-morning thing. But it turned out there have always been methods. They did come up with algorithms that don't work naturally on that engine, but they found ways to transform the algorithm to the execution engine, and they've had success with that. And they got a huge power and cost savings by building a really focused engine as opposed to a general one. So that was a net win. So we're in a state of punctuated equilibrium: relatively stable for a while, but you have the sense that things need to change. And the description of the software — how people write the code and describe the software — versus what the execution engine is: the fact that those are different is really curious, and, you know, invites innovation and thinking. And the sizes aren't stable, because people are pushing sizes right now — most things would be better if they were ten times bigger, some asymptotically so, but there are some AI curves that are just still going: you make it ten times bigger, and you're still getting better at a real rate. That's why I think there will probably be some really interesting breakthroughs in the next five years about how information is organized, and how to do a better job of representing, essentially, meaning and relationships, which is what AI does, right?
Yeah. Just before we close out this particular theme, on the topic of future breakthroughs: reflecting back on the progress of AI, we talked about a couple of things. One is how data flow graphs seem to be a very good abstraction to express computations and build systems on top of. You mentioned a little bit about architectural anomalies that we should probably fix, like these large memories for intermediate results. But moving forward, as you look toward newer breakthroughs in AI, are there any opinionated bets that you're making at Tenstorrent, and that you think we should be looking at as a trend in the future?
Well, there's a couple of things. One is — some people have observed this, but when it first hit me: you're taught that AI is inference and training. Inference is, you put an input into a trained network, you get a result. And training is something like, you have some data with an expected result, you put it in, you get an error, and you back-propagate the error, right? And when somebody explained how they train language models and some image models — you basically take a sentence and you put a blank in it, and you run it through and guess the blank — which I think is really amazing. But to do that, you do the forward calculation and you save everything. And then on the back propagation, you use optimization methods to look at what you calculated versus what you should have calculated, and you update the weights. Brains clearly do not save all the information on the forward path. And there are some cool papers — there's one called RevNet, which is like a reversible ResNet, so you don't save the intermediate results; you recalculate them on the backward pass, which is cool. So it seems like there are going to be breakthroughs in how we do training. And also, when humans think, we don't train all the time. Ilya at OpenAI said, when you do something really fast, it only goes through six layers of neurons. You're not thinking — it's trained. That's inference. Everything you do really fast is inference. And the really interesting thing that we humans mostly do is more like generative stuff. You have some set of inputs, you go through your inference network, that generates stuff, and then you look at what it produced and what your inputs are, and then you make a decision to do it again. You're not training — you're doing these cycling inference loops, where that part of your mind is sort of your current stage of understanding, which you could say is your input tokens, but it's decorated with what you're trying to do, what your goals are, what your history is. And every once in a while, you're thinking about something and you go, that's really good — and then you train that. So humans have multiple kinds of training. Something exciting happens, and you remember it from one instance, for your whole life, right? So we have a method for training in a flash — remember exactly what happened. Then we have procedural training, where you do something repetitively, and you slowly train yourself to do it in an automatic way. And then we have the thinking part, which is like generative learning, where you're stewing on it, you try this and you try that, and you find a pattern that's superior to anything else you've thought of — and then we train that, because we use it as the building block for the next thing. So humans are generative models. And there's a lot of innovation in what they call prompt engineering, and there are all kinds of things, but the structure of it — it's almost like it's not sophisticated enough yet to be thinking the way humans think. We have overall goals. We have moral standards. We have stuff our parents told us to do. We have short-term goals, long-term goals. We have constraints from our friends and society. That's all present when we're doing our daily tasks, whatever we're trying to do — which is mostly not instantaneous inference, and it's mostly not training. So I think that's a really interesting phenomenon.
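A sketch of the RevNet idea Jim mentions (Gomez et al., "The Reversible Residual Network"): activations need not be saved on the forward pass, because the backward pass can reconstruct them by inverting the coupling. F and G here are arbitrary stand-in functions, not the paper's actual residual blocks:

```python
import torch

def F(x): return torch.tanh(x)          # any per-stream function works
def G(x): return torch.relu(x)

def rev_forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2                        # x1, x2 need not be stored

def rev_inverse(y1, y2):
    x2 = y2 - G(y1)                      # recompute, don't load from memory
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = torch.randn(4), torch.randn(4)
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)
assert torch.allclose(r1, x1) and torch.allclose(r2, x2)
```

Note the coupling is invertible even though F and G themselves are not — that is the trick that lets training trade recomputation for intermediate storage.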
And then the fact that these big generative language models are starting to do that is really, really curious. And thinking about how you would build a computer to do that better — that's a really interesting problem.
Yeah. Speaking of humans, intelligence, goals, and building computers, maybe this is a good segue into the other theme we wanted to talk to you about. You've been at multiple companies; you've built, led, and nurtured successful teams that delivered multiple projects. We'd love to get your perspective on how you think about building teams — nurturing them, growing them, and scaling them — especially from the lens of building hardware systems or processors and so on. What have been your key learnings? How do you view this?
Yeah, no problem. So, you know what the words "creative tension" mean, right? Where you hold opposite ideas in your head, and there's tension between them. You know: I want to get ahead, but I want to goof off this afternoon. That's creative tension, right? Everybody does that. So — I'm a computer architect. When I first started managing a big team, when I went to AMD in, I guess, 2012 or something like that — at Apple I had one employee, and I wasn't managing them particularly well, and then I was going to manage 500 people, and it grew to 2,000 or something. So I realized I could treat organizational design and people as an architecture problem. I'm a computer architect, and people, generally speaking, have some function that they're good at, and then there are inputs and outputs. Everybody knows how that works: you draw a box with a function, inputs, and outputs, and one of your missions as a computer architect is to organize functional units in a way that gets you the results you want. So if you're trying to build a computer, you need to work out the architecture of the organization that solves that problem. In modern processor design, there's an architecture group, RTL, physical design, and validation. And people, probably for evolutionary reasons, operate best in teams of ten or smaller: there's a manager, there are ten people, and a really good team of ten people will outperform ten individuals — humans are designed to do that — and a bad team will underperform them. There are all these jokes about how, as you add people, productivity goes down. But if your teams are well designed and your problems are well architected, people love to work in teams. Five to ten people working together are happy to see each other. Up to about 50 people, people all know each other pretty well. At a hundred, it becomes difficult to know people, and you start needing boundaries, because humans tend to view strangers as enemies, no matter how nice you are about it. So that's where a director manages, like, a hundred people: the directors know each other, but the people in the directors' teams don't need to know each other. So there's an organizational dynamic you need to figure out. And then there's the people side. Engineers love what they do — that's one of my givens, because engineering is way too boring and hard to do every day with excitement if you didn't really love it. People are willing to do hard, boring things if they like what they're doing, and people who don't love engineering leave it, because it's actually hard and annoying and repetitive — you think about the same problem over and over and over. So engineers generally like what they're doing; they have to, or they couldn't do it. But there's this interesting dimension: they love to do things they own, but they don't always know what the right thing to do is. So you need some hierarchy of goals, and steps to do, and processes and methods, and ways people interact and motivate each other, because you're trying to get to that creative-tension spot between them:
they own it and they're doing the right thing, but they're still following some kind of plan and organizing together — and that's difficult. Does that make sense? That kind of creative tension between organizational design, you know, the requirements, and then, let's say, the human spirit. People who are excited do ten times more work than people who think "this place sucks, I'd rather do anything else." So there's a huge swing. And teams that are working well together create stuff that individuals can't.
Two quick follow-up questions on that theme. As I've transitioned from being a young engineer to a less young engineer, let's call it — that second piece, of constructing teams and having a clear sense of what you own and how you solve your problems, having everybody kind of autonomous but marching in the same direction, seems like one of the hardest organizational problems. And I think I once saw you say something like, people don't like to do what they're told to do; they like to do what they're inspired to do. But one of the things I've witnessed across multiple organizations and multiple groups is that just getting everybody to feel like, okay, this is what we're doing, we've all agreed on it, you own this and you own this — that's one of the hardest parts. So that's one question. And the second question: you said something about groups of ten. How do you feel about remote work these days? I know they're very different questions — that's just what popped up.
Yeah, yeah. So I don't know what to think about remote work, because I'm not a fan — I like to work with people. That said, I've had very successful projects with teams in different places talking to each other, and I've also seen people working remotely on Slack, talking all day long, with their Zoom chats up and running, and it's almost like they're working next to each other. So there's a lot to figure out about that one. Your first question: it really helps to be an expert. I've led some projects successfully, and I'm a computer architecture expert. I'm not an expert in everything, but I've written CAD tools, I've architected computers, I've written performance models, I've done transistor-level design. I have a lot of capability. And I'm also relatively fearless about asking questions. If I'm in a room and people are explaining something — and young people, please listen to this: if you don't know, ask a question. If people don't want to tell you the answer, go work somewhere else. Go figure out what's going on. Somebody filed a complaint on me one time, because a senior VP asked too many technical questions — they were used to walking into the room with a bullshit PowerPoint and bullshitting for an hour about progress. And at page one I was like, what the hell is going on here? Sentence one, word one doesn't make any sense to me. Explain it. Nobody could explain it. And you can imagine, word two wasn't making things better, right? Somebody said you run fastest when you're running towards something and away from something. And I am more than happy, as a leader, to have a vision and lay out what I want and work with people to get there. But I'm also more than happy to dig into everything: does it make sense, and can you do it? And you'd say, well, that doesn't scale — but apparently it does. I worked at Apple, and Steve Jobs had everybody on the balls of their feet, in shape, because they knew if Steve found out you were screwing around, there'd be hell to pay. Elon does it — I watched him. He motivated very large numbers of people to be very active, hands-on, technically ready to answer questions about what they're doing. No bullshit slides. So you need to have a good goal, and you need to factor it into something where people say: I get it, I believe it, and I can do that. You need to have confidence in the management structure. My team on Zen — the managers were very competent. They were all technically good, and they were good managers. Because people do kind of divide: when you wake up in the morning, do you think about a technical problem or a people problem? I'm a technical person — I wake up thinking about technical problems. But then I wanted to solve problems that take lots of people, so I turned people into a problem. I read a bunch of books on psychology and anthropology and organizational structure — In Search of Excellence, you name it — and I came up with a theory about how to do it. And one of my theories is I like to have managers work for me who are technically competent but good people-people. That helps soften the edges around, say, me, e.g., or the problem, or the company. When an employee does work, they have the technical problem in front of them, but when they have an organizational problem, or their boss might be a problem — that's a drag.
The company might be a problem. Competition might be a problem. It can be tough, right? So people need somebody to look after them, take care of them, inspire them. At the same time, you have to be doing something that's worth doing, and balance that out too. This is a huge space of creative tension. There are certain leaders that are really hard; I think they're too hard, and life under them is too hard for a lot of people. I look for ways to solve organizational and technical challenges in a way that most people fit. Ken Olsen at Digital said there are no bad employees, only bad employee-job matches. When I was young, I thought that was stupid, and somewhere around 45 I decided it was a pretty good thought. If somebody is a good person, there's almost always a job where they can contribute. Now, if you're in a financial downturn, you have to lay people off, and you lay people off in a certain order; people know that. But solving that problem for people is important, because I've seen it turn into really positive results in an organization. And there are multiple dimensions. Somebody asks, well, what's the way to do it? Well, are your goals clear? A lot of people fail right there; the goals aren't clear. There's this organizational infrastructure: Goals, Organization, Capability, and Trust. You have to solve all four of them. Are the goals clear? Are they doable? Does the organization serve the goals? If the processor is broken into six units, do you have six good leads, and is each unit well managed? Capability: do you have the technical ability to do it? Can you identify the real problems, and find people who are plausibly able to solve them? Capability is a big one. And then trust. Trust is the complicated one, because it's usually the output of the first three, not an input. Some manager says, we're going to focus on trust and execution. Those are outputs. In the world of input, function, output, you can't change the output by looking at the output. You can change the input or the function; the output is the output. The output changes when you change one of those two. So when a manager says execution, execution, execution: are they actually doing something about it? Are they training people? Are they hiring for talent? Are they reviewing people properly? Did they buy new CAD tools? What did they do to make execution better? If all they do is say the word execution, they're bullshitters. So you have to solve multiple dimensions; you can't solve just one of them. And there's a bunch of places underneath that where there are multiple dimensions too. That's where you really start to see the difference between great leaders who get projects done and everybody else. I've worked with some really great leaders, and I'm just amazed. The Model 3 got built so fast, with so many people, across so many dimensions. Elon was super inspirational and unbelievably good at details, but Doug Field built and staffed a really wide-ranging organization, and I watched him do it. I was there when we built Autopilot and drove a car in 18 months, but compared to building the Model 3 and shipping it, that was relatively small potatoes. So it's really interesting to look at these things, and then you have to take them seriously.
And then you realize that no matter what, you don't know that much. So you have to dig into it. And if you're lucky, you find the right people and you get to the right place. But yeah, it's a hard problem.
Engineers probably should read way more books. People always ask me, well, what three books should I read? And I think, well, I've read a thousand. The three books I like best, I probably only like because of the other stuff I already knew. So I have a hard time recommending the one book that will do it, but reading a lot can help.
Yeah. That's one thing that's always mystified me about some of the higher-level leaders I've worked for, when they talk about all the books they read. Because when you're a junior engineer, you're just like, okay, I'm going to do my job, right? I'm going to write my code, I'm going to do my module, I'm going to run my unit tests, or whatever. And then as you make the transition, it becomes much more like, okay, there's more than just technical things to getting this stuff done. And making that transition to spending your brain power on the other part, and making sure you're spending it appropriately, is a bit of a tough transition, because there's a lot of comfort in your boxes and arrows, right, when it should be like this. But then it's like, well, how do you get everybody to believe it should be like this? And how do we get anyone here…
When you're between 35 and 45, you realize almost all your problems aren't technical.
Yes, exactly.
And then, unfortunately, it's a little like training a language model. You don't ask, what are the ten sentences I need to train this model? More actually helps, right? And a lot of times a really great book is only a great book because of the other hundred books you read; it's the one that brought the ideas together, and if you'd read it first, it wouldn't have meant anything. So quantity kind of counts. And don't be afraid to quit a book. Most people who write books have 25 or 50 pages of stuff to tell you, but the editor tells them to write 200 pages, because that's what sells. So don't be afraid to read 50 pages and go, I got it, he seems to be repeating himself. Almost no writer, once they start repeating themselves, buries some really good nugget a hundred pages later. Once they start repeating themselves, they keep repeating themselves. Because people get passionate: they write a book about engineering, or management, or ideas, or inspiring projects, they pour their heart out until they're done, and then they pad it until they get to 200 pages. So don't be afraid to throw a book out after 50 pages. Now, I read this book Against Method; it's my current favorite book, by Paul Feyerabend. And the goddamn book was like 300 pages long, and it was just one idea after another. I kept waiting for him to start repeating himself so I could put it down, because it was pretty dense, but he didn't quit; he just kept writing all the way through the damn book. It was pretty funny. But yeah, it's a real thing. If you start to realize there's more to work than just engineering technical problems, you've reached the next level, which is good, and you should solve it, because it'll make you happier and you'll be more successful if you do. And you may conclude: this is really cool, and now I'm better at managing the team and I can focus on technical stuff. Or now I want to manage bigger teams. Or now I want to go into sales. Or, now I get it, I really should spend more time surfing. It's all fine. When you reach that point, it's a good thing. It's a tough thing, harder than college. In college, the answers are in the book, for the most part. Then you start to work, and you realize that if the answer is in the book, you don't get paid that much. And when you start trying to solve this next level of problem, there are no answers at all, but there are some solutions.
So maybe this is a good time to wind the clock back, and you can tell our audience how you got into computer architecture. How did you achieve that employee-job match fit yourself? How did you get interested in the field, and how did you eventually get to where you are?
Oh, it was fairly random. In college... well, I basically goofed around in high school. I see kids today studying for the SATs, and I still remember being out with my buddies thinking, I have to take the SATs tomorrow; I probably should not have stayed out all night. But I got to college, I'd done well enough in high school, and I liked math and physics and a few topics. I went to Penn State, and I was a combined electrical engineering and philosophy major. But it turns out I can't write at all. In my sophomore year, the head of the philosophy department sent me a note, I think through the electrical engineering department, saying he really wanted to meet me. And it was like, yeah, it's great to meet you too. It was really wild and unexpected; I'd just taken four philosophy classes. And he said, yeah, we noticed that you're a philosophy major. Then he pulled out a paper written by a typical philosophy student for a midterm: ten pages, nicely written, perfect sentences and everything. And then he had my page, which was half a paragraph with scratch-outs and words in the margin. And he said, Jim, you're never going to get a philosophy degree at Penn State. He said, we're happy to have you, we like you in class, but we write a lot, and you can't write at all. And I was like, oh my God, really? You're kicking me out of philosophy? I didn't even know that was a thing. But Penn State was great. We had a two-inch wafer fab; I made wafers. So in college I thought I was a semiconductor major. My adviser ran that lab, and I learned a lot about it. Then I took a random job building fiber-optic networking controllers, because I wanted to live in Florida near the beach. It was a terrible job but a great experience. While I was there, somebody said I should work at Digital Equipment, and they gave me the VAX-11/780 paper manuals, which I read on the plane to my job interview. I thought that was really cool, but I went in there with a lot of questions. I met Bob Stewart, the chief architect of the 11/70 and the 11/780. He was a great engineer, and I had all these questions, and he thought I was funny. He hired me as a lark, I think, because he knew I didn't know anything about computers. I literally told him I'd just read the book on the plane; I'd taken one Fortran class in my life, and it didn't go well. I spent 15 years at Digital, and that's where I learned to be a computer architect, mostly working for Bob Stewart at the start. But there were some other guys, Doug Clark, Dave Sager; there were a couple of legendary people there. They were really good, and I had the opportunity to work with them. I was fairly energetic as a kid, so I just jumped into stuff, and I learned a lot. I slowly learned computer architecture. Pete Bannon and I worked together starting in 1982 or '83, something like that. We worked on the VAX 8800 together, and a couple of following projects. Then on the second Alpha chip, EV5, I wrote the performance model. Back then you read papers sometimes, and then you did hands-on work; it was really good. And I got the chance to go into lots of different things: I wrote a logic simulator, a timing verifier, several performance models. I drew schematics. I wrote a little bit of RTL.
Weirdly, not that much RTL in my life, but some, because that's the main method now. Back when I did it, I used to do Karnaugh maps and all that old-school stuff that nobody does anymore. So, Digital Equipment: I was there about 15 years and worked on some very successful and some unsuccessful products. Our follow-on to the VAX 8800 was canceled, partly for design reasons and partly for political reasons, and that was super painful. And then Digital itself went out of business right when I left; it got sold to Compaq. You get five ex-DEC people together over a beer and they'll start crying in about 30 minutes, because it was a great place to work. And it was a surprising disaster, let's say: we were building the world's fastest computers and going out of business at the same time. It was the combination of products, markets, management, business plan. Digital didn't transition its business plan, and by then it had been captured by the marketing people to the extent that they thought raising prices on VAX was the win, just as PCs and workstations came out.
So that's one thing I sometimes try to tell young engineers or interns who come in: a lot of success in any business does not come down to the nuts and bolts of the technical stuff, and in fact often doesn't. At the same time, though, it's really important for the people coming up, as you said, to know a lot of those nuts and bolts. It sounds like in those first 15 years, that's where you collected a lot of experience and knowledge about the whole process, end to end, of how to build a good computer. So we want that. But then there's this understanding that, you know, Betamax might be better, but it didn't win. And DEC: I think every ex-DEC person I know, like you said, had a wonderful experience, but is really sad about how it ended. At this point in your career, it sounds like you've been able to bridge that, where you can have good engineering, build good teams, and then somehow also translate that into successful…
Yeah, it's kind of complicated. So first: at my first job out of college, I worked with some really smart people, but everybody hated the company. Then, when I went to Digital, everybody loved Digital. A friend of mine's partner said, what do they put in the water? You guys are all like this: we'd work at work, then we'd go out drinking and talk about work, then go to work on Saturday. When you're young, there are two really important things. One is: are you working for somebody you can learn from? And make sure you learn; ask questions, try projects, get good feedback. And also, work in an environment that's interesting. Now, you could be in a really good group in a failing company and get a great experience, but generally speaking, companies are happier if they're in a growth market, and less happy in a shrinking market. You can kind of pick and choose. But right now there are some very big, growing companies that have sort of lost the plot on engineering, and the engineers there get lost. They hire a hundred interns, put them on random stuff, and they don't learn anything. So you've got to be careful about that. A positive environment with somebody you can really learn from is really important. But then, when you get to the bigger stuff, it's: how do you build a successful product? Like, I went to AMD knowing they were a failing company, right? And I thought part of the fun would be, could we turn it around? I worked for Rory Read, who was very clear; he said he didn't realize they were going bankrupt until he got there. He looked at the books, and then he said, I'll save the company, you guys build the products. Raja Koduri and I were the architects of the turnaround; he was doing graphics. We had some really good people, and we had organizational problems, and a relatively small number of bad managers and bad people-job fits. We made a bunch of adjustments; some were pretty visible, and some were subtler. But I was really working hard on, how do you get the goals, organization, and capability all lined up? And I believed that trust would come out of that. I had a really great consultant working with me then, Cat Row, who gave me a nonstop stream of books and articles to read, and brainstormed with me a lot about how to be successful. So I invested in becoming the kind of manager who could do something useful. And I went to a company where I knew some people, where I knew they had really good people, and where I knew we could build a good product; the challenge was all the operational and organizational stuff in the way, and some serious technical problems too. So yeah, it's always a relatively big investment. Whereas Tesla and Apple were, like, weirdly successful companies, and I went there thinking, I don't know how they do it, and my job isn't to change them; I just want to learn from it. When I went to Apple, we went through three locked doors. Some people say the most important thing is sharing and openness, and Apple is full of silos. Some say the most important thing is caring management leadership, and Steve Jobs was a famously difficult person, and Apple was still really successful, and people really loved working there, even though it got all the obvious things wrong, you know, like people being hard on each other.
You went through locked doors to get to your project; it was siloed. And Steve famously yelled at people when they didn't do exactly what he wanted, while simultaneously expecting them to be creative and do something new. Tesla is very chaotic, but they're producing cars. How does that stuff work? It turns out there are lots of reasons why it works. And this is where, when people are inspired, despite or because of the situation, hard to say, wild things happen. So I learned a tremendous amount. And then I went to work at Intel, partly because I had some ideas about how to build really high-end servers, and, you know, Intel has some really great technology. I spent most of my time working on methodology, team dynamics, and some basics. I met a lot of people there and had a lot of fun; maybe too much fun, working. It was an interesting set of challenges, and it stretched me. But then, for my next thing: AI is boiling the ocean on computing in a really neat way, and I wanted to work with an architecture I believe in, in the sense that it looks like the right map of what programmers are writing, and what they say they're doing, onto the hardware. That doesn't mean that description gives you the right hardware-software contract, but there's lots of technical work to do there; it's evolving. And AI attracts really smart people; I meet really smart people all the time. I met a guy recently doing AI for game engines. We talked for four hours and it felt like five minutes. It was really interesting, and I like that kind of stimulating thing. And part of me thinks, how is that going to impact how we do design, how I work with teams, how we work with people, what's going to happen in life? I feel lucky to be able to meet people like that and talk to them. But, you know, I've done a lot of work to get there. I work hard, I read a lot of books, I work on projects, I've sweated through difficult times with both technical problems and people problems. Engineers do the work because they love it, not because it's easy or particularly short-term rewarding. Engineering is not a good place for short-term rewards, but it is a place to be satisfied. The difference between happiness and satisfaction is a funny thing, because there are studies, they always publish them, saying people who have kids are less happy than people who don't. And a hundred percent of parents, well, not a hundred percent, but a hundred percent of good parents, say the best thing they ever did in their life was raise children. My dad told me that, and I was like, really? You worked all the time. Happiness is what happens today; satisfaction is the successful project over time. Engineering is much more a satisfaction thing than a happiness thing. Humans have two reward systems, a slow one and a fast one. Am I hungry? I need food today. Yeah, I got it, I'm happy. Did I survive the year? Did my children survive childhood? Those are satisfaction dimensions. Engineering is way more oriented to that, although it is fun to get your model to compile, or pass a test, or solve a technical problem, or file a patent. There's a bunch of short-term happiness, but mostly it's a long-term reward.
So maybe, on that note, you can share some words of wisdom with our listeners who are interested in computer architecture, interested in AI, interested in building their careers, perhaps in the likeness of yours.
Well, like I said: for people coming out of college, or interns, try to find a place where people are doing real work, real hands-on work, and they're relatively excited about it. When you're young, you should be working a lot of hours; you can't get where you want at 40 hours a week. Fifty, sixty. Some people say they work eighty, but mostly they're screwing around for half of that. Find something where you really feel like it's easy to work hard, it's easy to put in time, and do real hands-on work. And make sure you have at least a few people around that you really respect, who seem to know a lot, who teach you stuff, who take the time. And then work on a couple of different things. I worked in one group for ten years, and I really loved the group, but working on multiple projects in different groups is really useful over time. That doesn't mean you leave in the middle of a project, but periodically you find something new that's challenging, that sets you back. Some people are really worried: I'm at this level here, but if I go to that project, I'll be down here. And the answer to that is: great, do it. Go somewhere where you have to start over. At Tesla, at one point I was walking along shelves looking for visors, you know, sun visors for a Model X. It's a ridiculous job for me. But it made me think a lot about how all the parts in the factory are organized, how the parts flow into the factory, what it looks like, and why it's built this way. And I learned a boatload about how cars go together. Who knew? And that turned out to be really useful for thinking about how computers go together, and some of the computer skills I had were actually useful for building cars. It was really stimulating; it made me think about things way differently than before, and it was surprising and kind of unusual. So yeah, don't avoid those opportunities.
Jim Keller, thank you so much for joining us today. It's been a real pleasure talking to you. We've learned so much, and I'm sure our listeners will enjoy it a lot too. Yep, it was a truly insightful conversation. Thank you so much for being on the podcast. And to our listeners, thank you for listening to the Computer Architecture Podcast. Till next time, it's goodbye from us.
【 Quoting MegaStone's post: 】
:
: https://comparchpodcast.podbean.com/e/episode-11-future-of-ai-computing-and-how-to-build-nurture-hardware-teams%c2%a0with-jim-keller-tenstorrent/ : Feels like the Silicon Sage has had to do some PR since joining Tenstorrent. Besides the AnandTech interview reposted on this board earlier, I found he also did an episode of the Computer Architecture Podcast. (A couple of years ago he was on Lex Fridman's podcast too.)
: Highly recommended. The Silicon Sage explaining modular design and programming-model design himself is much clearer than secondhand transcriptions. And the two interviewers, Suvinay Subramanian and Lisa Hsu, both have formal computer architecture training, so their questions are much higher quality than Lex Fridman's.
: ...................