Hi, and welcome to the Computer Architecture Podcast, a show that brings you closer to cutting-edge work in computer architecture and the remarkable people behind it. We are your hosts. I'm Suvinay Subramanian, and I'm Lisa Hsu.
Today we have with us Jim Keller, who is the CTO of Tenstorrent and a veteran computer architect. Prior to Tenstorrent he has held roles of senior vice president at Intel, vice president of Autopilot at Tesla, vice president and chief architect at AMD, and at PA Semi, which was acquired by Apple. Jim has led several successful silicon designs over the decades, from the DEC Alpha processors to the AMD K7, K8, K12, HyperTransport, and the AMD Zen family, the Apple A4 and A5 processors, and Tesla's self-driving car chip. Today, he is here to talk to us about the future of AI computing, and how to build and nurture hardware teams.
A quick disclaimer that all views shared on the show are the opinions of individuals and do not reflect the views of the organizations they work for. So Jim, welcome to the podcast. We're so thrilled to have you here with us today.
Thanks. Good to be here.
Yeah, we're so thrilled. And long-time listeners will know that our first question is: what's getting you up in the morning these days?
I was gonna say, I thought you were gonna say, what's keeping you up at night? Well, I had a literal keeping-up-at-night. I was just in India for a week. We opened up a design team there, and then I met the IT minister, because India has an initiative to promote RISC-V high-performance servers and India-based design. So I've been talking to those guys about it, and literally was up very early in the morning. So that's one thing. I think AI and modern tools and a few other things are causing change faster than anybody really thinks about. The tools are changing, the design points are changing, the code's changing. And how do you build and design computers and software so you can go faster and use those tools, top to bottom? We went from custom design, to design using CAD tools, to SoC designs where you have multiple IP components and you put those together. And now the design complexity keeps going up — Moore's law keeps giving us more transistors — but you still want to make progress at a good rate. How do you do all that together? And then that opens you up to applications, and AI applications are really crazy, and I've been learning a lot about it. So you'd think at some point things would slow down, but the opposite's happening — things are actually happening faster. Although I tend to wake up more in the middle of the night thinking about things.
Actually, I'm at two or 3:00 a.m. myself. So one thing you said I thought was really interesting, which is about tools, and I know you've talked about this in some of your other interviews. It seems like everything's changing faster than you can kind of accommodate, and then, in order to build new systems with all the changing technologies, you need the tools to change with it. But then it's kind of a conundrum, because you need to change the tools on top of technology that's changing faster and faster. When I was young, I thought being a computer architect was great, because the ground you stand on is always changing, which means that nothing stays stagnant and you can always innovate and do new things. But now it's almost like the ground is changing, and one day it's lava, and one day it's ice cold, and you have to change your shoes, and nobody's designed the shoes yet.
One day it's ice. Yeah, that's a pretty good metaphor, actually. I like that. Oh, thanks.
How do you accommodate and deal with all this change, when the tools that you would want to reason about the change and help you make these designs are themselves changing? Particularly as we shift towards these really specialized designs, and the AI algorithms themselves are changing very fast. Everything is changing very fast. So how do you cope with that?
Yeah, so, well, first, one requirement — and this is hard on big companies with lots of legacy — is you need new designs. I've said this: every five years, you need to rewrite stuff from scratch, period. And there's a big reason for that. No matter how good your old thing is, as you add little improvements and patches, it slowly becomes tangled together. A friend of mine sent me this paper titled Big Ball of Mud. It's a really old-school website with a picture of a big ball of mud at the top. And it talks about how, no matter how carefully you architect hardware or software — you have nice, clean components, well-defined interfaces or APIs — over time, this piece of software will learn about that piece, and this will do something because of that. Somebody will communicate in a meeting, and they'll figure out a clever way to make something faster, and pretty soon it's all tied together. So you need to have something new. We're building a RISC-V processor, a fairly high-end one. We spent a lot of time up front architecting it so that each subsection of the computer is really modular and has really clean interfaces. The mission I gave the team: it would be great if we found 95, maybe 99 percent of the bugs at the component level, instead of at the CPU integration level. And I guess SoCs went through that transition. If you buy high-quality IP from IP vendors and you make a chip, you don't really expect to find any bugs in the IP. But if you're a company with lots of legacy IP, and some of the IP was created by breaking up a more complicated thing and you never cleaned it up, you might find a large percentage of your bugs when you put the pieces together, and you have to fix that. So a new design gives you an opportunity to say, I'm going to go redesign this and make it clean at the modular level. And when I worked on Zen, some of the verification team came to me and basically said, Jim, we really want to test all the units with really extensive test benches. The old-school way of thinking was, oh sure, recreate the whole design twice — you have a load store unit, and now you have to make a thing to test the load store unit, which looks a lot like the rest of the computer, right? But they were right, because the code to test the load store unit is actually simpler than the rest of the computer. And if you put ten units in a row together — fetch, instruction cache, decode, rename, schedule, execution units, load store — your ability, from a program, to manipulate the load store unit is tough, because you're looking through five layers of really complex state machines. Whereas if you want to make the load store unit do all its things right from the load store pins, you can do that. So I was sort of getting there, but the verification engineers made the case that it was worth it for them to write more code at the test bench level and test the modules directly. And as soon as you think that, then you say, well, why does the interface between these two units have a hundred randomly named signals? If you've done detailed computer architecture, you know there's a signal called "stage four fetch valid, except when something happens," right? That's not a verifiable signal. And computers are full of that stuff — signals there for timing reasons, or for no good reason, or because somebody once had to fix a bug.
Oh, now this unit needs to know that that one is in some state. Whereas computers have well-defined interfaces: instruction, PC, memory fetch, fill, exception, kill, stall. So there's a really interesting challenge of, how do you build a computer with well-defined interfaces? And to your question — I made this comment in a talk I did — whenever something is hard, it's because there's too much complicated stuff in one place. You have to break things down. And sometimes the problem isn't whether the tools are there or not, but that you've tried to solve something in too many ways at once: you have an RTL problem, a timing problem, a physical design problem, and it gets to be too much, and you have to figure out how to break that down and make it simple enough to do. So verifiable design, verified interfaces, thinking about architecture a little bit differently — that's important. I've thought a lot about that. And you can see it: some projects you have are successes, and it's partly because you really took the time to architect the pieces up front and make them fairly independent and fairly clean. And you had the discipline not to slowly turn it into a ball of mud, which is the natural human tendency, apparently.
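To make the unit-level-testing point concrete, here is a minimal sketch — not Tenstorrent's actual methodology, just an illustration with invented names — of why driving a unit directly from its own interface reaches corner cases that are painful to reach through the whole pipeline:

```python
# Illustrative only: a toy "load/store unit" modeled in Python, with a
# unit-level test that drives its interface pins directly instead of going
# through a fetch/decode/rename/schedule pipeline.

class LoadStoreUnit:
    """Minimal model: a store buffer in front of a memory array."""
    def __init__(self):
        self.memory = {}
        self.store_buffer = []          # pending (addr, value) stores

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Loads must see the youngest matching pending store (forwarding).
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def drain(self):
        for a, v in self.store_buffer:
            self.memory[a] = v
        self.store_buffer.clear()

def test_store_to_load_forwarding():
    # At the unit level, this corner case takes three lines to hit;
    # reaching it through five layers of state machines takes far more.
    lsu = LoadStoreUnit()
    lsu.store(0x40, 1)
    lsu.store(0x40, 2)
    assert lsu.load(0x40) == 2          # youngest store wins
    lsu.drain()
    assert lsu.load(0x40) == 2

test_store_to_load_forwarding()
```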
Yeah, so that's super interesting. I want to follow up a little bit on this ball of mud question, because two things come to mind. Early on in your answer, you said something about how it's hard for large companies with a lot of legacy to do this, and yet we do have a lot of large companies who have stayed alive for a long time. So I sort of wonder — you know, when I was a young engineer, I was like, how does anything work? I didn't understand how anything could possibly work. And then secondarily, this whole thing about signals: I've seen the kind of signals where you've got a one-hot selector for this thing in the front half of the cycle, and another one-hot for the back half. It's just a total mess. And then we get these students coming out of schools, and maybe some of them have never written RTL in their lives; they learned all their computer architecture from reading, you know, boxes and arrows and stuff. So how do you then form a team where you do have the discipline to avoid this ball of mud? Where it's like, okay, we're gonna name these things right, there's gonna be a reason for this signal, this signal's gonna be eight bits wide, and we're gonna enumerate every single one of those eight bits with the proper name and proper state. How do you push that out from where you are?
Yeah, so there's a couple of things there. One is, you know, nearly a hundred percent of the Fortune 100 companies from a hundred years ago are gone — like, GE is still around, but in a completely different form. So companies do go through life cycles, and almost all of them disappear over time; some get propped up for monopoly reasons or infrastructure reasons or something. So success today does not guarantee success, although the time constant of that is longer than you think. Most companies don't fail in five years; they fail in 25 or 50. So that's one thing. And then Steve Jobs famously said: you have some new product, and then you make it a lot better, and then you refine it, right? But to get to the next level, you have to make another new product. And the problem is the new product isn't as good as the refined old product, but you can't make the old product any better. That's the best rotary phone that will ever be. The pushbutton phone is worse — it doesn't feel as good, it doesn't look as good. And you have to have the courage to jump off the refined high spot to a lower spot that has headroom. This is a quote from one of his random talks — I recommend people go watch some of Steve Jobs's keynotes; they were great. I watched a bunch of them, and he was very clear about what it means to design a product and to believe in where you're going. And it's really hard for marketing and sales people in a big company. You go, hey, we've got this great new product, it's 10 percent worse than what we have today, but over the next five years it's going to be twice as good. And they're all like, well, we'll wait five years. But then you're not working on the right thing. So you have to do that. So that's one.
The other is, I had this funny talk with a famous big company that was providing IP, and they had this interface which was a lot of wires and a lot of random stuff. And I told them the interface is too complicated. And they'd go, yeah, but it's really a small number of gates. And I said, no, you don't get it: the wires are expensive, the gates are free. So one thing you do is, instead of saying, I've got this lean and mean state machine and I export all the wires, you take that and you put it in a box, and you turn it into an interface, and you trade off. At the time, I thought we traded off gates for wires — add a bunch of gates because they're cheap, Moore's law gives us lots of gates, and have fewer wires. But a better way to say it is we trade off technology for complexity. Go look at a DDR memory controller, e.g. on an AXI bus — typical IP you can buy from three or four or five vendors. AXI transactions are very simple. There are like 15 commands, and you mostly use read and write: you go to the controller and say, I want to read 32 bytes of data at this address. That's a really simple interface. Inside the controller, there's a memory scheduler — you might have 32 or 64 transactions in flight. There's a pending write buffer. There's a little state machine that knows the current state of the DDR channel. Maybe there are two DDR channels, two DIMMs per channel. DDR DRAMs are really complicated widgets: they've got a read cycle, a write cycle, a refresh time; they're in different states. But the transaction is really simple — read address, write address, data. You don't know anything about the state of the DRAMs. Now, if you were building a high-performance system the old way, you'd say the CPU is going to be optimized: we're going to send read commands to the DRAMs, and we know we're going to have to sequence the read command, so we're going to send the row address early to set up the DRAM — you export that complexity. And then the CPU knows exactly that on cycle 167 the first piece of data is going to come out. We used to have CPUs that would wrap the returned read transactions, so you got the requested word first. We had all kinds of complexity. But nowadays, the transaction is really simple: read or write at the memory controller, period. The data comes out at a random time; it always comes out in the same order. You don't export the complexity of the memory controller to the CPU. Now, partly it's for a good reason: the CPUs have really big caches and really good prefetchers, they're running at three to five gigahertz, and the memory controller latency is 150 nanoseconds. Wrapping that transaction saves 0.2 nanoseconds out of a hundred and fifty. It's dumb complexity, right? So you look at what you're designing and go, how do I get the complexity of an interface to be this simple? And then there's a funny one, which is, people always ask, well, why don't we have an industry standard for cache coherence? Well, cache coherence is a distributed state machine, right? So now you're saying Qualcomm is going to have the same spec as Arm, as somebody else. That's a hard thing to do. Whereas specs like AXI — there are a bunch of specs that are pretty commonly used that are really simple, and when you make them simple enough, then many people can use them. PCI Express is simple enough. Ethernet is mostly simple enough.
Um, so the things that become common standards, you can work with them. Like, the first version of InfiniBand tried to optimize latency and made a 1,300-page spec, and nobody could build a device to the InfiniBand spec. So they went through some soul-searching and radically simplified it, then focused on being a generation ahead on the wires and having a simple but good-enough RDMA, and some other things. And that became a product that people could use somewhat successfully. So how do you avoid complexity? Well, at the top, you have to decide it's a real goal, and then you're going to spend something on it: I'm going to put an extra hundred thousand gates in each interface so that the interface is simple. In a computer with 200,000 gates, that's crazy. In a computer with a hundred million gates, genius, right? There's a different calculation.
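Here is a minimal sketch of the "simple interface, complex inside" idea Jim describes. The class names and the row-activation detail are invented for illustration; real AXI/DDR controllers are far more involved:

```python
# Hedged illustration: all the messy DRAM state lives inside the controller,
# while the exported contract is just read(addr) and write(addr, data).

class DDRChannel:
    """Hidden complexity: open rows, refresh timing, bank state, etc."""
    def __init__(self):
        self.open_row = None
        self.cells = {}

    def access(self, addr, write=False, data=None):
        row = addr >> 10                 # row activation is internal state
        if self.open_row != row:
            self.open_row = row          # precharge + activate, never exported
        if write:
            self.cells[addr] = data
        return self.cells.get(addr, 0)

class MemoryController:
    """Exported contract: read or write at the memory controller, period."""
    def __init__(self):
        self.channel = DDRChannel()

    def read(self, addr):
        return self.channel.access(addr)

    def write(self, addr, data):
        self.channel.access(addr, write=True, data=data)

mc = MemoryController()
mc.write(0x1000, 42)
assert mc.read(0x1000) == 42   # caller never sees rows, banks, or refresh
```

The extra gates spent inside `MemoryController` are the trade Jim names: technology (cheap gates) for complexity (a narrow, verifiable interface).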
Yeah, that was a fascinating set of insights. Starting from the top, you talked about the need for new designs, and the need to revamp things from scratch every five years. And some of the ingredients for enabling that: one part is, of course, modular, clean interfaces, but also the discipline of ensuring those interfaces are simple, not too complex, at various layers of the stack. Maybe I can double-click on your experiences doing this in the AI world, because that's clearly one of the places where there is a lot of need for this — compute demands are growing unabated. At the same time, there seems to be a willingness to experiment with new ideas. And one could argue that at certain layers of the stack we have seen some abstractions forming; e.g., matrix operations or convolution operations have been the bread and butter of deep neural networks. In this coming era, do you see that philosophy trickling up and down the stack? Because there are the operators themselves, but on the software side, once again, there's a lot of complexity, and then once you get to the hardware side — as you said, you're still designing interfaces with a hundred wires for something that is semantically just a read and a write.
Yeah, so this is a really good one. I'd say in AI, we haven't figured out what the hardware-software contract is yet. And I'll give you an example. In the CPU world — and this is not quite true, but it's close to true — software does arbitrarily complicated things. If you go look at a virtual machine, a JavaScript engine, it's amazing, right? Really complicated things. And when I learned to program, I programmed assembly — I used to know all the opcodes for the 6502 and the 8086, and most of them for VAX. And then I learned C. C programming is great because it's a high-level assembly language. At some level, as an architect, you write C code and you can see what instructions will be generated, and mostly how it's going to execute. It's pretty simple. But the actual contract for a modern computer is: operations happen on registers, period. You put data in registers, and then you do adds, subtracts, and multiplies on them, and you can branch on them. And then, from a programmer's point of view, there's a memory model where you load things basically in order — if you load address A and then you load it again, you never get an older value after you get a newer value. So it looks like you have ordered loads, and then you mostly have ordered stores. There are weak ordering models, but they mostly don't work, because you have to put barriers in to fix it. So basically, data lives in memory, you load it with relatively ordered loads, you do operations on registers, and you store the data out with ordered stores. And then there's a paging model, a privilege model, a security model — but those are orthogonal to the execution model. And underneath that simple model, you can build an out-of-order computer. It took us 20 years to figure out how to do that. Rule number one is you don't violate the execution model that the software people see. VLIW failed because it tried to violate the model. Weak ordering kind of fails because it violates the model. People who did radical recompilation to get performance with a simple engine failed. Out-of-order execution is really wild, because the instructions issue wildly out of order, but they don't violate that model. And to achieve that, we built register renaming, and something called kill: you execute a bunch of instructions out of order, and when something happens, you flush all the younger instructions from the kill point, and you finish the older ones in order. We have massive branch predictors, data prefetchers — but no matter what you do, you don't violate the contract on execution. And that means the software programmers do not have to be microarchitects. As soon as you ask them to be microarchitects, you've failed. Itanium had like eight barriers; nobody knew what they were for. I was at Digital when we built Alpha — we had weak memory ordering, so we violated the execution contract, and we broke all the software. So we had a memory barrier, but people didn't know where to put it in. The operating system had a double-MB macro, because they didn't know where to put it — so, two of them, because in some places that seemed to fix some random bugs. I'm not kidding.
Then we added a write memory barrier, which we thought would make things better, and it made it worse, because they just put the write memory barrier into the memory barrier macro as well, because they didn't know what to do with it. So it was like a worst-case scenario.
So now look at AI software. AI software has been developed mostly by programmers, and programmers understand the execution model pretty well. Data lives in memory. You declare variables, which gives you a piece of memory, or you do something like malloc and free, which is some kind of memory allocator on top of the memory model. But generally speaking, when you're in a program, you don't talk about variables as addresses — they have names, and you do operations on them. So you say A equals B times C. Implicit in that are the loads of B and C; they go into registers, you do operations on them, and you store the result back. And GPUs today are sort of executing that model: you have lots of very fast HBM DRAM, the data all lives in memory, and for every matrix multiply, the data is in memory, you load it into the registers, you do the operations, you write it back out again. So you're constantly moving data in and out. Now, at Tenstorrent we believe that when you write that program — and you can see it very clearly in all the descriptions of AI — that program actually defines a data flow graph. If you go Google "transformers" or "ResNet," you'll probably get a picture, and the picture will be a graph. The graph says there's an operation box, data goes into it, something happens, and then something flows out of it. Generally, they call the inputs activations, and the local data for that operation weights. And the number of operations they do is actually quite small: matrix multiply, convolution, some version of ReLU, GELU, softmax, and then a variety of what they call tensor manipulations, where you shrink or blow up a matrix, you pivot it, you transpose it, you convert 2D to 3D. There's a bunch of tensor manipulations, but the number of operators in that space is low. And people have standardized on things like how exactly you program ReLU — there are implementation methods. So it's interesting: the programmers are writing code in PyTorch to a programming model that looks like standard programming. They're describing what they're doing in terms of graphs, because that's a nice way to think about it. But the code itself, you see, is a mix. So the challenge is, how do we come up with a programming model that we all believe in and understand, that can go fast and not have to do things like read and write all the data to memory all the time, just because some operator expects data to be in memory and that's the only way it knows how to work. And I've talked to AI programmers who say, I'd happily recode that to make it twice as fast. That's one view.
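A minimal sketch of the point Jim is making — ordinary PyTorch code, written in the register/memory style, implicitly defines a data flow graph that a compiler can recover. The tiny module here is invented for illustration; torch.fx has shipped with PyTorch since 1.8:

```python
import torch
import torch.nn as nn
from torch.fx import symbolic_trace

class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(64, 64)

    def forward(self, x):
        y = self.proj(x)        # reads like "load, operate, store"...
        return torch.relu(y)    # ...but it is really a graph of operators

traced = symbolic_trace(TinyBlock())
print(traced.graph)   # placeholder -> linear -> relu -> output: the data flow
```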
And the other view is, I really don't care, because all the upside on this is bigger sizes — bigger weights, more parameters — and the hardware is going to make it faster, and in the short run I'll just buy more processors. But there's a really interesting dynamic here, and it sort of feels like when we first started building out-of-order processors. I guess I started working on it in '95. The idea had been around for a while — the IBM 360 did out-of-order execution; the 360/91, I think it was. It was amazing. But when I was at Digital, there was a debate about whether you could actually make an out-of-order computer work, right? And there were competing ideas: superpipelined, superscalar, VLIW, out-of-order. And then little window, big window. There were a bunch of ideas about it. And what won — clearly, I think, though there are still some people debating this — is out-of-order machines with big windows and really well-architected reorder buffers, renaming, and kill interfaces. That works. And it's a really simple programmer's model that works. So the interesting thing is — some people tell me GPUs just work, but NVIDIA has thousands of people hand-coding low-level libraries. There was a really good academic paper that said, hey, I decided to write matrix multiply, and I wrote it and coded it the obvious way, and I got five percent of the performance of NVIDIA's library. Then they did the obvious HPC transformations — transposed one of the matrices, blocked for the known register file size — and they got 30 percent. And from 30 to 60 percent, they're deep in the hack: they know how big the register file is, how many execution units there are, how many threads, how many everything. And the NVIDIA library — they have what they call CUDA ninjas, who are great programmers who know how to make that work. Now, the charm of CUDA is you write a CUDA program, it will work. The downside is that the performance may be randomly off by large factors. But when you're writing your code — well, why would you write matrix multiply on the GPU? There's a big library for that. So they have a programming model that works, and libraries that mostly solve your needs, and now you're arbitraging the last 10 to 20 percent. But that computer doesn't look anything like the way AI programmers actually describe the program, right? And that's a really interesting thing. So we're building a graph compiler. Our processor is an array of processors, which have some low-level hardware support for data flow, and there are some interesting methods for how you take big operations, break them up into small operations, and coordinate that. The charm of it is it gives you much more efficient processing and less data movement — less reading and writing of memory. And there are interesting things to think about. Say I have a RAM big enough to hold all the data I'll ever need: it's a big RAM, and it burns lots of power. If you break the RAM into smaller pieces, each one is much more efficient to access, but then you might have to go to another RAM. So there's a trade-off between the RAM sizes. And matrix multiply has this curious phenomenon: for an n-by-n matrix, it's n-squared data movement for n-cubed operations, which is sort of why AI works.
As you make the operations bigger, you get more computation per unit of data movement. And then there are ways to optimize that further by breaking the big operations into the right size, so they're big enough to have a good ratio of compute to data movement, but small enough to be efficient on local memory access. And you can see all the AI startups taking different approaches to this. It's not because people are trying to be different; it's because there's a real problem there: how the programs are written, how they're described, and what the hardware looks like are very different things. It's technically interesting. And I think the solution will be much better than "we'll just keep scaling faster memories forever." That doesn't seem like the right approach.
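A worked form of the ratio Jim cites, in standard arithmetic-intensity terms (the constants vary by formulation; this is the usual textbook accounting, not anything specific to Tenstorrent):

```latex
% An n x n matrix multiply does O(n^3) multiply-adds on O(n^2) data:
\[
\text{ops} = 2n^3, \qquad \text{data} = 3n^2 \ \text{words}, \qquad
\text{intensity} = \frac{2n^3}{3n^2} = \frac{2n}{3} \ \text{ops per word}.
\]
% Tiling into b x b blocks that fit in local memory keeps the same shape of
% ratio at the block level:
\[
\text{intensity}_{\text{tile}} \approx \frac{2b^3}{3b^2} = \frac{2b}{3},
\]
% so you pick b big enough to amortize data movement, but small enough that
% the 3b^2-word working set fits in cheap local storage.
```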
Yeah, I think that's a fascinating set of points. I do want to expand a little more on the AI hardware-software contract, versus the execution model that we know in traditional hardware and software. One of the attributes of the state-of-the-art models today is that they require a lot of scale: you have chips that are interconnected together, and you scale them out to really, really large systems. I wanted to get your perspective on —
Well, actually, they're really small systems, so I think you have your metric wrong. The human brain seems to be intelligent, and people estimate it at 1e18 to 1e22 operations per second, depending on who you ask. A GPU is currently about 1e15 operations a second. So it's off by something like six orders of magnitude. So we have a computer about this big, right, which is an average intelligent-operation computer. To build that today with GPUs would take a hundred thousand GPUs or something — the problem is on the GPU side. So we say that's big, but big compared to what? That's the funny part. It used to be that it took a really big computer to run a simple Fortran program, and you could say, that is a big computer. But now that computer fits on half a millimeter of silicon — the Fortran computer of the seventies is, you know, 0.1 square millimeters. Moore's law fixed it. So size is a relative thing. Today, yes, to build a big training machine you put together a thousand GPUs, and it feels really big. The parameter count is, you know, 30 billion parameters, and there's a petabyte of training data, and the numbers all seem so big. Here's a funny number. A transistor is a thousand by a thousand by a thousand atoms. Think about a seven-nanometer transistor: it's called seven, but it's about 100 by 100 by 100 nanometers, and a hundred nanometers is about a thousand atoms. So that's a billion atoms, and we use that to resolve a one and a zero. Now, we resolve a one and a zero at about a gigahertz, which is pretty cool. So it's a billion ones and zeros per second out of a billion atoms. Big number, small number? I don't know. Machines look big, but the computer in an iPhone would have been like ten Cray-1 computers, which were big in their day, and now we think of it as a $20 part that fits in a three-watt envelope. So it's a relative measure. AI programs today are big compared to traditional computing. They're small compared to an actual, you know, average intelligent person.
I hear you — I think it's a fair point. The intent behind the question was more to say: traditionally, when you were building a chip, we had a clear execution model and contract, and how these chips were hooked together was a separate problem in some sense. In distributed computing, if you went to databases, they had their own set of protocols, their own execution models for how database transactions would execute, and so on. If you went to the HPC world, they had a different set of execution models and contracts. For AI in particular, do you still see that we can have this separation? Or do you think there's a need for a more unified view across the chip-level boundary to the system-level boundary as well? Because you have various forms of parallelism.
Well, the fascinating thing about a current, like, thousand-chip ML GPU computer is: first, there's an accelerator model — there's a host CPU and an accelerator. Then in the GPU itself, there's a memory-to-memory operator model. Then that node runs some kind of networking stack between multiple nodes, and then it's coordinated with something like MPI. So you have a memory model, an accelerator model, a networking model, an MPI model — and that's just to make it all work, before you even run a program. It's kind of amazing, right? And you can look back: when processors had FPU accelerators, the FPU had a driver, right? You had to send operations to the FPU and poll it. But after the FPU got integrated, floating point just became a data type and an instruction in a standard programming model. So the accelerator model occasionally disappears. As floating point got integrated, there were still vector processors, which were accelerators for vector programs, and they died, essentially because floating point got fast enough that it was easier to just have a couple more computers running floating-point programs than it was to manage the accelerator driver model. So the current software structure is, I would say, somewhat archaic and complicated, but it's built on well-founded things. GPUs accelerating graphics programs have been solid for years. Everybody looks at it and goes, man, there are a lot of memory copies there, and oh, the programming model of the GPU is too simple — but, you know, that's a 20-year-old model. And networking works, and MPI's been used for a long time, and it's pretty well fleshed out. But the fact that to run an AI program you need something like four programming models before you even write a PyTorch program — it's kind of amazing. And even PyTorch doesn't really comprehend the higher-level thing: it's running locally on nodes, with the MPI coordination above it. So yeah, it's fairly complicated. Now, if you had a really, really fast computer that ran AI, those layers would go away. But we don't have that computer yet. And that's where the excitement happens: what's the right way to think about this stuff? It feels very much like we're in a transitional place. We've been through these before — the change from in-order computers to superscalar, vector, out-of-order, the VLIW thing — all that took like 15 years, and we're probably in year three of this in AI.
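Here is a hedged sketch of the "four programming models" stacked in one tiny program. It assumes a CUDA build of PyTorch and an MPI-style launcher such as torchrun setting the rank environment variables; the details vary by cluster, so this is an illustration, not a working recipe:

```python
import torch
import torch.distributed as dist

# (1) MPI-style coordination model: rank/world-size setup across nodes.
dist.init_process_group(backend="nccl")

# (2) Host + accelerator model: the CPU drives, the GPU executes.
device = torch.device("cuda")

# (3) Memory-to-memory model: an explicit copy from host memory to device memory.
x = torch.randn(1024, 1024).to(device)

y = x @ x.T                    # the actual computation, at last

# (4) Networking model: a collective over the interconnect between nodes.
dist.all_reduce(y)

print(y.cpu().sum())           # and a copy back into the host memory model
```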
So where do you feel like it will land? It seems like one of the tricky parts with AI is — like you say, from the programmer perspective, there's a data flow graph of what they're trying to do: here's this tensor, you want to send these things here. And then we have this hardware that we want to build to do it fast. The NVIDIA solution is this middleware where they translate that high-level data flow graph into some really low-level libraries, so they can make sure it's fast on this particular piece of hardware. But the question that always seems to come up is how big things should be. We don't want one huge DRAM, as you say, that can handle all of the memory in one giant chunk. We don't necessarily want one single matrix multiplier that can handle the very largest matrix multiply you could ever imagine — you want that broken up. So then how should these things get sized, how should they communicate with each other, and how, in the end, does it all get condensed down and sent to maybe some small processor that's actually doing, like, the ReLU or something like that? The question always comes up about size, and that sizing is often really dictated by the current state of the art, which is not going to be the state of the art in, like, six or eight months.
You asked a bunch of questions. So first, AI capabilities are changing really fast, but the models — there have been a couple: there was obviously AlexNet, and then ResNet, which was a huge refinement and an uptick on that. And then the language models came out with transformers and attention, right? And then there's the bitter lesson: size always beats cleverness. So there's something interesting there — there's a certain stability to it. There's obviously a bunch of tweaking going on: how do you tokenize the data, how do you map it into a space, how do you manage your training. But over the last couple of years that's been somewhat stable. The transformer paper came out, what, four years ago, right? And we're building way bigger models that are much refined on top of that, but there's that stability. Then there's a new benchmark every six months, and they're hitting something called benchmark saturation. They say, hey, we have this huge set of images — how well does the AI recognize them? And it went from like 20 percent accurate, to 50, to 80, to 90, to 97. Those benchmarks saturate: at a hundred percent, you're done. Whereas a lot of CPU benchmarks are, how many floating-point operations per second can you do — twice as many is always better. So some of these things, like the natural language tests and math tests, are saturatable benchmarks, because you can get all the answers right. So they've been in this churn where benchmarks that were supposed to be good for five years saturated in one. That's a funny thing. But let's talk about size. At the high end, our sizes are large compared to our technology, but small compared to the need, I would say. And then let's differentiate why memory capacity is big. If memory capacity is big because it stores a lot of useful information, that would be really interesting. But if it's big because it needs a lot of space to store intermediate operations, that's kind of a drag, right? So architecture, models, and technology will move to the point where you don't need memory to store intermediate operations. Like modern server chips: the caches are big enough that the memory accesses should mostly be for the first time you need the data. If there's a big working set and you're reading and writing the DRAMs over and over to do a matrix multiply, that would be a drag; in that case, the caches should get bigger, and the matrix multiply should be structured so that you can do blocks. That kind of behavior is well understood. So, large memories for holding a trillion useful bits of information: seems like a fine use for large memory. Eight terabytes of bandwidth because you need to store intermediate operations: seems kind of crazy. So there's a couple of differentiations you can make. And then there's the observation that the brain doesn't look anything like a large memory that you're reading and writing. You know what a cortical column is? It's ten or a hundred thousand neurons organized in a set of layers, fairly densely connected together.
And then they talk to each other at relatively low bandwidth. So that looks like an array of processors to me, with local storage and distributed computing and messaging. And it sort of looks like the graphs people say they're building when they write AI. Which is why, architecturally, an architecture that embodies data flow, knows how to do graphs, and knows how to pass intermediate results around instead of having to store them all the time seems like the natural thing. Was that clear? Large memories to hold large numbers of things: yeah, our current memories are small compared to the need. Large memories for intermediate results: that seems like an architectural anomaly. We've been through this before. HPC machines — people used to talk all about memory bandwidth. It used to be memory bandwidth, memory bandwidth, you know, running STREAM, right? Then you got a hundred processors with a hundred megabytes of on-chip cache, and we started to hear less about that, because more and more problems were factored into dense computation with sufficient on-chip storage, and memory starts to be storage for large data sets. Now, it's not always true — there are a bunch of problems that are very hard to factor that way, and there are some interesting things about very large, sparse data sets and unpredictable data sets. The HPC guys still hit limits everywhere they look. But it's not as clear-cut as it used to be — "show me the DRAM bandwidth and I'll tell you the performance of the computer." It's more complicated than that.
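For the blocking Jim described a moment ago — structure the matrix multiply so each block's working set lives in on-chip storage and DRAM traffic is mostly first-touch — here is an illustrative NumPy sketch, a teaching example rather than a tuned kernel:

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    """Tile C = A @ B so each (block x block) working set stays resident."""
    n = A.shape[0]
    C = np.zeros((n, n), dtype=A.dtype)
    for i in range(0, n, block):
        for j in range(0, n, block):
            # The C tile stays "on chip" while K tiles stream through it.
            for k in range(0, n, block):
                C[i:i+block, j:j+block] += (
                    A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
                )
    return C

n = 256
A, B = np.random.rand(n, n), np.random.rand(n, n)
assert np.allclose(blocked_matmul(A, B), A @ B)
```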
So does that mean — in some ways it almost sounded like, in terms of the sizing of the structures inside an ML computational engine, you feel like that's kind of stabilized, that it's relatively solved. But then we have all these AI startups that are trying to build hardware and the software stacks on top of it, and you mentioned before, they all have their own different ways of doing it. So there is still — if the structures themselves are largely stable now, because there are some known primitives...
Let me be clear about that. My point was, it's moving slower than people think. The results at the benchmark level, and some of the tweaks and stuff, are moving quickly. The current set of structures has gone through two or three generations, which are somewhat stable. But there could be a new structure next year that changes everything. So I don't think it's reached a plateau of stability the way out-of-order execution has, right? It's a punctuated equilibrium.
I see, okay — punctuated equilibrium.
Let's say. So, like, when Pete Bannon and I worked together on the Tesla chip, we used to wake up every once in a while and say, what if the engine we just spent a year building doesn't work at all for the algorithm they come up with tomorrow? That's a real one — that's a wake-up-at-4:00-in-the-morning thing. But it turned out there have always been methods. They did come up with algorithms that don't work naturally on that engine, but they found ways to transform the algorithm to the execution engine, and they've had success with that. And they got a huge power and cost savings by building a really focused engine as opposed to a general one. So that was a net win. So we're in a state of punctuated equilibrium: relatively stable for a while, but you have the sense that things need to change. And the description of the software — how people write the code and describe the software — versus what the execution engine is: the fact that those are different is really curious, and, you know, invites innovation and thinking. And the sizes aren't stable, because people are pushing sizes right now — most things would be better if they were ten times bigger, some asymptotically so, but there are some AI curves that are just still going: you make it ten times bigger, and you're still getting better at a real rate. That's why I think there will probably be some really interesting breakthroughs in the next five years about how information is organized, and how to do a better job of representing, essentially, meaning and relationships, which is what AI does, right?
Yeah. Just before we close out this particular theme, on the topic of future breakthroughs: reflecting back on the progress of AI, we talked about a couple of things. One is how data flow graphs seem to be a very good abstraction to express computations and build systems on top of. You mentioned a little bit about architectural anomalies that we should probably fix, like these large memories for intermediate results. But moving forward, as you look toward newer breakthroughs in AI, are there any opinionated bets that you're making at Tenstorrent, and that you think we should be looking at as a trend in the future?
Well, there's a couple of things. One is — some people have observed this, but when it first hit me: you're taught that AI is inference and training. Inference is, you put an input into a trained network, you get a result. And training is something like, you have some data with an expected result, you put it in, you get an error, and you back-propagate the error, right? And when somebody explained how they train language models and some image models — you basically take a sentence and you put a blank in it, and you run it through and guess the blank — which I think is really amazing. But to do that, you do the forward calculation and you save everything. And then on the back propagation, you use optimization methods to look at what you calculated versus what you should have calculated, and you update the weights. Brains clearly do not save all the information on the forward path. And there are some cool papers — there's one called RevNet, which is like a reversible ResNet, so you don't save the intermediate results; you recalculate them on the backward pass, which is cool. So it seems like there are going to be breakthroughs in how we do training. And also, when humans think, we don't train all the time. Ilya at OpenAI said, when you do something really fast, it only goes through six layers of neurons. You're not thinking — it's trained. That's inference. Everything you do really fast is inference. And the really interesting thing that we humans mostly do is more like generative stuff. You have some set of inputs, you go through your inference network, that generates stuff, and then you look at what it produced and what your inputs are, and then you make a decision to do it again. You're not training — you're doing these cycling inference loops, where that part of your mind is sort of your current stage of understanding, which you could say is your input tokens, but it's decorated with what you're trying to do, what your goals are, what your history is. And every once in a while, you're thinking about something and you go, that's really good — and then you train that. So humans have multiple kinds of training. Something exciting happens, and you remember it from one instance, for your whole life, right? So we have a method for training in a flash — remember exactly what happened. Then we have procedural training, where you do something repetitively, and you slowly train yourself to do it in an automatic way. And then we have the thinking part, which is like generative learning, where you're stewing on it, you try this and you try that, and you find a pattern that's superior to anything else you've thought of — and then we train that, because we use it as the building block for the next thing. So humans are generative models. And there's a lot of innovation in what they call prompt engineering, and there are all kinds of things, but the structure of it — it's almost like it's not sophisticated enough yet to be thinking the way humans think. We have overall goals. We have moral standards. We have stuff our parents told us to do. We have short-term goals, long-term goals. We have constraints from our friends and society. That's all present when we're doing our daily tasks, whatever we're trying to do — which is mostly not instantaneous inference, and it's mostly not training. So I think that's a really interesting phenomenon.
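A sketch of the RevNet idea Jim mentions (Gomez et al., "The Reversible Residual Network"): activations need not be saved on the forward pass, because the backward pass can reconstruct them by inverting the coupling. F and G here are arbitrary stand-in functions, not the paper's actual residual blocks:

```python
import torch

def F(x): return torch.tanh(x)          # any per-stream function works
def G(x): return torch.relu(x)

def rev_forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2                        # x1, x2 need not be stored

def rev_inverse(y1, y2):
    x2 = y2 - G(y1)                      # recompute, don't load from memory
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = torch.randn(4), torch.randn(4)
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)
assert torch.allclose(r1, x1) and torch.allclose(r2, x2)
```

Note the coupling is invertible even though F and G themselves are not — that is the trick that lets training trade recomputation for intermediate storage.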
And then the fact that these big generative language models are starting to do that is really, really curious. And thinking about how you would build a computer to do that better — that's a really interesting problem.
Yeah. Speaking of humans, intelligence, goals, and building computers, maybe this is a good segue into the other theme we wanted to talk to you about. You've been at multiple companies; you've built, led, and nurtured successful teams that delivered multiple projects. We'd love to get your perspective on how you think about building teams — nurturing them, growing them, and scaling them — especially from the lens of building hardware systems or processors and so on. What have been your key learnings? How do you view this?
Yeah, no problem. So, you know what the words "creative tension" mean, right? Where you hold opposite ideas in your head, and there's tension between them. You know: I want to get ahead, but I want to goof off this afternoon. That's creative tension, right? Everybody does that. So — I'm a computer architect. When I first started managing a big team, when I went to AMD in, I guess, 2012 or something like that — at Apple I had one employee, and I wasn't managing them particularly well, and then I was going to manage 500 people, and it grew to 2,000 or something. So I realized I could treat organizational design and people as an architecture problem. I'm a computer architect, and people, generally speaking, have some function that they're good at, and then there are inputs and outputs. Everybody knows how that works: you draw a box with a function, inputs, and outputs, and one of your missions as a computer architect is to organize functional units in a way that gets you the results you want. So if you're trying to build a computer, you need to work out the architecture of the organization that solves that problem. In modern processor design, there's an architecture group, RTL, physical design, and validation. And people, probably for evolutionary reasons, operate best in teams of ten or smaller: there's a manager, there are ten people, and a really good team of ten people will outperform ten individuals — humans are designed to do that — and a bad team will underperform them. There are all these jokes about how, as you add people, productivity goes down. But if your teams are well designed and your problems are well architected, people love to work in teams. Five to ten people working together are happy to see each other. Up to about 50 people, people all know each other pretty well. At a hundred, it becomes difficult to know people, and you start needing boundaries, because humans tend to view strangers as enemies, no matter how nice you are about it. So that's where a director manages, like, a hundred people: the directors know each other, but the people in the directors' teams don't need to know each other. So there's an organizational dynamic you need to figure out. And then there's the people side. Engineers love what they do — that's one of my givens, because engineering is way too boring and hard to do every day with excitement if you didn't really love it. People are willing to do hard, boring things if they like what they're doing, and people who don't love engineering leave it, because it's actually hard and annoying and repetitive — you think about the same problem over and over and over. So engineers generally like what they're doing; they have to, or they couldn't do it. But there's this interesting dimension: they love to do things they own, but they don't always know what the right thing to do is. So you need some hierarchy of goals, and steps to do, and processes and methods, and ways people interact and motivate each other, because you're trying to get to that creative-tension spot between them:
they own it and they're doing the right thing, but they're still following some kind of plan and organizing together — and that's difficult. Does that make sense? That kind of creative tension between organizational design, you know, the requirements, and then, let's say, the human spirit. People who are excited do ten times more work than people who think "this place sucks, I'd rather do anything else." So there's a huge swing. And teams that are working well together create stuff that individuals can't.
Two quick follow-up questions on that theme. As I've transitioned from being a young engineer to a less young engineer, let's call it — that second piece, of constructing teams and having a clear sense of what you own and how you solve your problems, having everybody kind of autonomous but marching in the same direction, seems like one of the hardest organizational problems. And I think I once saw you say something like, people don't like to do what they're told to do; they like to do what they're inspired to do. But one of the things I've witnessed across multiple organizations and multiple groups is that just getting everybody to feel like, okay, this is what we're doing, we've all agreed on it, you own this and you own this — that's one of the hardest parts. So that's one question. And the second question: you said something about groups of ten. How do you feel about remote work these days? I know they're very different questions — that's just what popped up.
Yeah, yeah. So I don't know what to think about remote work, because I'm not a fan — I like to work with people. That said, I've had very successful projects with teams in different places talking to each other, and I've also seen people working remotely on Slack, talking all day long, with their Zoom chats up and running, and it's almost like they're working next to each other. So there's a lot to figure out about that one. Your first question: it really helps to be an expert. I've led some projects successfully, and I'm a computer architecture expert. I'm not an expert in everything, but I've written CAD tools, I've architected computers, I've written performance models, I've done transistor-level design. I have a lot of capability. And I'm also relatively fearless about asking questions. If I'm in a room and people are explaining something — and young people, please listen to this: if you don't know, ask a question. If people don't want to tell you the answer, go work somewhere else. Go figure out what's going on. Somebody filed a complaint on me one time, because a senior VP asked too many technical questions — they were used to walking into the room with a bullshit PowerPoint and bullshitting for an hour about progress. And at page one I was like, what the hell is going on here? Sentence one, word one doesn't make any sense to me. Explain it. Nobody could explain it. And you can imagine, word two wasn't making things better, right? Somebody said you run fastest when you're running towards something and away from something. And I am more than happy, as a leader, to have a vision and lay out what I want and work with people to get there. But I'm also more than happy to dig into everything: does it make sense, and can you do it? And you'd say, well, that doesn't scale — but apparently it does. I worked at Apple, and Steve Jobs had everybody on the balls of their feet, in shape, because they knew if Steve found out you were screwing around, there'd be hell to pay. Elon does it — I watched him. He motivated very large numbers of people to be very active, hands-on, technically ready to answer questions about what they're doing. No bullshit slides. So you need to have a good goal, and you need to factor it into something where people say: I get it, I believe it, and I can do that. You need to have confidence in the management structure. My team on Zen — the managers were very competent. They were all technically good, and they were good managers. Because people do kind of divide: when you wake up in the morning, do you think about a technical problem or a people problem? I'm a technical person — I wake up thinking about technical problems. But then I wanted to solve problems that take lots of people, so I turned people into a problem. I read a bunch of books on psychology and anthropology and organizational structure — In Search of Excellence, you name it — and I came up with a theory about how to do it. And one of my theories is I like to have managers work for me who are technically competent but good people-people. That helps soften the edges around, say, me, e.g., or the problem, or the company. When an employee does work, they have the technical problem in front of them, but when they have an organizational problem, or their boss might be a problem — that's a drag.
The company might be a problem. Competition might be a problem. It can be tough, right? So people need somebody to look after them, take care of them, inspire them. At the same time, you have to be doing something that's worth doing, and balance that out too. This is a huge space of creative tension. There are certain leaders that are really hard; I think they're too hard, and life under them is too hard for a lot of people. I look for ways to solve organizational and technical challenges in a way that most people fit. Ken Olsen at Digital said there are no bad employees, only bad employee-job matches. When I was young, I thought that was stupid, and somewhere around 45 I decided it was a pretty good thought. If somebody is a good person, there's almost always a job where they can contribute. Now, if you're in a financial downturn, you have to lay people off, and you lay people off in a certain order; people know that. But solving that problem for people is important, because I've seen it turn into really positive results in an organization. And there are multiple dimensions. Somebody asks, well, what's the way to do it? Well, are your goals clear? A lot of people fail right there; the goals aren't clear. There's this organizational infrastructure: Goals, Organization, Capability, and Trust. You have to solve all four of them. Are the goals clear? Are they doable? Does the organization serve the goals? If the processor is broken into six units, do you have six good leads, and is each unit well managed? Capability: do you have the technical ability to do it? Can you identify the real problems, and find people who are plausibly able to solve them? Capability is a big one. And then trust. Trust is the complicated one, because it's usually the output of the first three, not an input. Some manager says, we're going to focus on trust and execution. Those are outputs. In the world of input, function, output, you can't change the output by looking at the output. You can change the input or the function; the output is the output. The output changes when you change one of those two. So when a manager says execution, execution, execution: are they actually doing something about it? Are they training people? Are they hiring for talent? Are they reviewing people properly? Did they buy new CAD tools? What did they do to make execution better? If all they do is say the word execution, they're bullshitters. So you have to solve multiple dimensions; you can't solve just one of them. And there's a bunch of places underneath that where there are multiple dimensions too. That's where you really start to see the difference between great leaders who get projects done and everybody else. I've worked with some really great leaders, and I'm just amazed. The Model 3 got built so fast, with so many people, across so many dimensions. Elon was super inspirational and unbelievably good at details, but Doug Field built and staffed a really wide-ranging organization, and I watched him do it. I was there when we built Autopilot and drove a car in 18 months, but compared to building the Model 3 and shipping it, that was relatively small potatoes. So it's really interesting to look at these things, and then you have to take them seriously.
And then you realize that no matter what, you don't know that much. So you have to dig into it. And if you're lucky, you find the right people and you get to the right place. But yeah, it's a hard problem.
Engineers probably should read way more books. People always ask me, well, what three books should I read? And I think, well, I've read a thousand. The three books I like best, I probably only like because of the other stuff I already knew. So I have a hard time recommending the one book that will do it, but reading a lot can help.
Yeah. That's one thing that's always mystified me about some of the higher-level leaders I've worked for, when they talk about all the books they read. Because when you're a junior engineer, you're just like, okay, I'm going to do my job, right? I'm going to write my code, I'm going to do my module, I'm going to run my unit tests, or whatever. And then as you make the transition, it becomes much more like, okay, there's more than just technical things to getting this stuff done. And making that transition to spending your brain power on the other part, and making sure you're spending it appropriately, is a bit of a tough transition, because there's a lot of comfort in your boxes and arrows, right, when it should be like this. But then it's like, well, how do you get everybody to believe it should be like this? And how do we get anyone here…
When you're between 35 and 45, you realize almost all your problems aren't technical.
Yes, exactly.
And then, unfortunately, it's a little like training a language model. You don't ask, what are the ten sentences I need to train this model? More actually helps, right? And a lot of times a really great book is only a great book because of the other hundred books you read; it's the one that brought the ideas together, and if you'd read it first, it wouldn't have meant anything. So quantity kind of counts. And don't be afraid to quit a book. Most people who write books have 25 or 50 pages of stuff to tell you, but the editor tells them to write 200 pages, because that's what sells. So don't be afraid to read 50 pages and go, I got it, he seems to be repeating himself. Almost no writer, once they start repeating themselves, buries some really good nugget a hundred pages later. Once they start repeating themselves, they keep repeating themselves. Because people get passionate: they write a book about engineering, or management, or ideas, or inspiring projects, they pour their heart out until they're done, and then they pad it until they get to 200 pages. So don't be afraid to throw a book out after 50 pages. Now, I read this book Against Method; it's my current favorite book, by Paul Feyerabend. And the goddamn book was like 300 pages long, and it was just one idea after another. I kept waiting for him to start repeating himself so I could put it down, because it was pretty dense, but he didn't quit; he just kept writing all the way through the damn book. It was pretty funny. But yeah, it's a real thing. If you start to realize there's more to work than just engineering technical problems, you've reached the next level, which is good, and you should solve it, because it'll make you happier and you'll be more successful if you do. And you may conclude: this is really cool, and now I'm better at managing the team and I can focus on technical stuff. Or now I want to manage bigger teams. Or now I want to go into sales. Or, now I get it, I really should spend more time surfing. It's all fine. When you reach that point, it's a good thing. It's a tough thing, harder than college. In college, the answers are in the book, for the most part. Then you start to work, and you realize that if the answer is in the book, you don't get paid that much. And when you start trying to solve this next level of problem, there are no answers at all, but there are some solutions.
So maybe this is a good time to wind the clock back, and you can tell our audience how you got into computer architecture. How did you achieve that employee-job match fit yourself? How did you get interested in the field, and how did you eventually get to where you are?
Oh, it was fairly random. In college... well, I basically goofed around in high school. I see kids today studying for the SATs, and I still remember being out with my buddies thinking, I have to take the SATs tomorrow; I probably should not have stayed out all night. But I got to college, I'd done well enough in high school, and I liked math and physics and a few topics. I went to Penn State, and I was a combined electrical engineering and philosophy major. But it turns out I can't write at all. In my sophomore year, the head of the philosophy department sent me a note, I think through the electrical engineering department, saying he really wanted to meet me. And it was like, yeah, it's great to meet you too. It was really wild and unexpected; I'd just taken four philosophy classes. And he said, yeah, we noticed that you're a philosophy major. Then he pulled out a paper written by a typical philosophy student for a midterm: ten pages, nicely written, perfect sentences and everything. And then he had my page, which was half a paragraph with scratch-outs and words in the margin. And he said, Jim, you're never going to get a philosophy degree at Penn State. He said, we're happy to have you, we like you in class, but we write a lot, and you can't write at all. And I was like, oh my God, really? You're kicking me out of philosophy? I didn't even know that was a thing. But Penn State was great. We had a two-inch wafer fab; I made wafers. So in college I thought I was a semiconductor major. My adviser ran that lab, and I learned a lot about it. Then I took a random job building fiber-optic networking controllers, because I wanted to live in Florida near the beach. It was a terrible job but a great experience. While I was there, somebody said I should work at Digital Equipment, and they gave me the VAX-11/780 paper manuals, which I read on the plane to my job interview. I thought that was really cool, but I went in there with a lot of questions. I met Bob Stewart, the chief architect of the 11/70 and the 11/780. He was a great engineer, and I had all these questions, and he thought I was funny. He hired me as a lark, I think, because he knew I didn't know anything about computers. I literally told him I'd just read the book on the plane; I'd taken one Fortran class in my life, and it didn't go well. I spent 15 years at Digital, and that's where I learned to be a computer architect, mostly working for Bob Stewart at the start. But there were some other guys, Doug Clark, Dave Sager; there were a couple of legendary people there. They were really good, and I had the opportunity to work with them. I was fairly energetic as a kid, so I just jumped into stuff, and I learned a lot. I slowly learned computer architecture. Pete Bannon and I worked together starting in 1982 or '83, something like that. We worked on the VAX 8800 together, and a couple of following projects. Then on the second Alpha chip, EV5, I wrote the performance model. Back then you read papers sometimes, and then you did hands-on work; it was really good. And I got the chance to go into lots of different things: I wrote a logic simulator, a timing verifier, several performance models. I drew schematics. I wrote a little bit of RTL.
Weirdly, not that much RTL in my life, but some, because that's the main method now. Back when I did it, I used to do Karnaugh maps and all that old-school stuff that nobody does anymore. So, Digital Equipment: I was there about 15 years and worked on some very successful and some unsuccessful products. Our follow-on to the VAX 8800 was canceled, partly for design reasons and partly for political reasons, and that was super painful. And then Digital itself went out of business right when I left; it got sold to Compaq. You get five ex-DEC people together over a beer and they'll start crying in about 30 minutes, because it was a great place to work. And it was a surprising disaster, let's say: we were building the world's fastest computers and going out of business at the same time. It was the combination of products, markets, management, business plan. Digital didn't transition its business plan, and by then it had been captured by the marketing people to the extent that they thought raising prices on VAX was the win, just as PCs and workstations came out.
So that's one thing I sometimes try to tell young engineers or interns who come in: a lot of success in any business does not come down to the nuts and bolts of the technical stuff, and in fact often doesn't. At the same time, though, it's really important for the people coming up, as you said, to know a lot of those nuts and bolts. It sounds like in those first 15 years, that's where you collected a lot of experience and knowledge about the whole process, end to end, of how to build a good computer. So we want that. But then there's this understanding that, you know, Betamax might be better, but it didn't win. And DEC: I think every ex-DEC person I know, like you said, had a wonderful experience, but is really sad about how it ended. At this point in your career, it sounds like you've been able to bridge that, where you can have good engineering, build good teams, and then somehow also translate that into successful…
Yeah, it's kind of complicated. So first: at my first job out of college, I worked with some really smart people, but everybody hated the company. Then, when I went to Digital, everybody loved Digital. A friend of mine's partner said, what do they put in the water? You guys are all like this: we'd work at work, then we'd go out drinking and talk about work, then go to work on Saturday. When you're young, there are two really important things. One is: are you working for somebody you can learn from? And make sure you learn; ask questions, try projects, get good feedback. And also, work in an environment that's interesting. Now, you could be in a really good group in a failing company and get a great experience, but generally speaking, companies are happier if they're in a growth market, and less happy in a shrinking market. You can kind of pick and choose. But right now there are some very big, growing companies that have sort of lost the plot on engineering, and the engineers there get lost. They hire a hundred interns, put them on random stuff, and they don't learn anything. So you've got to be careful about that. A positive environment with somebody you can really learn from is really important. But then, when you get to the bigger stuff, it's: how do you build a successful product? Like, I went to AMD knowing they were a failing company, right? And I thought part of the fun would be, could we turn it around? I worked for Rory Read, who was very clear; he said he didn't realize they were going bankrupt until he got there. He looked at the books, and then he said, I'll save the company, you guys build the products. Raja Koduri and I were the architects of the turnaround; he was doing graphics. We had some really good people, and we had organizational problems, and a relatively small number of bad managers and bad people-job fits. We made a bunch of adjustments; some were pretty visible, and some were subtler. But I was really working hard on, how do you get the goals, organization, and capability all lined up? And I believed that trust would come out of that. I had a really great consultant working with me then, Cat Row, who gave me a nonstop stream of books and articles to read, and brainstormed with me a lot about how to be successful. So I invested in becoming the kind of manager who could do something useful. And I went to a company where I knew some people, where I knew they had really good people, and where I knew we could build a good product; the challenge was all the operational and organizational stuff in the way, and some serious technical problems too. So yeah, it's always a relatively big investment. Whereas Tesla and Apple were, like, weirdly successful companies, and I went there thinking, I don't know how they do it, and my job isn't to change them; I just want to learn from it. When I went to Apple, we went through three locked doors. Some people say the most important thing is sharing and openness, and Apple is full of silos. Some say the most important thing is caring management leadership, and Steve Jobs was a famously difficult person, and Apple was still really successful, and people really loved working there, even though it got all the obvious things wrong, you know, like people being hard on each other.
You went through locked doors to get to your project; it was siloed. And Steve famously yelled at people when they didn't do exactly what he wanted, while simultaneously expecting them to be creative and do something new. Tesla is very chaotic, but they're producing cars. How does that stuff work? It turns out there are lots of reasons why it works. And this is where, when people are inspired, despite or because of the situation, hard to say, wild things happen. So I learned a tremendous amount. And then I went to work at Intel, partly because I had some ideas about how to build really high-end servers, and, you know, Intel has some really great technology. I spent most of my time working on methodology, team dynamics, and some basics. I met a lot of people there and had a lot of fun; maybe too much fun, working. It was an interesting set of challenges, and it stretched me. But then, for my next thing: AI is boiling the ocean on computing in a really neat way, and I wanted to work with an architecture I believe in, in the sense that it looks like the right map of what programmers are writing, and what they say they're doing, onto the hardware. That doesn't mean that description gives you the right hardware-software contract, but there's lots of technical work to do there; it's evolving. And AI attracts really smart people; I meet really smart people all the time. I met a guy recently doing AI for game engines. We talked for four hours and it felt like five minutes. It was really interesting, and I like that kind of stimulating thing. And part of me thinks, how is that going to impact how we do design, how I work with teams, how we work with people, what's going to happen in life? I feel lucky to be able to meet people like that and talk to them. But, you know, I've done a lot of work to get there. I work hard, I read a lot of books, I work on projects, I've sweated through difficult times with both technical problems and people problems. Engineers do the work because they love it, not because it's easy or particularly short-term rewarding. Engineering is not a good place for short-term rewards, but it is a place to be satisfied. The difference between happiness and satisfaction is a funny thing, because there are studies, they always publish them, saying people who have kids are less happy than people who don't. And a hundred percent of parents, well, not a hundred percent, but a hundred percent of good parents, say the best thing they ever did in their life was raise children. My dad told me that, and I was like, really? You worked all the time. Happiness is what happens today; satisfaction is the successful project over time. Engineering is much more a satisfaction thing than a happiness thing. Humans have two reward systems, a slow one and a fast one. Am I hungry? I need food today. Yeah, I got it, I'm happy. Did I survive the year? Did my children survive childhood? Those are satisfaction dimensions. Engineering is way more oriented to that, although it is fun to get your model to compile, or pass a test, or solve a technical problem, or file a patent. There's a bunch of short-term happiness, but mostly it's a long-term reward.
So maybe, on that note, you can share some words of wisdom with our listeners who are interested in computer architecture, interested in AI, interested in building their careers, perhaps in the likeness of yours.
Well, like I said: for people coming out of college, or interns, try to find a place where people are doing real work, real hands-on work, and they're relatively excited about it. When you're young, you should be working a lot of hours; you can't get where you want at 40 hours a week. Fifty, sixty. Some people say they work eighty, but mostly they're screwing around for half of that. Find something where you really feel like it's easy to work hard, it's easy to put in time, and do real hands-on work. And make sure you have at least a few people around that you really respect, who seem to know a lot, who teach you stuff, who take the time. And then work on a couple of different things. I worked in one group for ten years, and I really loved the group, but working on multiple projects in different groups is really useful over time. That doesn't mean you leave in the middle of a project, but periodically you find something new that's challenging, that sets you back. Some people are really worried: I'm at this level here, but if I go to that project, I'll be down here. And the answer to that is: great, do it. Go somewhere where you have to start over. At Tesla, at one point I was walking along shelves looking for visors, you know, sun visors for a Model X. It's a ridiculous job for me. But it made me think a lot about how all the parts in the factory are organized, how the parts flow into the factory, what it looks like, and why it's built this way. And I learned a boatload about how cars go together. Who knew? And that turned out to be really useful for thinking about how computers go together, and some of the computer skills I had were actually useful for building cars. It was really stimulating; it made me think about things way differently than before, and it was surprising and kind of unusual. So yeah, don't avoid those opportunities.
Jim Keller, thank you so much for joining us today. It's been a real pleasure talking to you. We've learned so much, and I'm sure our listeners will enjoy it a lot too. Yep, it was a truly insightful conversation. Thank you so much for being on the podcast. And to our listeners, thank you for listening to the Computer Architecture Podcast. Till next time, it's goodbye from us.
【 Quoting MegaStone's post: 】
:
: https://comparchpodcast.podbean.com/e/episode-11-future-of-ai-computing-and-how-to-build-nurture-hardware-teams%c2%a0with-jim-keller-tenstorrent/ : Feels like the Silicon Sage has had to do some PR since joining Tenstorrent. Besides the AnandTech interview reposted on this board earlier, I found he also did an episode of the Computer Architecture Podcast. (A couple of years ago he was on Lex Fridman's podcast too.)
: Highly recommended. The Silicon Sage explaining modular design and programming-model design himself is much clearer than secondhand transcriptions. And the two interviewers, Suvinay Subramanian and Lisa Hsu, both have formal computer architecture training, so their questions are much higher quality than Lex Fridman's.
: ...................