Wednesday, December 05, 2007

It's a bit late (actually you may note I tend to blog late at night anyway), but I've been wanting to take on a topic that's of interest to me from both an NVIDIA point of view and my Microprocessor Report analyst background. It's the chances that Intel's Larrabee will be a successful GPU.

I would like to offer the opinion that the deck is stacked against it, despite Intel's hype machine pushing it. For one, the design looks more like the IBM Cell processor or Sun's Niagara (UltraSPARC T2) than a graphics chip. Cell proved to be ineffectual as a GPU (which explains why the PS3 got a separate graphics chip), and Niagara is focused on Web services (lots of threads to service).

Let me also ask this question: when has Intel EVER produced a high-performance graphics chip? (Answer: never)

Oh, and when did we all decide that x86 was the most perfect instruction set for graphics? Hmmmm, it's not. There is no example where x86 is the preferred instruction set for either graphics or high-performance computing. For graphics, the x86 instruction set is a burden on the chip, not an advantage. In fact, GPU instruction sets are hidden behind the OpenGL and DirectX APIs, so the only people who care are the driver programmers. That means we are free to design whatever instruction set is most efficient for the task, while Intel has handcuffed itself to a fixed instruction set. Not the smartest move for graphics. Intel's plans may work better for HPC, though even there, there are many alternatives to x86 that are actually better (shocking, isn't it)!

Larrabee is based on a simplified x86 core with an extended SSE-like SIMD processing unit. To be efficient, that extended-SSE unit needs to be kept packed with useful work. How is Intel going to do that? The magic is all left to the software compiler, which is going to have a tough time finding that much parallelism efficiently.
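To illustrate what that compiler is up against, here's a toy C sketch (my own illustration with a made-up VLEN constant, not actual Larrabee code) of what vectorizing even a trivial loop involves: strip-mining it into 16-wide chunks and masking off the unused lanes of the final partial chunk. Real shader code, with branches and scattered memory accesses, is far messier.

```c
#include <assert.h>

#define VLEN 16  /* rumored Larrabee vector width: 16 single-precision lanes */

/* The scalar loop the programmer writes. */
static void saxpy_scalar(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* What a vectorizing compiler must turn it into: strip-mine into
 * VLEN-wide chunks, with a per-lane mask so the final partial chunk
 * doesn't write past the end of the array. */
static void saxpy_vectorized(int n, float a, const float *x, float *y) {
    for (int base = 0; base < n; base += VLEN) {
        for (int lane = 0; lane < VLEN; lane++) {
            int mask = (base + lane < n);   /* per-lane validity mask */
            if (mask)
                y[base + lane] = a * x[base + lane] + y[base + lane];
        }
    }
}
```

Even here, a loop whose trip count isn't a multiple of 16 wastes lanes on the last chunk; keeping those 16 lanes full on irregular graphics workloads is the hard part.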

Larrabee will have impressive theoretical specs that will wow the press, but it's going to be very hard to use the chip up to its potential, which sounds exactly like Cell. And Cell has not lived up to its hype. So my corollary: Larrabee = Cell; Cell ≠ hype; therefore Larrabee ≠ hype.

Now we wait until sometime in mid-2008 to see some silicon. But you heard it here first: Larrabee will not reach its potential performance capability (by a long shot) and will not displace NVIDIA as the GPU leader.

3 comments:

Architecture Professor said...

I think NVIDIA has more to worry about with Larrabee than you might expect. I think Larrabee is a real threat to NVIDIA for a few reasons.

First, if you step back a bit, Larrabee doesn't actually look that different from the NVIDIA G80 architecture. For example, the G80 has 8 scalar processors that share a common memory and execute 32-thread warps over four cycles, giving a 32-wide SIMD-like operation. There are then 16 or so of these units on a high-end G80 part. Compare that with Larrabee, which has a 16-wide (64-byte) vector unit per core and then has 32 cores. It really doesn't look that different (except Larrabee's units are fully pipelined, giving it more overall flops).
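To make the comparison concrete, here's a toy C model (my own sketch of the issue rates described above, not real hardware behavior) running the same data-parallel kernel both ways: as 32-thread warps issued 8 lanes per cycle over four cycles, and as 16-wide vector instructions at one per cycle. The results are identical; only the issue pattern differs.

```c
#include <assert.h>

/* The per-element kernel body, identical in both organizations. */
static float kernel_body(float v) { return 2.0f * v + 1.0f; }

/* G80-style: 32-thread warps, issued 8 lanes per cycle over 4 cycles.
 * Returns the number of issue cycles consumed. */
static int run_warps(int n, const float *in, float *out) {
    int cycles = 0;
    for (int warp = 0; warp < n; warp += 32)
        for (int cycle = 0; cycle < 4; cycle++, cycles++)
            for (int lane = 0; lane < 8; lane++) {
                int i = warp + cycle * 8 + lane;
                if (i < n) out[i] = kernel_body(in[i]);
            }
    return cycles;
}

/* Larrabee-style: one 16-wide vector instruction per cycle. */
static int run_vectors(int n, const float *in, float *out) {
    int cycles = 0;
    for (int base = 0; base < n; base += 16, cycles++)
        for (int lane = 0; lane < 16 && base + lane < n; lane++)
            out[base + lane] = kernel_body(in[base + lane]);
    return cycles;
}
```

For 64 elements, the warp model takes 8 issue cycles and the vector model takes 4; the underlying SPMD computation is the same, which is exactly the point.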

Second, Larrabee is globally cache coherent, making it much easier to program than the G80 or IBM's Cell processor. No DMA in and out of scratchpad memory; just coherent loads and stores.
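Here's a toy C contrast of the two programming models (my own sketch, with memcpy standing in for a DMA engine; not real Cell or Larrabee APIs). In the scratchpad style, the programmer explicitly stages tiles of data into a small local store before computing; in the coherent style, the code just loads through the cache hierarchy.

```c
#include <assert.h>
#include <string.h>

/* Cell/scratchpad style: explicitly "DMA" a tile into a small local
 * store, compute on the copy, repeat. The staging is the programmer's
 * problem. */
static void sum_with_scratchpad(const int *global, int n, int *result) {
    int local[8];                       /* pretend local store */
    int total = 0;
    for (int base = 0; base < n; base += 8) {
        int chunk = (n - base < 8) ? n - base : 8;
        memcpy(local, global + base, chunk * sizeof(int));  /* "DMA in" */
        for (int i = 0; i < chunk; i++) total += local[i];
    }
    *result = total;
}

/* Coherent style: just load the data directly; the cache hierarchy
 * handles data movement transparently. */
static void sum_coherent(const int *global, int n, int *result) {
    int total = 0;
    for (int i = 0; i < n; i++) total += global[i];
    *result = total;
}
```

Both produce the same answer, but only one of them forces the programmer to tile, double-buffer, and size every working set by hand.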

Third, there is something in your post that is internally inconsistent. You say first that "graphics instruction sets are hidden by the OpenGL and DirectX API," but then you say that with Larrabee "the magic is all left to the software compiler, which is going to have a tough time finding that much parallelism efficiently." Larrabee will also be driven through DirectX and OpenGL, using hand-optimized and tuned software written specifically for Larrabee. How is that any different from the G80? Furthermore, Intel has invented Ct, which will also ease programming for Larrabee.

Finally, you say "the x86 instruction set is a burden on the chip, not an advantage." Yeah, we all know how well *that* argument worked out for all the RISC chip makers. Best of luck on that one. If Intel has shown anything, it is that the x86 penalty can be overcome with smart engineering (plus a really great fabrication process).

One last parting thought: Larrabee itself might not hurt NVIDIA that much. After all, it is just another discrete GPU part. The real challenge to NVIDIA will be whatever follows Larrabee. What would a chip with two "Core 2" cores and eight or sixteen Larrabee cores do to NVIDIA's sales? Intel's integrated graphics has already hurt NVIDIA in the low-end PC GPU space, and now Intel is attacking the high end with Larrabee. But it is when Intel attacks the mid-range with an integrated solution that NVIDIA could really get hurt.

Hopefully NVIDIA can use the year or two between now and when Larrabee ships to leapfrog past it with whatever NVIDIA is planning after the G80.

P.S. If NVIDIA were to, say, purchase Ageia to counter Intel's purchase of Havok, that might be pretty helpful going forward, too.

kevin said...

Good comments.

Overall, I think our warps have greater flexibility than a very wide SIMD unit. My challenge to Intel is packing the SIMD word (rumored to be 16 SP words wide) to make it efficient. NVIDIA has many years of experience here that Intel will have to catch up to. My best indicator right now is that Intel has been unable to get DX10 working on the programmable shaders in the Graphics Media Accelerator 3500 (Intel G35).

Global coherency can become an advantage over time, most especially in HPC. For graphics, though, Microsoft doesn't presently support a coherent memory model.

The x86 instruction set is irrelevant to graphics because the application program (the game) interfaces to graphics through an API. Only the driver needs to know the GPU ISA.

x86 has survived in environments where the broad availability of relevant code made it a common denominator. In graphics, I don't see a need for a common ISA. I also believe that attempts to fit x86 into cell phones will fail, except in some high-end smartphone/MID devices.

2009 will be an interesting year.

Architecture Professor said...

I don't know that much about the G80 warps and what flexibility they allow or don't allow. However, from what I've heard about Larrabee, the 64-byte (16 SP, as you say) SIMD units are more flexible than typical x86 SSE2. For example, it sounds like it will have an implicit vector mask input, which will make it much more flexible. From what I understand, some of Larrabee's vector operations are specifically tailored to graphics processing. I think the key is that it isn't really 16 parallel pipelines as much as one really wide ALU that can perform sophisticated operations going beyond typical SIMD.
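Here's a toy C sketch of what a vector mask input buys you (my own illustration of the general predication idea, not Larrabee's actual instruction encoding). With masked vector operations, a per-element if/else becomes two vector instructions with complementary masks rather than a scalar branch per element:

```c
#include <assert.h>

#define VLEN 16

/* Toy masked vector add: every lane computes, but only lanes whose
 * mask bit is set commit a result to dst. */
static void vadd_masked(const float *a, const float *b, float *dst,
                        unsigned mask) {
    for (int lane = 0; lane < VLEN; lane++)
        if (mask & (1u << lane))
            dst[lane] = a[lane] + b[lane];
}

/* dst[i] = |a[i] - b[i]|, computed branchlessly: build a mask of the
 * lanes where a >= b, then issue two masked ops with complementary
 * masks instead of branching per element. */
static void abs_diff(const float *a, const float *b, float *dst) {
    unsigned ge_mask = 0;
    for (int lane = 0; lane < VLEN; lane++)
        if (a[lane] >= b[lane]) ge_mask |= 1u << lane;

    float neg_a[VLEN], neg_b[VLEN];
    for (int lane = 0; lane < VLEN; lane++) {
        neg_a[lane] = -a[lane];
        neg_b[lane] = -b[lane];
    }
    vadd_masked(a, neg_b, dst, ge_mask);   /* dst = a - b where a >= b */
    vadd_masked(b, neg_a, dst, ~ge_mask);  /* dst = b - a elsewhere    */
}
```

Plain SSE2 has no such mask input; you end up emulating it with compare, AND, and OR instructions, which is exactly the kind of overhead an implicit mask would remove.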

Another word about x86. Right now the fastest single-thread processor you can buy is an x86 processor. Intel's x86 is the fastest chip on both SPECint and SPECfp: faster than Itanium, faster than POWER. For a long time Intel's x86 cores weren't allowed to have more memory bandwidth (and thus higher SPECfp numbers) than Itanium. Now that that internal restriction has been lifted, Core 2 really is the performance leader. Look to Nehalem's integrated memory controller to take away AMD's last advantage: memory latency.

If Intel can make x86 the fastest, why not also the fastest low-power chip? Recall that Intel had a line of StrongARM processors that it recently sold off. Once Intel realized it could make a competitive x86 chip in that space, it jettisoned the StrongARM business before it was clear what the replacement plan was. My only caveat on Intel's move toward the embedded space is that it is generally easier for companies to move upscale than to move downscale. Intel isn't used to selling processors for $5, so that might be a real problem for Intel.

Intel has another key advantage in its fabrication. From everything I've read about Intel's 45nm process, it sounds really impressive. Not only are they the first to 45nm (as they usually are), they really changed the materials used to make the transistors to tackle the leakage problem. As both Silverthorne and Larrabee will likely be on this 45nm process, that is going to be hard for NVIDIA (or anyone else) to keep up with.

I can't say that I'm particularly happy that Intel might really stomp its competitors. Nobody likes to see Goliath win. But, I have to call it like I see it: I think NVIDIA should take Larrabee very, very seriously.

Of course, there is a fair chance that Intel internal politics cripples any Larrabee follow-ons or prevents direct programming of Larrabee. Larrabee could slip, or not deliver the performance or price/performance Intel is hoping for. Perhaps console gaming will totally eclipse PC gaming, making the PC GPU space less relevant. Or NVIDIA's next design could leapfrog ahead of Larrabee, putting Intel off balance.

One thing I agree with you about 100%: 2009 is going to be an interesting year.