The ultimate guide to Intel’s Core i7
November 17th, 2008
Getting down and dirty with Intel’s latest processor architecture
Intel’s new Core i7 processor is preposterous. There’s no other way of putting it. It is simply the world’s fastest PC processor and it pumps out performance numbers so spectacular it’s enough to make you laugh out loud at the sheer, giddy ludicrousness of its existence.
In a way, that’s not giving away a great deal. We always knew it would take the PC performance crown. After all, the existing Core 2 CPU was the previous title holder and Core i7 was bound to be at least a little quicker. But we still didn’t expect the brutal, fratricidal hatchet job that is about to unfold.
Mostly, we weren’t expecting it because Core 2 itself was so damn good. Before we got our hands on Core i7, it was hard to imagine how a new CPU could possibly humiliate its progenitor as comprehensively as the Core 2 managed with the ill-fated Pentium 4 processor. We also had an inkling that the microarchitecture underpinning Core i7, codenamed Nehalem, is more about laying solid foundations for the future than delivering a knockout blow at launch.
Intel is planning to dramatically increase the core counts of its processors in coming years. So a modular architecture with bags of bandwidth will be essential to keep performance scaling efficiently. That’s exactly what Nehalem delivers. With more threads, more bandwidth and more efficient load balancing, it has been designed from the ground up to be a mean, lean, parallel processing machine.
Thing is, at launch Core i7 only comes in quad-core trim. Hence, it’s the upcoming eight-core version, pencilled in for a late 2009 debut, that we thought would be the defining model in the Core i7 family, the chip that really allows the work Intel has done with the Nehalem architecture to shine. It may still turn out that way. But the first Core i7 chips are a bloody good start.
Fundamental rethink
To pull that off, Intel has fundamentally rethought the way it builds PC processors. The big change involves the shift towards a much higher level of feature integration on the CPU die itself. Strictly speaking, it’s not actually a new approach.
AMD’s CPU architectures have been guided by essentially the same philosophy of integration and modularity since the Athlon 64 appeared in 2003 with an on-die memory controller and HyperTransport links. But as we shall see, the details of Intel’s implementation and the performance it delivers are unlike anything seen before in the PC.
The first part of the Core i7’s integration riff is the introduction of truly monolithic multi-core processor dies. No longer will Intel bodge up quad-core products by lashing together a pair of dual-core CPU dies in a single processor package, as it has done to create all of its existing Core 2 quads. Instead, all the execution cores in Core i7 processors are fused into a single die.
The key benefits here are improved bandwidth and reduced latency. The multi-die approach used for Core 2 was a reasonable short term solution, but technically, it’s a bit of a kludge. It forces communication between the two processor dies to be routed off the CPU package to the northbridge chip on the motherboard and back again via the front side bus. Not exactly ideal.
Admittedly, Intel did a bang up job of hiding the flaws of this arrangement. Its main ruse was plastering the Core 2 family with ever greater lashings of on-die cache memory. The fact that the execution cores inside Core 2 were so damned good didn’t hurt, either. But as more and more cores are added to processors in future, an architecture that splits the execution cores into multiple dies would prove a serious boat anchor dragging down performance. So it’s gone.
Three flavours
It’s not the only feature that Intel has unceremoniously defenestrated. The ghastly old front side bus has finally been given the boot for Core i7. In its place is the thoroughly modern Quick Path Interconnect (QPI). Or at least it is for the first high end Core i7 processors released this month.
But not for the mainstream models Intel is planning to roll out next year. In fact, there will be three distinct flavours of Core i7 processors. First up is Bloomfield, the high performance chip launched this month, reviewed here and supported by Intel’s new X58 Express chipset. In 2009, probably towards the latter half of the year, Intel will take the wraps off a more affordable mainstream model, currently codenamed Lynnfield, and yet another new motherboard chipset, known as Ibex Peak. Joining it will be Havendale, a revolutionary new CPU with integrated graphics.
But more on that chip in a moment. QPI as found on high performance Core i7s is a fully bidirectional packet-based interconnect that offers up to 25.6GB/s of bandwidth per link. That’s double the bandwidth available from Intel’s fastest bus interconnect on a Core 2 CPU. Think of QPI as similar to HyperTransport links on AMD chips but with more bandwidth, and you’ll get the idea. For the record, the HyperTransport link on a current AMD Phenom processor tops out at 14.4GB/s.
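As a rough sanity check on those numbers, peak bandwidth is simply transfer rate multiplied by bytes per transfer. The sketch below assumes a 6.4GT/s QPI link carrying two bytes of data in each direction, and a 1,600MT/s front side bus that is 64 bits wide. Neither set of figures appears above, so treat them as illustrative assumptions rather than Intel’s official derivation.

```python
# Back-of-envelope peak bandwidth. Assumed inputs (not quoted in the article):
#   QPI: 6.4 GT/s, 2 bytes of data per transfer, counted in both directions
#   FSB: 1,600 MT/s on a shared 64-bit (8-byte) bus

def peak_bandwidth_gbs(transfers_per_sec: float,
                       bytes_per_transfer: int,
                       directions: int = 1) -> float:
    """Return theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_sec * bytes_per_transfer * directions / 1e9

qpi_gbs = peak_bandwidth_gbs(6.4e9, 2, directions=2)
fsb_gbs = peak_bandwidth_gbs(1.6e9, 8)

print(f"QPI link: {qpi_gbs:.1f} GB/s")
print(f"FSB:      {fsb_gbs:.1f} GB/s")
print(f"Ratio:    {qpi_gbs / fsb_gbs:.1f}x")
```

Under those assumptions the QPI figure lands on the quoted 25.6GB/s and the bus on half of that. Note that the QPI total counts both directions at once, whereas a front side bus shares its bandwidth between reads and writes.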
On the desktop, Core i7 processors are limited to just one link. But QPI was actually created to provide a high-speed connection between multiple CPUs in workstation and server systems. Xeon CPUs based on the Nehalem architecture will therefore boast up to three QPI links per socket and frankly mind boggling bandwidth.
Anyhow, back on the desktop QPI’s job is to provide a big fat pipe connecting Core i7 to what remains of the northbridge. On the X58, this chip is known as the input/output hub (IOH). It provides a link to both system peripherals, such as disks and drives, via the southbridge chip, and PCI Express graphics cards. The X58 chipset itself supports up to four cards in multi-GPU rendering mode, though specific motherboard implementations will vary.
No Nvidia chipset
Intriguingly, Nvidia has decided not to produce a chipset for Bloomfield Core i7 processors. For the first time it is therefore allowing support for its SLI multi-card graphics technology on a non-Nvidia desktop motherboard chipset. Of course, AMD has allowed its competing Crossfire multi-GPU technology to run on Intel chipsets for some time. Consequently, the X58 has the honour of being the first single-CPU platform that supports both of the major multi-GPU technologies. About time, too.
Next up on the list of Core i7’s integrated features is the new on-die memory controller. As we mentioned, that’s something AMD’s processors already sport. But what Phenom and Athlon chips can’t match is Core i7’s new triple-channel layout. The result is a mammoth boost in bandwidth. Just the thing to feed future Nehalem CPUs with torrents of piping hot data.
That nearly wraps it up regarding the integrated features. But before we dive into the other half of the Core i7 narrative with a story of detailed enhancements aimed to improve both old fashioned single-threaded grunt and modern multi-threaded zing, it’s worth understanding how the upcoming mainstream Core i7s will differ from the first Bloomfield chips.
Direct Media Interface
For the Ibex Peak platform that supports mainstream Core i7 processors, the northbridge chip disappears entirely. What’s left is a southbridge chip connected to the CPU socket with the same Direct Media Interface Intel has been using to link its northbridge and southbridge chips since 2004. What’s more, the PCI Express links to graphics cards are also moved onto the CPU die itself. With all northbridge functions fully integrated onto the CPU die, these Core i7s therefore do away with the external QPI link, too.
Then there is Havendale, the final model in the Core i7 family. Unless AMD pulls a fast one, Havendale is likely to be the first PC processor with integrated graphics. Currently, Havendale is something of a mystery. We have heard rumours, for instance, that it will be composed of two separate dies, with the cores and cache memory on one and the memory controller, graphics and system I/O on the other.
However, what we know for sure is that the graphics core is based on Intel’s existing G45 3D architecture and not the upcoming Larrabee GPU. We also know that Havendale is all about reducing costs. It’s not a performance part, so don’t get excited about the potential performance benefit of a chip that fuses CPU and GPU functionality.
Different sockets – goodbye easy overclocking?
By now you’ll be getting a feel for what is a brain-bendingly complex range of CPUs and platforms. It’s much more complicated than the current Core 2 family, so you won’t be surprised to learn that these radical changes in architecture cannot be supported by Intel’s existing LGA 775 CPU socket. The first Bloomfield Core i7 processors saddle up in the massive new LGA 1,366 socket (the numbers here refer to the total pin out on both chip and socket). Next year’s mainstream Core i7s will adopt yet another socket, known as LGA 1,160.
Think about that for a moment, because it’s a big change. It means you will no longer be able to buy a cheapo CPU, drop it into a decent motherboard and then clock the twangers off it for flagship-mimicking performance. From here on in there will be two classes of Intel platform.
That’s a picture of the broader architectural changes Intel has cooked up for Core i7. But there are plenty more juicy details to come. Probably Core i7’s worst kept secret is the resurrection of HyperThreading technology. Originally introduced with the Pentium 4 Netburst processor, it was last seen in 2005.
HyperThreading à la Core i7 is exactly the same idea as before: to make better use of execution resources by supporting the simultaneous computation of two threads of code per core. That sounds sensible, but HyperThreading has proved to be a bit of a handicap on multi-core processors in the past.
Turbo Mode ramps up individual cores
Intel has also introduced a snazzy new feature designed to rev up more old fashioned software that can often leave one or more cores twiddling their metaphorical fingers. With Turbo Mode, Intel has given Core i7 the ability to independently ramp up the operating frequency of individual cores. The clockspeed of one, two or three cores can be increased by up to 266MHz above the official rating when the chip detects single, double or triple-threaded software is being processed.
It’s all about making the most of the chip’s overall thermal and power capacities when not every core is under load. Combined with Core i7’s multi-threading prowess, the intention is to maximise performance for all kinds of code.
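To make the idea concrete, here is a toy model of that behaviour. It is a sketch under stated assumptions, not Intel’s actual algorithm: the 133MHz step size, the `TURBO_BINS` table and the `within_limits` flag are all hypothetical names and values chosen so that two steps match the 266MHz ceiling quoted above.

```python
# Toy model of Turbo Mode. Everything here is illustrative:
# the step size and the bin table are assumptions, not Intel's published logic.

BIN_MHZ = 133  # assumed size of one frequency step; two steps = 266MHz

# Hypothetical table: extra steps allowed for a given number of busy cores.
# Any count not listed (e.g. all four cores busy) gets no boost in this model.
TURBO_BINS = {1: 2, 2: 1, 3: 1}

def turbo_clock_mhz(rated_mhz: int, busy_cores: int,
                    within_limits: bool = True) -> int:
    """Return the clock a busy core would run at in this simplified model."""
    if busy_cores < 1:
        raise ValueError("at least one core must be busy")
    if not within_limits:
        # No thermal or power headroom left: stay at the official rating.
        return rated_mhz
    extra_bins = TURBO_BINS.get(busy_cores, 0)
    return rated_mhz + extra_bins * BIN_MHZ

# Example: a chip rated at 3.2GHz
for cores in range(1, 5):
    print(cores, "busy core(s):", turbo_clock_mhz(3200, cores), "MHz")
```

The point of the model is the shape of the trade-off: the fewer cores under load, the more thermal and power budget is left over for the ones that are working.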
Dig down into the cores themselves and you’ll find even more enhancements. The SSE4.1 SIMD instruction set becomes SSE4.2 and gains seven new instructions, for instance. Branch prediction has also been improved, as has the chip’s power management, courtesy of individual power control units which enable idle cores to be shut down.
There are also upgrades to the chip’s buffers, improved loop streaming, the introduction of 64-bit instruction fusing and other upgrades that would frankly boil your brains if we went through the lot in detail.
One change that is worth appreciating, however, is the overhauled cache memory architecture. The Core 2 had a two level cache hierarchy with oodles of shared level 2 cache per die – up to 12MB in total for a quad-core chip. Core i7 processors move to a three level cache hierarchy with 64KB of L1 cache and 256KB of L2 per core, plus a shared 8MB of L3 for the first Bloomfield chips.
That’s a fair bit less than Core 2 quad-core models and reflects the fact that Core i7 doesn’t have to make up for deficiencies Intel’s previous CPUs had to live with, such as the creaky old front side bus and multi-die construction.
The relatively modest cache memory pool also explains how Intel has managed to bring in quad-core Nehalem at just 731 million transistors overall. That’s getting on for 100 million fewer than the last of the quad-core Core 2 processors.
So, it should actually take up less space on the exotically expensive silicon wafers which Intel uses to knock out CPUs. That means it will be cheaper to manufacture. Pretty impressive given the fact that it’s based on the same 45nm production process as the latest Core 2s and yet squeezes in a lot more features.
Specs and prices
But enough of the hypotheticals. What you really want to know is the specs and prices of the Core i7 chips you can actually buy today and exactly how quick these little silicon wonders are. Initially, three Core i7s are available. The flagship of the range is the Core i7-965 Extreme Edition. It’s a 3.2GHz chip that will retail around the £1,000 mark including VAT.
Next is the mid-range 2.93GHz 940 model, which we estimate will change hands for just under £400. Rounding things out is the 2.66GHz Core i7-920. Likely to go on sale at a whisker under £200, this is arguably the most interesting of the lot when you take overclocking into account.
What all these chips share is essentially the same physical feature set, including 8MB of L3 cache and support for up to 24GB of DDR3 memory with a maximum frequency of 1,066MHz. The memory speed is actually a bit of a downgrade compared with the Core 2 family. But Intel says it is working on validating 1,333MHz for the near future.
To all of that, the Extreme Edition model adds a few configurability goodies, including adjustable CPU multipliers for each core and control of both the chip’s thermal and current operation limits.
Performance and benchmarks
Where, therefore, to start with the story of Core i7’s actual performance? Try this for size: in multi-threaded applications, the new 965 Extreme Edition model is often 50 per cent faster than the best of the Core 2s and sometimes as much as 60 per cent. That is frankly incredible given that we are talking about chips that share the same nominal clockspeed and core count.
Good examples include video encoding and professional rendering. The 965 Extreme tears through our X264 video encoding test at a scarcely believable 75fps. The Core 2 QX9770 Extreme Edition, itself a stunningly fast processor, can only manage 47fps. It’s a similar story in the Cinebench R10 rendering test. The 965 kicks up its feet after just 45 seconds. The QX9770 needs a minute and four seconds.
Undoubtedly, the work Intel has put into enhancing Core i7’s multi-threaded performance has paid off in spectacular style. The massively improved bandwidth of the on-die memory controller is clearly a factor here. Core i7 may only support 1,066MHz DDR3 memory for now. But with three channels and lower latency, bandwidth leaps from around 6GB per second on a typical Core 2 platform to over 18GB. Stunning, isn’t it?
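For context, it helps to set that measured figure against the theoretical ceiling. The sketch below assumes standard 64-bit wide DDR3 channels, which is not stated above, and takes the 18GB per second quoted here as the measured value. The function and variable names are ours.

```python
# Theoretical peak memory bandwidth versus the measured figure quoted above.
# Assumption: each DDR3 channel is 64 bits (8 bytes) wide.

def ddr_peak_gbs(channels: int, mega_transfers_per_sec: float,
                 bus_width_bits: int = 64) -> float:
    """Return theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9

triple_channel = ddr_peak_gbs(3, 1066)
dual_channel = ddr_peak_gbs(2, 1066)
measured = 18.0  # GB/s, the "over 18GB" figure from our testing

print(f"Triple-channel peak: {triple_channel:.1f} GB/s")
print(f"Dual-channel peak:   {dual_channel:.1f} GB/s")
print(f"Measured / peak:     {measured / triple_channel:.0%}")
```

On those assumptions, the measured result sits at roughly 70 per cent of the triple-channel ceiling and is already beyond what two channels at the same speed could deliver even in theory.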
We have to admit that HyperThreading lends a hand, too. Knock it on the head and the 965 plummets to 52 seconds and 62 frames per second in the Cinebench and X264 encoding tests, respectively. Overall, the multi-threading enhancements Intel conjured up for Core i7 are so damned effective, even the 2.66GHz Core i7-920 beats the old Core 2 QX9770 flagship with a big, multi-threaded stick. Yup, Intel’s new £200 processor has the measure of its previous £1,000 chip in many tests.
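The percentages behind those results are easy to work out from the raw scores. The sketch below uses only the numbers quoted above; the one subtlety is that frames per second is a higher-is-better rate while the Cinebench result is a lower-is-better time, so the ratio flips. The helper names are ours.

```python
# Speed-ups derived from the benchmark figures quoted in the text.

def gain_from_rate(new_rate: float, old_rate: float) -> float:
    """Fractional gain for higher-is-better scores such as frames per second."""
    return new_rate / old_rate - 1

def gain_from_time(new_seconds: float, old_seconds: float) -> float:
    """Fractional gain for lower-is-better scores such as render time."""
    return old_seconds / new_seconds - 1

# Core i7-965 versus Core 2 QX9770
x264_gain = gain_from_rate(75, 47)
cinebench_gain = gain_from_time(45, 64)   # 1m04s = 64 seconds

# Core i7-965 with HyperThreading on versus off
x264_ht_gain = gain_from_rate(75, 62)
cinebench_ht_gain = gain_from_time(45, 52)

print(f"x264 vs QX9770:      {x264_gain:.0%}")
print(f"Cinebench vs QX9770: {cinebench_gain:.0%}")
print(f"x264 from HT:        {x264_ht_gain:.0%}")
print(f"Cinebench from HT:   {cinebench_ht_gain:.0%}")
```

That works out at roughly 60 and 42 per cent over the QX9770, with HyperThreading alone worth about 21 per cent in the encoding test and 16 per cent in Cinebench.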
As for single-threaded performance, Core i7 is not quite such a game changer. It doesn’t beat seven shades out of Core 2 in most PC games, which tend to be low on intensive threads. Running Cinebench in single-threaded mode likewise reduces Core i7’s edge over Core 2 to below 15 per cent. And bear in mind that, thanks to Turbo Mode, Core i7 is typically running at least 133MHz faster than Core 2 in these low-thread tests.
If you are wondering, meanwhile, how AMD’s Phenom processor compares, the old sports news adage applies. If you are an AMD fan, look away now. Whether it’s old school single-threaded apps or up-to-the-minute software optimised for parallel processing, Core i7 is usually twice as fast. Things were tough enough for AMD during the Core 2 era. Unless the upcoming 45nm shrink of Phenom seriously over-delivers, it’s going to be even more miserable under the reign of Core i7.
Enjoy yourself
Having said all that, the mighty Core i7 does fall just short of perfection. Given the higher level of on-die feature integration, for example, we had hoped to see a significant reduction in overall platform power consumption. In fact, it rises marginally from 340 watts for the Core 2 QX9770 to 368 watts for Core i7-965. Proportionally, of course, the improvement in performance outstrips the increase in power consumption. So Intel will still be able to trumpet improved performance per watt figures.
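A quick calculation shows how that plays out. This sketch pairs the platform power figures above with our x264 and Cinebench results for the same two chips. It assumes the wattages apply equally to both tests, which our figures do not strictly confirm, so read the output as a rough guide only.

```python
# Rough performance-per-watt comparison using figures quoted in the article.
# Assumption: the platform power readings apply to both benchmark workloads.

def perf_per_watt_ratio(new_perf: float, new_watts: float,
                        old_perf: float, old_watts: float) -> float:
    """Ratio of new to old performance per watt (above 1.0 means better)."""
    return (new_perf / new_watts) / (old_perf / old_watts)

I7_WATTS, CORE2_WATTS = 368, 340

power_increase = I7_WATTS / CORE2_WATTS - 1

# x264: frames per second, higher is better
x264_ratio = perf_per_watt_ratio(75, I7_WATTS, 47, CORE2_WATTS)

# Cinebench: convert render time in seconds to renders per second
cinebench_ratio = perf_per_watt_ratio(1 / 45, I7_WATTS, 1 / 64, CORE2_WATTS)

print(f"Power increase:            {power_increase:.1%}")
print(f"x264 perf per watt:        {x264_ratio:.2f}x")
print(f"Cinebench perf per watt:   {cinebench_ratio:.2f}x")
```

In other words, an increase in power draw of about eight per cent buys somewhere between 30 and 50 per cent more work per watt in these two tests.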
What’s more, the new architecture doesn’t seem to deliver the same kind of idiot-proof overclocking as the last of the Core 2s. It’s early doors for the Core i7 family, but we had hoped to hit more than 3.7GHz using stock air cooling and voltages.
Despite all that, Core i7 remains a monumental achievement and a truly pleasant surprise. At best we thought the first quad-core models would raise Intel’s game by 25 to 30 per cent. The truth is nearer double that in the kind of multi-threaded applications that are becoming ever more important. We don’t even want to think about how hideously fast next year’s eight-core Nehalem will be.
Our advice is therefore to enjoy Core i7 for as long as it lasts. With global recession or even depression looming and Intel hardly being pressured by its sole rival AMD, the temptation will surely be to lift off the throttle and save a few pennies on R & D. In a few years’ time we might just look back on Core i7 as the golden age of desktop computing.
By Jeremy Laird of techradar.com