Today is the official birthday of AMD's quad-core Barcelona, and finally the wait is over. (It's also my official birthday… kind of funny that I share one with a CPU.) I've covered pretty much everything launch-related you'd want to know about Barcelona in previous posts—pricing and launch speeds, likely microarchitecture, system architecture, and big picture and competitive positioning, to name a few—so I'll devote the launch-day coverage below to taking a quick walk through the revelations that launch day brings, such as they are.
There aren't many reviews out this morning, and the few that are up aren't worth looking at (see "You call this 'reviewing'?" below for why this is the case). The only bright spot in this picture is Scott's review at Tech Report, which is about as close as anybody could come to a "good" review of a brand new system with only one weekend to tweak and poke. And Scott's review is good because it raises almost as many questions as it answers.
First up, Scott's results show that Xeon rules in cache bandwidth and Barcelona rules in main memory bandwidth. Latency, however, is a different and more disappointing story. It seems that Barcelona's L3 cache latency is high enough that it causes at least some benchmarks to score the cumulative latency of Barcelona's memory hierarchy as not much better than Xeon's. It's going to take some more digging by benchmarking with real-world apps to see how much of this latency problem is an artifact of the benchmark's scoring mechanism, where latency is cumulative over the entire hierarchy, and how much this actually impacts real applications.
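To make the "cumulative latency" point concrete, here's a minimal sketch of the textbook average-memory-access-time calculation, in which every access that reaches a level pays that level's latency. All of the numbers and names below (`Level`, `average_access_time`, `chip_with_slow_l3`, `chip_without_l3`) are hypothetical, made up purely for illustration; they are not measurements of Barcelona or Xeon, and this is not necessarily how any particular benchmark scores latency.

```python
from dataclasses import dataclass


@dataclass
class Level:
    name: str
    latency_ns: float  # hypothetical time to access this level
    miss_rate: float   # fraction of accesses reaching this level that miss


def average_access_time(levels, memory_latency_ns):
    """Average time per access, summed over the whole hierarchy."""
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for level in levels:
        total += reach * level.latency_ns
        reach *= level.miss_rate
    # Whatever misses every cache level pays the main-memory latency.
    return total + reach * memory_latency_ns


# Hypothetical chip with a slow L3 in front of fast main memory.
chip_with_slow_l3 = average_access_time(
    [Level("L1", 1.0, 0.1), Level("L2", 5.0, 0.5), Level("L3", 20.0, 0.5)],
    memory_latency_ns=60.0,
)

# Hypothetical chip with no L3, a larger L2, and slower main memory.
chip_without_l3 = average_access_time(
    [Level("L1", 1.0, 0.1), Level("L2", 5.0, 0.4)],
    memory_latency_ns=80.0,
)

print(f"slow L3: {chip_with_slow_l3:.2f} ns, no L3: {chip_without_l3:.2f} ns")
```

With these invented inputs, the chip with much faster main memory ends up only modestly ahead overall, because accesses that reach the slow L3 eat into the advantage. How closely real workloads follow a model like this depends entirely on their actual hit rates, which is why real-world application testing matters.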
As expected, Xeon seems to keep the lead over Barcelona in integer performance, not that it's easy to tell, since most benchmarks were floating-point-centric. And speaking of floating-point, Barcelona's floating-point showing is a real head-scratcher. Contrary to what I suggested in my previous post on Barcelona, the launch-date "reviews" uniformly show Barcelona with little or no floating-point advantage over Xeon.
Barcelona's floating-point performance
When it comes to Barcelona's floating-point results, there are two major issues to think about: two-socket versus four-socket and clockspeed scaling. First, all the systems in the reviews that I saw were dual-socket. It has been my contention for some time now (see the posts linked above) that Barcelona's real chance to shine will be in four-socket systems. This is because Barcelona's main advantage is the bandwidth afforded by its system architecture, and that advantage really begins to kick in with four-socket configurations.
A related issue is that on a per-core basis, Barcelona's floating-point performance may just not be good enough. The Core-based Xeons have extremely muscular floating-point and vector hardware, and I definitely didn't think that Barcelona would surpass it (or even match it, really) on a per-core basis. However, it's clear that Xeon is bottlenecked by its system architecture, so I thought that Barcelona's ample bandwidth would give it a floating-point edge.
However, Xeon's bandwidth bottleneck isn't nearly so pronounced in two-socket configurations as it is in four-socket configurations, so at two sockets the two processors' respective floating-point ALUs can duke it out on a relatively more level playing field. In this scenario, Xeon's superior floating-point and vector hardware carries it and gives it the 3D rendering scores that match and beat Barcelona's.
So my previous prediction that a four-socket Barcelona will still dominate in floating-point performance and performance/watt has yet to be tested by any of the reviewers. Let's hope we see some tests of this soon.
As far as clockspeed goes, it seems that Barcelona has just enough there that a good round of clockspeed boosts could put the results in a different light. But by the time those much-needed clockspeed gains materialize, Intel's 45nm "Penryn" Xeons will be upon us, and the landscape will have changed again.
You call this "reviewing"?
What if I told you that you could benchmark a brand new microprocessor architecture—indeed, a brand new system architecture (since upgrading from dual-core Opteron to quad-core Barcelona in the same socket really is a substantial change in overall system architecture)—in just a weekend?
If I told you that, I'd be lying. And this is why almost all of the handful of Barcelona "reviews" that went live today do little to increase our knowledge of AMD's latest. In a move that we've seen again and again from hardware companies that want to stack the review deck on launch day, AMD shipped Barcelona systems to hardware reviewers on Friday for a launch on Monday. This kind of behavior is designed to produce failed reviews that are shaky and opaque, so that the hardware company can control the launch-day media narrative by combining an information shortfall with spin on what little info is there.
But of course, everyone who does launch-day reviews knows the game, and reviewers do have a choice in whether they want to go along with the industry-standard abuse and manipulation in the name of being "first." So there's plenty of blame to go around.
Magic eight-ball says…
By way of conclusion, here's a quick summary of my launch-day impressions of Barcelona:
Single-socket desktop: Barcelona underperforms Core-based offerings from Intel, so AMD will have to ratchet up the clockspeed and keep prices down to be competitive with it.
Dual-socket servers and workstations: In spite of the reviews, the picture here is murky. Intel is going to continue to carry this area in terms of raw performance, at least until Barcelona's clockspeed gains materialize. As far as performance/watt, the story may be different. It seems to me that Barcelona is a lot more competitive here as a platform than it is in raw performance, so this factor may actually keep AMD from losing more ground than it already has to Intel in the dual-socket space.
Four-socket servers and HPC: My prediction (repeated multiple times over the last six months) that Barcelona will win in four-socket floating-point performance and performance/watt is still untested, so I hope someone tackles that soon. As for integer performance and performance/watt, Tigerton will be very, very hard to beat. Intel's system engineers have done a fantastic job with what they have to work with, and it shows in Tigerton's integer performance. I'm not confident that Barcelona can really take them down here, but I'd love to be proven wrong.
I'll be out in San Francisco this week, and I may end up meeting with AMD while I'm there. So we'll see if and how these initial impressions change in the coming days.