- Launch Socket: AM3+ (the socket is compatible with AM3 CPUs, but existing and future AM3 boards will not support Bulldozer)
- Support: via BIOS upgrade for eligible motherboards
- Runs on HyperTransport 3.1
- Bulldozer is being designed so as to not require AMD customers to change from their current AMD Opteron™ 4000 & 6000 Series platforms.
- The 6000 series will be an ideal home for the upcoming “Interlagos” (16-core) processor, with the 4000 series being equally well-suited for the upcoming 8-core “Valencia” processor. (sockets G34 & C32 respectively)
- There will be a Turbo CORE feature for “Bulldozer”, but there will be some improvements from what you see in “Thuban” (our 6-core AMD Phenom™ processor). There are some enhancements to give it more “turbo”.
- RAM will have two new options: Load Reduced DIMMs, and a new 1.25V low power option (lower than today’s 1.35V)
- If you compare our 16-core Interlagos to our current 12-core AMD Opteron™ 6100 Series processors (code named “Magny Cours”) we estimate that customers will see up to 50% more performance from 33% more cores.
- Product development milestones will not be delivered through this blog for competitive reasons
- “Interlagos” – 16-core server processor
- “Valencia” – 8-core server processor
- “Zambezi” – 8-core client processor (desktop)
- Bulldozer will feature L2 cache that will be shared between integer cores
- For those customers pinning VMs to cores, they have the ability to build a 2P VM and tie it to two cores that share a common L2 cache. This can help cut down on some of the cache latency, as the VM’s two cores have all of the adjacent shared cache lines in a single location.
- There will also be some significant enhancements to our memory controller
- The upcoming “Ontario” processor will be based on the “Bobcat” core, which has a different core architecture than “Bulldozer.” There have been some that have made the assumption that a Bobcat was just a scaled down “Bulldozer”, but they are, in fact, different.
- By creating a modular architecture you have the ability to reduce/share a lot of the circuits that are lightly used, which can help cut down on power consumption and cost.
- We can increase the core count, so if you are interested in database, HPC or virtualization, that higher core count – with real cores – will help boost performance for your applications.
- There will be some new software instructions that will be supported, allowing for greater performance and flexibility, but it will be backwards compatible, so you won’t need to change anything to start using the processor.
Welcome to The Bulldozer Blog wrote: August 2, 2010 - Welcome to the Bulldozer Blog. Our next generation core architecture, a completely new design from the ground up, is called “Bulldozer.” This new core design is planned as the basis for our next generation AMD Opteron™ processors, as well as our high end client products.
The primary focus of the Bulldozer Blog will be commercial systems, and most specifically server, but there will be some client-focused blogs periodically as well. The server business tends to have much longer sales cycles and more architectural discussions, so you will see more focus from us in those areas.
So, what can you expect to see from this blog? Here is a snapshot of some of the things that we will be covering in the near future:
Hot Chips 22 Disclosures: Each year, the Hot Chips conference brings in-depth discussion of next generation technologies. On August 24th, Mike Butler will be presenting the next generation Bulldozer architecture at the conference. (The same day we’ll also disclose new details on “Bobcat,” our other new core architecture scheduled to hit the market in 2011.) We’ll have a series of updates after this conference to give you some of the details, bringing the Bulldozer technology out to a wider audience.
20 Questions Blog: One of our most widely read (and participated in) blogs for our Magny Cours product was the “20 Questions Blog.” We are going to do this again with Bulldozer this time. We’ll have an email address to send your questions to; we’ll choose the best questions and answer them in the blog.
Video Interview Blogs: Expect to see several video blogs from AMD as we bring you insight from some of the key engineers, executives and partners behind the product.
Product Video Demos: As we get closer to the launch, there will inevitably be some public demonstrations of the product. We’ll be sure to videotape those and bring them to you, as quickly as possible.
Just to make sure that everyone is up to speed on what Bulldozer is: a brand new design featuring up to 8 cores for client products and up to 16 cores for server products. Bulldozer will feature a new floating point unit that can support up to 256-bit floating point execution, which will boost the performance for technical applications that rely on floating point math. There will be some new software instructions that will be supported, allowing for greater performance and flexibility, but it will be backwards compatible, so you won’t need to change anything to start using the processor. We will be introducing this processor in 2011, and as we get closer we’ll get more granular on the actual availability.
Just so that we are all on the same page, here are some things you won’t see us discuss: detailed benchmark results, pricing, launch date or any of our partners’ platforms. But none of that should be a surprise to you, because that is standard operating procedure. As we get closer to launch, I would expect to see some partners guest blogging about their platforms, but we’ll leave their disclosures to them.
Be sure to stay tuned and follow this blog; this will be the most interesting place to find all of the information about our next generation “Bulldozer” technology.
Bulldozer Blog: What IS Bulldozer wrote: August 2, 2010 - As we start the “Bulldozer” Blog, it is important to make sure that everyone is grounded on exactly what this planned product is; this will help you understand the next dozen or more blogs that will be published over the upcoming weeks and months.
Bulldozer is the code name of one of our two next generation core architectures, the other being “Bobcat”. The AMD Opteron™ processor family has been built over the past 7 or so years from a common core architecture that has grown over time from a single core design to today’s power-efficient, 64-bit, 12-core, virtualization-aware processors.
This new generation of processors is being designed with some new technologies that will help make these processors more efficient, higher performing and more power optimized than anything we’ve offered to date.
The platform changes that AMD introduced in 2010 were very deliberate and implemented with an eye toward Bulldozer. Simply stated: Bulldozer is being designed so as to not require AMD customers to change from their current AMD Opteron™ 4000 & 6000 Series platforms. Our new AMD Opteron™ 6000 Series platform (G34 socket-based) and our new AMD Opteron 4000 Series platform (C32 socket-based) are compatible with the new Bulldozer products we plan to introduce in 2011. This means that the 6000 series will be an ideal home for the upcoming “Interlagos” (16-core) processor, with the 4000 series being equally well-suited for the upcoming 8-core “Valencia” processor.
The AMD Opteron™ 6000 series platform was designed to handle both the AMD Opteron™ 6100 series (code named “Magny-Cours”) as well as the future processors based on the Bulldozer core. Obviously until we have the final silicon in hand we can’t make any claims, but it is our expectation that customers will have a much easier time managing multiple generations of processors because we expect the underlying platform to be the same.
Bulldozer is being designed to support DDR-3, just like today’s platforms. All of the variations that we see today (standard, low power, registered and unbuffered) will be joined by two new options: Load Reduced DIMMs, and a new 1.25V low power option (lower than today’s 1.35V). Capacities and speeds will be driven by the market more than our platforms. We’ve designed a platform specification that supports higher speed and higher capacity than what we offer today, but we do have to be realistic – our technology partners will probably support those options that are JEDEC compliant and commercially available at the time of launch.
Now, what aren’t we going to talk about? Well, there is always that set of questions that we get asked over and over again, but we reserve the data for launch. So, let me save you some time on asking:
Performance: We release benchmarks at launch, so don’t expect too much detail there anytime soon. From a performance standpoint, if you compare our 16-core Interlagos to our current 12-core AMD Opteron™ 6100 Series processors (code named “Magny Cours”) we estimate that customers will see up to 50% more performance from 33% more cores. This means we expect the per core performance to go in the right direction — up. That is all I will say until launch.
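The per-core claim follows from simple division: if total throughput is modeled as core count times per-core performance (a simplifying assumption; the 50% figure is an estimate, not a benchmark), then 50% more total performance from 16 instead of 12 cores implies roughly 12.5% more performance per core. A minimal sketch of that arithmetic:

```python
# Per-core arithmetic behind "up to 50% more performance from 33% more cores".
# Assumption: total throughput = core count x per-core performance.

MAGNY_COURS_CORES = 12   # AMD Opteron 6100 Series
INTERLAGOS_CORES = 16    # planned "Interlagos"
ESTIMATED_UPLIFT = 1.50  # "up to 50% more performance" (estimate)


def implied_per_core_ratio(old_cores: int, new_cores: int, total_uplift: float) -> float:
    """Return new per-core performance relative to old, given the total uplift."""
    core_ratio = new_cores / old_cores  # 16 / 12 = 1.333...
    return total_uplift / core_ratio    # 1.5 / 1.333... = 1.125


ratio = implied_per_core_ratio(MAGNY_COURS_CORES, INTERLAGOS_CORES, ESTIMATED_UPLIFT)
print(f"core increase: {INTERLAGOS_CORES / MAGNY_COURS_CORES - 1:.0%}")
print(f"implied per-core change: {ratio - 1:+.1%}")
```

Real workloads rarely scale perfectly with core count, so treat this as the direction ("up") rather than a prediction.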
Pricing will be available at launch as with all of our other products.
Launch date is currently set for sometime in 2011. I do realize that this is a wide range, but as we get closer to launch, we’ll narrow down the window a bit. Product development milestones will not be delivered through this blog, for competitive reasons I’m sure you can appreciate. If we do release any schedule milestone achievements, we’ll let Dirk or someone from the engineering team have that honor.
This should get you up to speed with Bulldozer; stay tuned for more updates on a regular basis.
Bulldozer Blog: 20 Questions Intro wrote: August 10, 2010 - We played the game “20 questions” with the AMD Opteron™ 4000 and 6000 Series processors (formerly code named “Magny Cours” and “Lisbon”), and it was one of last year’s more popular server blog postings.
We’re going to do it again with Bulldozer, so get your best server questions ready.
We will choose our favorite questions and turn them into this year’s 20 questions blog.
Don’t expect to see any answers to questions about launch date, pricing, benchmarks or partners. Those things all have to wait for launch. The focus of this blog will be on servers and why we believe Bulldozer is ideally suited for server-class products. Server products have longer lifecycles, longer buying cycles and greater architectural decisions that need to be made, which is why our focus is there. However, because we plan for our client-side products to include Bulldozer-based offerings as well, much of the technology we’ll discuss here is being designed into our client products too.
We’ll try to get to all of your questions over the upcoming weeks, and hopefully give the designers and engineers an opportunity to answer some of them directly for you. Issue your question through Twitter at #AMD20Q or the form below.
Bulldozer Blog: A Parallel Universe wrote: August 12, 2010 - This morning I was reading the web when I rolled across a headline that made me spit out my coffee: “AMD Bulldozer Microprocessors May Not Bring Dramatic Performance Boosts”. Yes, you read it right. It was like that movie where the teenager woke up as her mom and her mom woke up as her.
Apparently this was gleaned from my blog where I mentioned that our upcoming Bulldozer architecture will have 33% more cores and 50% more performance.
50% more performance in one generation – that is an above-average generation-to-generation performance increase. Now combine that with the fact that our last generational change, from the AMD Opteron™ 2400 Series processors to the AMD Opteron™ 6100 Series processors, delivered up to 86% (integer) and up to 118% (floating point) better performance.
If you take those increases and assume a 50% uplift, you are looking at a two generation performance increase of possibly more than 180% in only 2 years. Not dramatic?
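Generational gains compound multiplicatively, not additively. Assuming the previous generation's "up to" figures and a 50% uplift for the next one, the integer path works out to about 179% and the floating point path to about 227%, so "more than 180%" holds for floating point and is approximately right for integer. A minimal sketch of that compounding:

```python
def compound(*uplifts: float) -> float:
    """Combine successive generational gains (as fractions, e.g. 0.5 = +50%) into one total gain."""
    total = 1.0
    for u in uplifts:
        total *= 1.0 + u
    return total - 1.0


# "Up to" gains from the Opteron 2400 -> 6100 transition, per the post.
PREVIOUS_GENERATION = {"integer": 0.86, "floating point": 1.18}
ASSUMED_NEXT_GENERATION = 0.50  # the estimated Bulldozer uplift

for workload, prior_gain in PREVIOUS_GENERATION.items():
    two_gen = compound(prior_gain, ASSUMED_NEXT_GENERATION)
    print(f"{workload}: {two_gen:.0%} over two generations")
```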
Now, in the world that I live in, that is a pretty dramatic performance increase on its own, but when you stack it up against what other companies have seen when they move through architecture transitions, it becomes more incredible.
Maybe I am living in a parallel universe where down is up and up is down, but I don’t think so. What do you think about this situation, is a 50% increase (in the same power and thermal ranges) something that is compelling to you and your business? Let me know your thoughts on what qualifies as “dramatic”.
Bulldozer Blog: 20 Questions 1-5 wrote: August 23, 2010 - You’ve sent in your questions and we’ve begun to sort through them to pull out the best. There were plenty of common themes that were arising, so we’ll be grouping some of the bigger categories together. I am going to tackle some of the easiest ones first because some of the more technical questions will need to go to the engineers.
We’ll handle this blog in four rounds, with 5 questions each.
Let’s get started.
“There has been some confusion among those in the tech community regarding the actual CPU architecture, with ‘modules’ and ‘cores’ being explained differently by different people.” – Waffle911
Yes, there has definitely been some confusion about modules and cores. Modules are only our way of laying out the subcomponents of the processor. You will not see us market modules as they are largely invisible to everyone but the designers. Operating systems, for instance, will enumerate the integer cores, seeing a 16-core AMD Opteron™ processor (currently codenamed “Interlagos”) as 16 cores, not 8 modules. Modules do impact the way that certain CPU features are addressed – a discussion of which we’ll save for a later date – but in general we will focus on cores and not modules. The reason that we have modules is to help cut down on a lot of redundant circuitry in the processor. With multiple cores there is lots of duplication and this eats up die space and increases power draw. There are areas within the processor that can be shared because there is no major impact on performance, and other areas that should not be shared because they create bottlenecks.
You will never see a spec sheet with modules called out. Modules will not have a “marketing name”; they will only be “Bulldozer” modules. In reality, modules will only matter to the designers. Since we went out with “Bulldozer” information very early, we focused on the shared architecture and talked at the module level (it is still far too early to be sharing die shots). Because of this, the two most misunderstood theories became a.) the module was the whole processor and b.) the module was somehow equal to one core.
When we talk about cores we will always be using the most agreed upon definition of cores – the integer logic. Today most workloads are integer with a much smaller portion being floating point. This is why we focused on integer cores as the most logical way to define a core.
Each integer core will be able to run one software thread, and these threads can all be done simultaneously, unlike an SMT-type technology that lets two threads share one core. You typically find SMT technology on processors with much lower core counts, and its shared nature can create bottlenecks, even resulting in negative throughput in some cases.
As for core counts, here is what we have committed to at this point:
“Interlagos” – 16-core server processor
“Valencia” – 8-core server processor
“Zambezi” – 8-core client processor
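With two integer cores per module, the module count is just half the core count, while the OS sees the full core count. A minimal sketch, assuming each of these parts is built from fully populated two-core modules (the post does not spell this out per part):

```python
CORES_PER_MODULE = 2  # two integer cores per "Bulldozer" module

# Core counts committed to in the post: name -> (segment, integer cores).
PROCESSORS = {
    "Interlagos": ("server", 16),
    "Valencia": ("server", 8),
    "Zambezi": ("client", 8),
}


def modules_for(cores: int) -> int:
    """Number of two-core modules needed to hold the given number of integer cores."""
    if cores % CORES_PER_MODULE != 0:
        raise ValueError("core count must be a multiple of the cores per module")
    return cores // CORES_PER_MODULE


for name, (segment, cores) in PROCESSORS.items():
    # The OS enumerates `cores`; the module count is only a layout detail.
    print(f"{name} ({segment}): OS sees {cores} cores, laid out as {modules_for(cores)} modules")
```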
“What are the virtualization advantages of “Bulldozer” relative to current AMD and “Bulldozer” time-frame Intel architectures?” – Muzaffer Kal
Well, to begin with, the competition has not revealed anything about their virtualization features in that timeframe so I will stick with AMD comparisons.
One of the most striking and easy comparisons to make is the pure core count. In my experience, customers today tend to use the “one VM per core” rule of thumb. In today’s world that means up to 24 VMs for a 2P AMD Opteron™ 6100 Series platform (12 cores per processor x 2 processors = 24 cores = 24 VMs), and up to 32 VMs for a 16-core, “Bulldozer”-based 2P “Interlagos” system. Or you can run several robust multi-core VMs on a server; for example, you could run up to eight VMs on an “Interlagos” system, each with 4 vCPUs.
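The VM counts above are sockets times cores per socket, divided by vCPUs per VM. A minimal sketch of that rule of thumb, assuming one vCPU per physical core and no overcommit:

```python
def vm_capacity(sockets: int, cores_per_socket: int, vcpus_per_vm: int = 1) -> int:
    """VMs that fit under a 'one vCPU per physical core' rule (no overcommit)."""
    total_cores = sockets * cores_per_socket
    return total_cores // vcpus_per_vm


print(vm_capacity(2, 12))     # 2P AMD Opteron 6100 Series: 24
print(vm_capacity(2, 16))     # 2P "Interlagos": 32
print(vm_capacity(2, 16, 4))  # 4-vCPU VMs on 2P "Interlagos": 8
```

Many deployments do overcommit vCPUs, so these numbers describe the conservative rule of thumb, not a hard limit.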
Although we will not be releasing technical details yet, some of the new features include making the caches more efficient, preserving live migration compatibility between our cores, and more effectively managing changes to virtual machines such that hypervisor interactions are limited.
In addition to a greater number of cores, the upcoming “Bulldozer” platform will feature L2 cache that will be shared between integer cores. So for those customers pinning VMs to cores, they have the ability to build a 2P VM, and tie it to two cores that share a common L2 cache. This can help cut down on some of the cache latency as the VM’s two cores have all of the adjacent shared cache lines in a single location.
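On a Linux host, the kernel reports which logical CPUs share each cache level under `/sys/devices/system/cpu/cpuN/cache/indexM/` (files such as `level` and `shared_cpu_list`). The sketch below groups CPUs by shared L2 and pins the current process to one such pair with `os.sched_setaffinity`. Two assumptions apply: on a "Bulldozer" system the two cores of a module would be expected to show up together in one L2 group, and pinning a VM's vCPUs is normally done through the hypervisor's own pinning controls rather than on a host process like this.

```python
import glob
import os


def parse_cpu_list(text: str) -> list[int]:
    """Parse a Linux cpulist string such as "0-1" or "0,2,4-5" into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def l2_sharing_groups(sysfs_root: str = "/sys/devices/system/cpu") -> list[tuple[int, ...]]:
    """Return the distinct groups of CPUs that share an L2 cache, as reported by sysfs."""
    groups: set[tuple[int, ...]] = set()
    for index_dir in glob.glob(os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index[0-9]*")):
        try:
            with open(os.path.join(index_dir, "level")) as f:
                level = f.read().strip()
            with open(os.path.join(index_dir, "shared_cpu_list")) as f:
                shared = f.read()
        except OSError:
            continue  # missing or unreadable cache entry
        if level == "2":
            groups.add(tuple(parse_cpu_list(shared)))
    return sorted(groups)


if __name__ == "__main__":
    groups = l2_sharing_groups()
    print("CPU groups sharing an L2 cache:", groups)
    if hasattr(os, "sched_setaffinity"):  # Linux only
        allowed = os.sched_getaffinity(0)
        # First L2 group with at least two CPUs this process may run on.
        pair = next(
            (sorted(set(g) & allowed)[:2] for g in groups if len(set(g) & allowed) >= 2),
            None,
        )
        if pair:
            try:
                os.sched_setaffinity(0, set(pair))
                print("pinned this process to CPUs", sorted(os.sched_getaffinity(0)))
            except OSError as err:
                print("could not pin:", err)
```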
There will also be some significant enhancements to our memory controller. This is the first major memory controller overhaul since the introduction of the Quad-Core AMD Opteron processor back in 2007. Back then, everyone was looking at virtualization, but not as many were deploying it. These new memory controller enhancements were designed with virtualization in mind so that there are more optimizations around the memory handling for virtualization.
Someone else had also asked about support for Hyper-V and older OSes. We plan to support Hyper-V in the future, just as we do today. In terms of older OSes, there will be some limitations, mainly because older OSes were developed at a time when processors had fewer cores and supported less memory. An older OS can always be run as a guest OS on a virtualized server. AMD collaborates with Microsoft to ensure that new processors are well supported by a range of OS versions. We will publish more info as we approach launch.
“The x86 core (Bobcat) of AMD Fusion APU Ontario will be based on Bulldozer architecture?” – Fabio Mendes
Actually, these are different designs. The upcoming “Ontario” processor will be based on the “Bobcat” core, which has a different core architecture than “Bulldozer.” There have been some that have made the assumption that a Bobcat was just a scaled down “Bulldozer”, but they are, in fact, different. I’m sure that between the two there are similarities and some small sub-components that are shared, but you won’t see the modular design of “Bulldozer” in “Bobcat.”
“Will Bulldozer get a Turbo CORE for single threaded applications, just like the Thuban?” – Björn
Yes. There will be a Turbo CORE feature for “Bulldozer”, but there will be some improvements from what you see in “Thuban” (our 6-core AMD Phenom™ processor). There are some enhancements to give it more “turbo”. This will be the first introduction of the Turbo CORE technology in the server processors. We expect that this will translate into a big boost in performance when using single threaded applications, and there should be some interesting capabilities for heavier workloads as well. We’re pretty excited about how this will be implemented with “Bulldozer”, but the specifics of how this is implemented and the expected performance gains will not be disclosed until launch.
“Which architectural decision for Bulldozer has the biggest impact for server-class products and how does it achieve that impact?” – Andrew Cowley
That is actually a tougher question than it sounds because it depends on what you are looking to impact. I personally believe that what most customers are looking for is better performance per watt with each generation of product. Or, to be more specific, people are looking for greater performance and scalability, but they want to do it in the same power/thermal envelopes that they are used to with today’s servers.
The modular architecture really allows us to do this with “Bulldozer”. In today’s processors there is a lot of circuitry that sits idle for most cycles; it needs to be there for the peak, but most of the time it is just sitting. That not only eats up power, but adds to the die space (think: cost.)
By creating a modular architecture you have the ability to reduce/share a lot of the circuits that are lightly used, which can help cut down on power consumption and cost.
For those that want more performance, cutting down on the power consumption means that you can get higher clock speeds within the same power/thermal envelopes.
For those looking for lower overall power consumption, the modular architecture helps in that aspect as well.
Because of this modular architecture, we can increase the core count, so if you are interested in database, HPC or virtualization, that higher core count – with real cores – will help boost performance for your applications.
But the key to an architecture like this is understanding how to push the limits, but not go too far. Sharing everything results in low power consumption, but terrible performance. Sharing nothing results in higher performance, but you get hammered by the power consumption and the cost of the die. So the key to a modular architecture will be how successfully you plan the shared components to maximize your design goals.
Stay tuned, in the next update we will cover floating point, compilers and power efficiency.
ALL information here is derived directly from Bulldozer Blog posts, posted by John Fruehe, Director of Product Marketing for Server/Workstation products at AMD.
Bulldozer Blog 20 questions part 2 wrote: August 30, 2010 -
“Will Bulldozer implement new versions of Hypertransport?” – Rheo
No, Bulldozer takes advantage of the same version of HyperTransport™ (HT) technology as our existing AMD Opteron™ 4000 and 6000 series processors, HyperTransport 3.1.
“Is there any ‘programmable-tangible’ improvement in synchronization between cores in the same module? In other words, will I get tangible performance improvement if I can partition my multi-threaded algorithm to pairs of closely interacting threads, and schedule each pair to a module?” – Edward Yang
That is a very interesting question.
For the majority of software, the OS will work in concert with the processor to manage the thread to core relationships. We are collaborating with Microsoft and the open source software community to ensure that future versions of Windows and Linux operating systems will understand how to enumerate and effectively schedule the Bulldozer core pairs. The OS will understand if your machine is set up for maximum performance or for maximum performance/watt, which takes advantage of Core Performance Boost.
However, let’s say you want to explore if you can get a performance advantage if your threads were scheduled on different modules. The benefit you can gain really depends on how much sharing the two threads are going to do.
Since the two integer cores are completely separate and have their own execution clusters (pipelines), you get no sharing of data in the L1, and there are no specific optimizations needed at the software level. However, at the L2 cache level there could be some benefits. A shared L2 cache means that both cores have access to read the same cache lines – but obviously only one can write any cache line at any time. This means that if you have a workload with a main focus of querying data and your two threads are sharing a data set that fits in our L2, then having them execute in the same module could have some advantages. The main advantage we expect to see is an increase in the power efficiency of the cores that are idle. The more idle other cores are, the better chance the busy cores will have to boost.
However, there is another consideration to this which is how available other cores are. You need to weigh the benefits of data sharing with the benefit of starting the thread on the next available core. Stacking up threads to execute in proximity means that a thread might be waiting in line while an open core is available for immediate execution. If your multi-threaded application isn’t optimized to target the L2 (or possibly the L3 cache), or you have distinctly separate applications to run, and you don’t need to conserve power, then you’ll likely get better performance by having them scheduled on separate modules. So it is important to weigh both options to determine the best execution.
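The trade-off above can be written down as a placement rule. The sketch below is a hypothetical heuristic, not how any OS scheduler actually works: it co-locates a thread pair on one module when the pair shares a data set that fits in L2 or when power should be conserved, but never makes a thread wait for a fully idle module; otherwise it spreads the pair across modules. `Module`, `place_thread_pair` and the byte-size parameters are illustrative names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Module:
    module_id: int
    idle_cores: List[int]  # ids of idle integer cores in this module (0, 1 or 2)


def place_thread_pair(modules: List[Module], shared_bytes: int, l2_bytes: int,
                      conserve_power: bool) -> Optional[Tuple[int, int]]:
    """Hypothetical heuristic: pick two idle cores for a pair of threads."""
    shares_data_in_l2 = 0 < shared_bytes <= l2_bytes
    if shares_data_in_l2 or conserve_power:
        for m in modules:
            if len(m.idle_cores) >= 2:
                return (m.idle_cores[0], m.idle_cores[1])
        # No fully idle module: fall through rather than leave a thread waiting.
    # Spread: one idle core from each of two different modules.
    one_per_module = [m.idle_cores[0] for m in modules if m.idle_cores]
    if len(one_per_module) >= 2:
        return (one_per_module[0], one_per_module[1])
    # Only one module has idle cores: use two of them if possible.
    idle = [c for m in modules for c in m.idle_cores]
    return (idle[0], idle[1]) if len(idle) >= 2 else None
```

Keeping the "don't wait" fallback reflects the point that a thread queued for a preferred core can lose more than the shared cache would gain.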
“How much extra performance will we see when running two-threaded applications on one Bulldozer Module compared to two cores in different modules?” – Simon
Without getting too specific around actual scaling across cores on the processor, let me share with you what was in the Hot Chips presentation. Compared to CMP (chip multiprocessing – which is, in simplistic terms, building a multicore chip with each core having its own dedicated resources), two integer cores in a Bulldozer module would deliver roughly 80% of the throughput. But, because they have shared resources, they deliver that throughput at lower power and lower cost. Using CMP has some drawbacks, including more heat and more die space. The heat can limit performance in addition to consuming more power. Ask yourself, would you rather have a 4-cylinder engine that delivered 300HP or a 6-cylinder engine that delivered 360HP and consumed less gas? The horsepower-per-cylinder ratio for the 4-cylinder is obviously higher (75HP/cylinder vs. the V6’s 60HP/cylinder), meaning that each cylinder can give you more performance. However, looking at the overall engine, you are getting less total output, and you are getting that lower output at a higher cost (higher gas consumption).
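The analogy's numbers and the 80% figure reduce to a few divisions. A minimal sketch, assuming the 80% compares one two-core module against two fully independent CMP cores (one reading of the Hot Chips statement):

```python
MODULE_VS_CMP = 0.80  # "roughly 80% of the throughput" of a CMP core pair


def per_unit(total: float, units: int) -> float:
    """Output per cylinder (or per core)."""
    return total / units


def module_throughput(single_core: float = 1.0) -> float:
    """Estimated throughput of one two-core module, relative to a single CMP core."""
    cmp_pair = 2 * single_core
    return MODULE_VS_CMP * cmp_pair


# The engine analogy: 4 cylinders / 300 HP vs. 6 cylinders / 360 HP.
for cylinders, hp in [(4, 300), (6, 360)]:
    print(f"{cylinders}-cylinder: {hp} HP total, {per_unit(hp, cylinders):.0f} HP per cylinder")

print(f"one module ~ {module_throughput():.1f}x a single CMP core")
```

The per-unit figure favors the smaller engine, while the total favors the larger one; the argument is that total output per unit of cost (gas, power, die area) is what matters.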
“Current and forthcoming Nehalem EX based servers from IBM and HP top out at 8 sockets and 64 cores. What kind of vertical scalability can we expect from Bulldozer-based servers?” – David Roff
Bulldozer will fit into the current “Maranello” and “San Marino/Adelaide” platforms. “Maranello” is our high performance platform that will support up to 4 CPUs. Combining a “Maranello” platform with the upcoming 16-core “Interlagos” processors, the total core density of a 4P system will reach as many as 64 cores.
The 8P x86 market today is pretty small. According to IDC, last year it accounted for roughly 7,915 total servers, down 26% from the year before (Source: IDC Quarterly Server Tracker, Q4 2009). If you want to argue that 2009 was simply a bad year, note that from 2007 to 2008 the 8P x86 market was essentially flat as well, so it isn’t a growth engine. Part of what is impacting that market is the core and memory density of today’s systems. People bought 8P servers to get to 48 cores (8 x 6-core) or to get to large memory footprints. Today’s 4P systems are meeting those needs at a lower price, with lower power consumption and lower latency. When we get to 2011 with “Bulldozer,” you’ll see an increase to up to 64 cores, and we expect the total memory footprint will increase again.
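The core-count comparison and the implied size of the prior year's market are simple arithmetic. A minimal sketch; the prior-year figure is only an estimate implied by the rounded 26% decline, not an IDC number:

```python
def total_cores(sockets: int, cores_per_socket: int) -> int:
    """Total physical cores in a multi-socket server."""
    return sockets * cores_per_socket


def prior_year(current: float, decline: float) -> float:
    """Back out last year's value from this year's value and a fractional decline."""
    return current / (1.0 - decline)


print(total_cores(8, 6))   # 8P server with 6-core processors: 48
print(total_cores(4, 16))  # 4P "Maranello" with 16-core "Interlagos": 64
print(round(prior_year(7915, 0.26)))  # implied year-earlier 8P volume, approximate
```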
The bottom line is, you’ll get the 64 cores that you want, you’ll just have to spend a lot less to get them; is that OK?
“As far as power usage goes, from what I understand BD is supposed to be taking power management features to a level of granularity that hasn’t been seen yet with consumer/business grade CPUs. Will those new features be available to current MC users or will a platform upgrade be necessary? Can you elaborate on any new power saving features that would make a business want to consider BD at this time?” – Jeremy Stewart
Current “Maranello” platforms with AMD Opteron™ 6100 Series processors already have the hooks embedded in them for any “Bulldozer”-level power efficiency features. When we specified the platforms for today’s processors, we did so with “Bulldozer” in mind.
As we have said already in this blog, we expect the shared architecture to provide us with a great deal of power savings – there are a lot of circuits that are essentially being duplicated in today’s multicore processors. Having a new “from the ground up” design allowed us to take a very close look at the circuits and determine which ones are ripe for consolidation and which ones really need their own dedicated resources.
We started with inherently power-efficient microarchitecture and implementation that included dynamic sharing of shared resources, minimized data movement and took advantage of extensive clock and power gating. From there, we added active management support that allows us to digitally measure activity in order to estimate power. Support for chip-level core power gating was also added to the processor.
These new features join existing AMD Opteron processor technologies such as AMD PowerNow!™, AMD CoolCore™, low voltage DDR-3 memory support and more, all working in concert to help create a power efficient system.
Even though you’ll see processors with 33% more cores and larger caches than the previous generation, we’ll still be fitting them into the same power and thermal ranges that you see with our existing 12-core processors.
Slides from Engadget