http://www.sisoftware.co.uk/?d=news&f=2012_release
* Benchmark Results Certification
---------------------------------
What is it? Certification uses the benchmark results submitted by users to work out whether your score (benchmark result) is valid (i.e. the device you tested is performing correctly) and how it compares to the scores obtained by other users when testing the same device.
By aggregating the scores submitted for each device and performing statistical analysis (e.g. computing mean/average, standard deviation, etc.) we can use statistical tools (e.g. normal distribution, T-distribution, etc.) to work out whether the score is within the expected range (confidence intervals).
You can see whether the score is above/below average, but also how significant the difference is. For some devices, +/-10% is a lot; for others +/-30% may be just fine. It depends.
Based on the variability of scores you can determine whether the performance of your device is consistent or varies significantly from test to test. A large variability would indicate a problem either with the device or your environment (e.g. OS device drivers, virus checkers, etc.) that should be addressed.
* Ranker functionality must be enabled: the reference results are downloaded from the Ranker, so if it is disabled, certification cannot obtain the reference results for the device tested.
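The kind of statistical check described above can be sketched as follows. This is a minimal illustration only: the scores, the 95% band based on the normal distribution and the `certify` helper are made-up assumptions, not Sandra's actual reference data or algorithm (which, as noted, may use a T-distribution for small samples):

```python
import statistics

# Hypothetical reference scores submitted by other users for the same device
reference_scores = [102.0, 98.5, 101.2, 99.8, 100.4, 97.9, 103.1, 100.6]

mean = statistics.mean(reference_scores)
stdev = statistics.stdev(reference_scores)  # sample standard deviation

# Approximate 95% confidence band using the normal distribution
low, high = mean - 1.96 * stdev, mean + 1.96 * stdev

def certify(score):
    """Return True if the score falls within the expected range."""
    return low <= score <= high

print(certify(100.9))  # within the band: performing as expected
print(certify(70.0))   # far below average: likely a device/driver problem
```

A score outside the band is exactly the "large variability" case: either the device or the environment (drivers, background software) needs attention.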
* General-Purpose (GP) Computing benchmarks - CPU vs. GP(GPU) vs. (GP)APU
-------------------------------------------------------------------------
Sandra's GP (formerly GPGPU) benchmarks may still be the only ones that allow full APU performance measurement against CPUs or even GP(GPU)s (through OpenCL), as they use the *same workload* as the native as well as the software VM (.Net/Java) counterparts, allowing apples-to-apples comparisons:
- GP Performance (OpenCL / DX CS / CUDA) = CPU Multi-Media / .Net Multi-Media / Java Multi-Media / Video Shading
- GP Cryptography (OpenCL / DX CS / CUDA) = CPU Cryptography / .Net Cryptography / Java Cryptography
- GP Bandwidth (OpenCL / DX CS / CUDA) = Memory Bandwidth / Video Bandwidth
As a user, you would not care if a program uses native CPU instructions, the GP(GPU) or even your APU (CPU+GPU) to get your work done faster.
The point is that you are benchmarking CPU+GPU together, not just the CPU or GPU individually; both resources (thus the whole APU) are used to perform the computations faster, which is the whole point.
As we support OpenCL, DirectX ComputeShader and CUDA, just about all (GP)GPUs are supported.
* System Overall benchmark - Reloaded
-------------------------------------
What is it? The updated version generates an overall system performance score (geometric mean) based on the individual benchmarks that allows system-to-system comparisons:
- Native CPU performance: CPU Arithmetic, Multimedia
- Software VM performance: .Net* Arithmetic, .Net Multimedia
- Native Memory & Cache performance
- Storage performance
- GP(GPU/APU)** (General Purpose) performance: Arithmetic, Memory
* Why .Net? While many applications have already been ported to .Net and WPF (Windows Presentation Foundation), the trend is accelerating with the launch of Windows 8/Server 2012, where new applications will need to use the METRO environment.
** Why GP(GPU/APU)? With the recent introduction of APUs (CPU with built-in (GP)GPU) we believe most future applications will use both (CPU+GPU aka APU) simultaneously for best performance.
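The geometric-mean combination can be illustrated with a short sketch. The benchmark names and scores below are made-up placeholders, not Sandra's actual categories, units or weighting:

```python
import math

# Hypothetical per-domain scores (arbitrary units); real Sandra results differ
scores = {
    "cpu_arithmetic": 120.0,
    "cpu_multimedia": 95.0,
    "net_arithmetic": 60.0,
    "net_multimedia": 55.0,
    "memory_cache": 80.0,
    "storage": 40.0,
    "gp_arithmetic": 150.0,
    "gp_memory": 110.0,
}

def overall(scores):
    """Geometric mean: the n-th root of the product of the n scores.
    Unlike the arithmetic mean, a single outlier score (very fast GPU,
    very slow disk) cannot dominate the overall result."""
    values = list(scores.values())
    return math.prod(values) ** (1.0 / len(values))

print(round(overall(scores), 1))
```

The geometric mean always lands between the lowest and highest component score, which is what makes it suitable for combining benchmarks measured in different units.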
* New Memory Latency Test: In-Page Random Access Pattern
--------------------------------------------------------
What is it? It is a new pattern that ensures the memory accesses stay "in-page", so we do not incur "out-of-page" latencies as we move beyond the L1D and L2 caches. The latencies reported are thus "best case" rather than "worst case" and match the latencies reported by vendors (which are always "in-page"/"best case").
Our view is that if you are unlucky enough to miss both L1D and L2, you are unlikely to be "in-page": considering the native page size is just 4kB, you are very likely to be "out-of-page". Even with large pages (2MB; these cannot easily be used in Windows, so very few applications, usually servers, support them), considering most L2 caches are around this size, you are still likely to be "out-of-page" if you missed L2.
We believe in giving you a choice, so you can select "in-page random access", "full random access" or "sequential access".
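The difference between the two random patterns can be sketched in a few lines. The buffer size and accesses-per-page count are illustrative assumptions; Sandra's actual walker is native code, but the index-generation idea is the same:

```python
import random

PAGE = 4 * 1024       # native x86 page size (4kB)
BUFFER = 1024 * 1024  # 1MB test buffer (illustrative size)

def page_of(index):
    return index // PAGE

def in_page_indices(per_page, rng):
    """Random accesses constrained to the current page: pages advance
    sequentially, so a random jump never crosses a page boundary."""
    out = []
    for base in range(0, BUFFER, PAGE):
        out.extend(base + rng.randrange(PAGE) for _ in range(per_page))
    return out

def full_random_indices(count, rng):
    """Unconstrained random accesses anywhere in the buffer."""
    return [rng.randrange(BUFFER) for _ in range(count)]

def page_changes(indices):
    """Count consecutive accesses that land on a different page."""
    return sum(1 for a, b in zip(indices, indices[1:])
               if page_of(a) != page_of(b))

rng = random.Random(42)
in_page = in_page_indices(4, rng)              # 4 accesses per page
full = full_random_indices(len(in_page), rng)  # same total access count
print(page_changes(in_page), page_changes(full))
```

With the in-page pattern the page changes only at each sequential page boundary, while the full-random pattern changes page on almost every access; that difference is exactly the "out-of-page" latency the new test avoids.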
Thanks go to Michael Schuette (Lost Circuits) and Joel Hruska for their testing, advice and support.
* Transcoding benchmark - CPU vs. GP(GPU) vs. (GP)APU
-----------------------------------------------------
The key advantage of Sandra's benchmark is WMF (Windows Media Foundation): it can use either software (CPU) transcoding, GP(GPU) (Intel/nVidia) or APU (AMD) transcoding depending on the encoders/decoders installed. So you can benchmark CPU vs. GP(GPU) or APU using the *same workload*.
Other benchmarks may use only software decoders/encoders, which means they test only CPU performance and ignore GP(GPU) or APU performance entirely. Only by using the hardware-accelerated decoders/encoders can you harness the power of the GP(GPU) and APU.
* Large-page support for all memory tests
-----------------------------------------
Using large-pages (2MB on x86/x64 Windows) instead of native pages (4kB) results in fewer out-of-page accesses and thus lower latencies. Unfortunately huge-pages (1GB) are not currently supported by Windows, and we do not know whether Windows 8/Server 2012 will enable them.
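One way to see why large pages reduce out-of-page accesses is to count how many page translations are needed to cover a working set; the working-set size and the TLB capacity below are illustrative assumptions, not a specific CPU:

```python
BUFFER = 64 * 1024 * 1024  # 64MB working set (illustrative)
SMALL = 4 * 1024           # native page size (4kB)
LARGE = 2 * 1024 * 1024    # large page size on x86/x64 Windows (2MB)
TLB_ENTRIES = 64           # typical L1 DTLB capacity (illustrative)

pages_small = BUFFER // SMALL  # translations needed with native pages
pages_large = BUFFER // LARGE  # translations needed with large pages

# With 4kB pages the working set needs far more translations than the
# TLB holds, so random accesses keep missing ("out-of-page"); with 2MB
# pages the whole buffer is covered by a handful of TLB entries.
print(pages_small, pages_large)
```

Here 16384 translations versus 32: the large-page buffer fits entirely within the assumed TLB, which is where the lower latencies come from.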
* FMA3 & FMA4 instruction set support for just released & future CPUs
---------------------------------------------------------------------
Using a 256-bit register width (instead of the 128-bit registers of SSE/2/3/4) yields further performance gains through greater parallelism in most algorithms. Combined with the increase in processor cores and threads, we will soon have CPUs rivaling GPGPUs in performance.
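The parallelism gain can be quantified with simple peak-throughput arithmetic. This is a simplified model under stated assumptions: the core count and clock are hypothetical, and the pre-FMA case is modeled as one floating-point operation per lane per cycle:

```python
def peak_gflops(cores, ghz, register_bits, fma):
    """Theoretical single-precision peak: 32-bit lanes per register,
    times 2 FLOPs per lane per cycle when multiply-add is fused (FMA),
    times cores and clock. A deliberately simplified model."""
    lanes = register_bits // 32  # 32-bit floats per register
    flops_per_cycle = lanes * (2 if fma else 1)
    return cores * ghz * flops_per_cycle

# Hypothetical 8-core 3.0GHz CPU
sse = peak_gflops(8, 3.0, 128, fma=False)      # 128-bit SSE, no FMA
avx_fma = peak_gflops(8, 3.0, 256, fma=True)   # 256-bit registers + FMA
print(sse, avx_fma)
```

In this simplified model the wider registers double throughput and the fused multiply-add doubles it again, a combined 4x per core, which is why the gap to GPGPUs narrows.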
-----------------------------------------------------------------------------