Sandra 2011b (17.25) Released

Post by Apoptosis

Just got an e-mail back from the guys at SiSoftware (the makers of Sandra), and it looks like a new patch is out! Check out the e-mail below for all the details.
We are releasing an update to Sandra, version 2011b (17.25).

If you are mirroring the Lite version, please update to this version and let us know the URL (if it has changed).


Yes, it's another patch! No, we're not doing weekly patches. After more testing and feedback, we've decided to risk your wrath by releasing "b"; it should be worth it, though:


** What's New **

- GPGPU: We spent many nights testing quite a few cards (from mobile parts with 8 SPs up to dual top-end cards) and optimised the workgroup/wavefront sizes. We generally used the highest size allowed, with lower sizes for weaker cards. This should give future top-end cards headroom to perform better.

In effect, scores on high-end cards have increased quite a bit (e.g. the 5870 now pushes 5GB/s in AES128 and ~4GB/s in AES256). We played it safe with the initial release so as not to break weaker cards and to ensure everything worked from the highest-end cards to the lowest.
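
As a rough illustration of the "highest allowed, lower for weaker cards" policy, here is a minimal Python sketch. It is not Sandra's code: the function name, the compute-unit threshold and the cap of 64 are hypothetical, and `max_allowed` stands in for a device limit such as OpenCL's CL_DEVICE_MAX_WORK_GROUP_SIZE.

```python
def pick_workgroup_size(max_allowed: int, compute_units: int,
                        weak_cu_threshold: int = 4, weak_cap: int = 64) -> int:
    """Choose a work-group size: the highest power of two the device allows,
    capped lower on weaker devices.

    max_allowed stands in for a device/kernel limit such as OpenCL's
    CL_DEVICE_MAX_WORK_GROUP_SIZE; weak_cu_threshold and weak_cap are
    hypothetical values, not Sandra's.
    """
    if max_allowed < 1:
        raise ValueError("max_allowed must be at least 1")
    # Largest power of two not exceeding the limit (e.g. 1000 -> 512).
    size = 1 << (max_allowed.bit_length() - 1)
    # Devices with few compute units get a smaller size.
    if compute_units < weak_cu_threshold:
        size = min(size, weak_cap)
    return size
```

Rounding to a power of two keeps larger sizes a multiple of common hardware widths (64-wide AMD wavefronts, 32-wide NVIDIA warps); whether Sandra rounds this way is not stated in the e-mail.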


- OpenCL 1.1 "fission" support (the cl_ext_device_fission extension): not to be confused with AMD's "Fusion", nor with support for uranium-enriching hardware (as targeted by Stuxnet).

What is it? It is a way to partition a device (e.g. the CPU) into smaller sub-devices. In OpenCL, all the CPUs/cores/threads in a system form a single device; with fission you can use individual cores, threads, etc.

How is this useful in Sandra? When using GPU+CPU, loading all the CPUs/cores/threads is likely to "starve" the GPUs, especially powerful ones. By leaving some CPU threads unloaded (how many depends on the number of GPUs and CPU cores present) so they can service the GPUs, you get better performance. We use hard affinity to dedicate one thread to servicing each GPU.
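
A minimal sketch of the thread-reservation idea, assuming a hypothetical policy of one dedicated (hard-affinity) logical CPU per GPU, with the remaining logical CPUs forming the sub-device that runs the CPU workload. The function name and the one-thread-per-GPU rule are assumptions; the e-mail only says the number of reserved threads depends on the GPU and core counts.

```python
from typing import Dict, List


def plan_cpu_threads(logical_cpus: int, num_gpus: int) -> Dict[str, object]:
    """Split logical CPUs into GPU-servicing threads and CPU compute threads.

    Hypothetical policy: reserve one logical CPU per GPU (to be pinned with
    hard affinity), but always keep at least one logical CPU for compute.
    """
    if logical_cpus < 1 or num_gpus < 0:
        raise ValueError("need at least one logical CPU and a non-negative GPU count")
    reserved = min(num_gpus, logical_cpus - 1)
    # GPU i is serviced by a thread pinned to logical CPU i.
    service: Dict[int, int] = {gpu: gpu for gpu in range(reserved)}
    # Everything else goes into the CPU sub-device that runs the kernel.
    compute: List[int] = list(range(reserved, logical_cpus))
    return {"service": service, "compute": compute}


if __name__ == "__main__":
    print(plan_cpu_threads(logical_cpus=8, num_gpus=2))
```

Under this hypothetical rule, a 4-core/8-thread CPU with two GPUs would reserve logical CPUs 0 and 1 for GPU servicing and leave 2 to 7 for compute. In a real program the compute set would become a CPU sub-device created through the fission extension, and the servicing threads would be pinned with an OS affinity call (e.g. `os.sched_setaffinity` on Linux).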

Let's see some numbers:

2x 5870 + Core i7 965 (all cores/threads used): AES128 7542MB/s
2x 5870 + Core i7 965 (fissioned, some used): AES128 9241MB/s -> 22% increase!

And you will still see 98-100% CPU utilisation!
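
For reference, the quoted gain follows directly from the two AES128 figures:

```latex
\frac{9241\,\text{MB/s} - 7542\,\text{MB/s}}{7542\,\text{MB/s}} = \frac{1699}{7542} \approx 0.225 \quad (\approx 22.5\%)
```

So the "22%" is, if anything, rounded down slightly.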


- OpenCL: Now that Intel has its own OpenCL run-time (alpha), Sandra was using *both* run-times when "all devices" was selected (200% CPU!). It now uses Intel's run-time on Intel CPUs and AMD's on AMD CPUs; if only one is installed, it uses that one.

http://software.intel.com/en-us/forums/ ... 6&o=a&s=lr
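
The run-time choice described above can be sketched as a small selection function. This is a hedged illustration, not Sandra's code: the identifiers "intel" and "amd" are hypothetical labels, and the tie-break when the CPU vendor's own run-time is missing but several others are installed is an assumption (the e-mail only covers the one-installed case).

```python
from typing import AbstractSet, Optional


def pick_cpu_runtime(cpu_vendor: str, installed: AbstractSet[str]) -> Optional[str]:
    """Pick exactly one OpenCL run-time to drive the CPU (hypothetical sketch).

    - Prefer the CPU vendor's own run-time ("intel" on Intel, "amd" on AMD).
    - Otherwise fall back to whichever run-time is installed.
    - Never return both, so the CPU is not loaded twice ("200% CPU").
    """
    if not installed:
        return None
    vendor = cpu_vendor.strip().lower()
    if vendor in installed:
        return vendor
    # Fallback: the only installed run-time, or a deterministic pick if several.
    return sorted(installed)[0]
```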


- OpenCL: The default selection is now "all GPUs" (e.g. GPU1+GPU2...) rather than "all devices" (i.e. CPU+GPU1+GPU2...). This allows easy benchmarking of all the GPUs without the CPU, which skews the results.

The CPU is now added to the end (e.g. GPU1+GPU2+CPU) regardless of where it appears in the enumeration. While mainly cosmetic, having the CPU first (as AMD's run-time enumerates it) caused all kinds of issues.
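
A minimal sketch of the new default selection and ordering, assuming devices are represented as hypothetical (name, type) pairs in the order a run-time enumerates them, with the CPU first as on AMD's run-time. The function names are illustrative only.

```python
from typing import List, Tuple

# Hypothetical representation: (display name, device type).
Device = Tuple[str, str]


def order_devices(enumerated: List[Device]) -> List[Device]:
    """Keep GPUs in enumeration order and move CPU device(s) to the end."""
    gpus = [d for d in enumerated if d[1] == "GPU"]
    rest = [d for d in enumerated if d[1] != "GPU"]
    return gpus + rest


def default_selection(enumerated: List[Device]) -> List[Device]:
    """Default benchmark selection: all GPUs, CPU excluded."""
    return [d for d in order_devices(enumerated) if d[1] == "GPU"]


def label(devices: List[Device]) -> str:
    """Build a combined label such as 'GPU1+GPU2+CPU'."""
    return "+".join(name for name, _ in devices)


if __name__ == "__main__":
    amd_order = [("CPU", "CPU"), ("GPU1", "GPU"), ("GPU2", "GPU")]
    print(label(order_devices(amd_order)))      # prints "GPU1+GPU2+CPU"
    print(label(default_selection(amd_order)))  # prints "GPU1+GPU2"
```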


- The UI is also refined a bit: the current run is selected and made visible (the result list scrolls to it) so you know what you just ran. It was sometimes very confusing to find where the current result was; now you know!

It also tries to pick scores similar to yours for comparison, rather than ones from the top/middle/bottom of the list.
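
One simple way to pick "similar" reference scores is to take the k nearest by absolute difference. This is an assumption for illustration; the e-mail does not say how Sandra measures similarity, and the function name, `k` and the sample numbers are hypothetical.

```python
from typing import List


def similar_scores(your_score: float, reference: List[float], k: int = 3) -> List[float]:
    """Return the k reference scores closest to your_score, in ascending order."""
    if k <= 0:
        return []
    # Rank by distance from your score; sorted() is stable, so ties keep list order.
    nearest = sorted(reference, key=lambda s: abs(s - your_score))[:k]
    return sorted(nearest)


if __name__ == "__main__":
    # Illustrative numbers only, not real benchmark results.
    print(similar_scores(500, [100, 480, 510, 530, 900]))
```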


While these are relatively small changes, overall they do make a difference.



** Clarifications **

- GPGPU Crypto (DirectX ComputeShader): both 2011a and 2011b work on nV with the latest 263.xx driver:

http://developer.nvidia.com/object/cuda ... loads.html


- GPGPU Crypto: use CUDA on nV, as that works; we use hybrid PTX that covers both 2.x and 1.x (see Results).


- GPGPU memory bandwidth vs. DirectX: we transfer textures in DirectX and memory buffers in GPGPU; we also transfer more data, which makes a difference for cards with a lot of memory (1GB+). You've got to work that memory, so we transfer as much as we can allocate!


- GPGPU memory bandwidth CPU+GPU: while GPUs have fast internal transfers, their bus transfers are pretty slow; for the CPU both are the same, so selecting CPU+GPU really skews the transfer results, and thus the overall score.


** Results **

- GPGPU Crypto nV Tesla Result (so it works!) - Rank #2!

http://www.sisoftware.co.uk/rank2011d/s ... a284f7caf2


- Transcoding GPGPU nV Result (driver 196.xx, last one that works) - Rank #1!

http://www.sisoftware.co.uk/rank2011d/s ... af89fac7f7


Please update to this version when possible and thank you for all the feedback.

Thanks for reading.