We've launched a new benchmark in Sandra 2013: the GP GPU/APU Cache and Memory Latency benchmark.
It works on the same principle as the System Memory Latency benchmark you know (and love?), but because GPUs are different from CPUs, some things are not quite the same.
Here is an article that clarifies its operation through concrete examples: the different kinds of GP memory (global, constant, shared, private, texture), how latency is measured with the different access patterns (full random, in-page random, sequential/linear), how TLB caches affect latencies, etc.
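As a rough illustration of the three access patterns, here is a minimal Python sketch of the pointer-chasing idea that latency tests of this kind usually rely on. It is not Sandra's actual code: the names `build_chain` and `chase`, the `page` parameter, and the reading of "in-page random" as "random order inside a page, pages visited one after another" are all assumptions made for the example. Each element stores the index of the next element to visit, so every access depends on the previous one and cannot be overlapped.

```python
import random

def build_chain(n, pattern, page=1024, seed=0):
    """Return a list nxt where nxt[i] is the index to visit after i.

    Hypothetical helper: every pattern forms a single cycle through all
    n elements, so one lap of n steps touches each element exactly once.
    """
    rng = random.Random(seed)
    if pattern == "sequential":
        # Linear walk: 0, 1, 2, ... (friendly to caches and prefetchers).
        order = list(range(n))
    elif pattern == "full_random":
        # Random order over the whole buffer (stresses caches and TLB).
        order = list(range(n))
        rng.shuffle(order)
    elif pattern == "in_page_random":
        # Assumed meaning: random order inside each page, with the pages
        # themselves visited in sequence (few page changes per lap).
        order = []
        for start in range(0, n, page):
            block = list(range(start, min(start + page, n)))
            rng.shuffle(block)
            order.extend(block)
    else:
        raise ValueError("unknown pattern: " + pattern)

    # Link each element to its successor in the chosen visiting order.
    nxt = [0] * n
    for pos, idx in enumerate(order):
        nxt[idx] = order[(pos + 1) % n]
    return nxt

def chase(nxt, steps, start=0):
    """Follow the chain for `steps` dependent accesses; return the final index."""
    i = start
    for _ in range(steps):
        i = nxt[i]  # the next index is only known once this load completes
    return i
```

On a real device the `chase` loop would run inside a single OpenCL or CUDA work-item over a buffer placed in the memory type under test, and the elapsed time divided by `steps` would give the average latency per access. Timing the Python version mostly measures interpreter overhead, so treat it purely as a description of how the chains are built.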
GPUs are somewhat more secretive than CPUs: the different cache levels and types are not always published, still less the latencies of the various levels. While specific nVidia architectures (G80, GT200) have been analysed through micro-benchmarking in CUDA before, here Sandra can benchmark and contrast different GPU and APU architectures from different vendors through OpenCL (and CUDA).
Unlike yourselves, we don't have many devices to test, but even the 4 devices tested here (2 GPUs and 2 APUs, from different vendors) show pretty interesting results.
We've used these very results to improve our own benchmarks in Sandra 2013 (GP Cryptography: the AES encrypt/decrypt kernels), with significant gains (+25-40%), especially on AMD and Intel (details in the next article ;).
While the optimisation is fairly simple, it is not that obvious without the latency data: we worked with both vendors on the previous version and nobody thought of it. So it may be more useful than it first appears.