My question is about the D-deep pipeline mentioned in this sentence: "One has to finally take care of the fact that each of the Nc cores (SPs) in an SM on the GPU has a D-deep pipeline that has the effect of executing D threads in parallel." What does this pipeline look like? Is it similar to a CPU pipeline, with fetch, decode, execute and write-back stages? (I only mean the general idea, since GPU and CPU architectures are completely different.)
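To check that I am reading the quote correctly, here is a toy model of how I currently picture it. It is only my assumption, not a documented GPU design: one core has a D-stage pipeline that accepts one instruction per cycle, each from a different thread, so once the pipeline fills, D threads are in flight at the same time. The function names and the numbers in the usage line are hypothetical.

```python
# Toy model (assumption, not a documented GPU design): a single core with a
# D-stage pipeline that accepts one instruction per cycle, each one taken from
# a different thread, so no instruction waits on an earlier one from its thread.

def threads_in_flight(depth: int, cycles: int) -> list[int]:
    """Return, for each cycle, how many distinct threads occupy the pipeline."""
    stages = [None] * depth  # stages[0] is the first stage, stages[-1] the last
    counts = []
    next_thread = 0
    for _ in range(cycles):
        # Every instruction advances one stage; the one in the last stage retires,
        # and a new thread's instruction enters the first stage.
        stages = [next_thread] + stages[:-1]
        next_thread += 1
        counts.append(sum(s is not None for s in stages))
    return counts


def sm_threads_in_flight(num_cores: int, depth: int) -> int:
    # Following the quoted sentence: Nc cores per SM, each keeping D threads in flight.
    return num_cores * depth


print(threads_in_flight(4, 6))     # [1, 2, 3, 4, 4, 4]
print(sm_threads_in_flight(8, 4))  # 32 (arbitrary example values)
```

If this picture is right, an SM would keep Nc × D threads in flight, which is why I am asking what the D stages actually are.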
Is there any documentation that describes this pipeline?