c++ - Creating arrays in an Nvidia CUDA kernel
Hello, I wanted to know whether it is possible to do the following inside an Nvidia CUDA kernel:
__global__ void count(long *c1, long size, ...) { ... long D[1000]; ... }
or the following:
__global__ void count(long *c1, long size, ...) { ... long D[size]; ... }
You can do the first example; I have not tried the second.
However, if you can help it, you may want to redesign your program so that it does not do this. You do not want to allocate 4000 bytes of memory per thread in your kernel (1000 elements, assuming a 4-byte long). This will lead to heavy use of CUDA local memory, because you will not be able to fit everything in registers. CUDA local memory is slow (roughly 400 cycles of memory latency).
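To make the two cases concrete, here is a minimal sketch, not a definitive implementation. The names `fixedScratch`, `sharedScratch`, `runFixed`, `runShared` and `kScratch` are hypothetical and only for illustration. The first kernel uses a fixed-size per-thread array, as in the first example from the question. For the second case, as far as I know nvcc rejects `long D[size]` with a runtime `size`, because the array bound must be a compile-time constant. One possible alternative is dynamically sized shared memory, with the size passed at launch. Note that this changes the semantics: the shared array is one per block, not one per thread.

```cuda
#include <cuda_runtime.h>

constexpr int kScratch = 1000;  // hypothetical compile-time array size

// Case 1: fixed-size per-thread array. This compiles, but an array this
// large is likely to be placed in (slow) local memory rather than registers.
__global__ void fixedScratch(long *out) {
    long d[kScratch];
    for (int i = 0; i < kScratch; ++i) d[i] = i;
    long sum = 0;
    for (int i = 0; i < kScratch; ++i) sum += d[i];
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

// Case 2 (assumed alternative): runtime-sized scratch space in shared memory.
// The byte count is the third launch parameter; the array is shared per block.
__global__ void sharedScratch(long *out, int size) {
    extern __shared__ long d[];
    for (int i = threadIdx.x; i < size; i += blockDim.x) d[i] = i;
    __syncthreads();  // make sure every element is written before reading
    if (threadIdx.x == 0) {
        long sum = 0;
        for (int i = 0; i < size; ++i) sum += d[i];
        out[blockIdx.x] = sum;
    }
}

// Host helper: launch case 1 with a single thread and return its result.
long runFixed() {
    long *dOut = nullptr;
    long hOut = -1;
    cudaMalloc(&dOut, sizeof(long));
    fixedScratch<<<1, 1>>>(dOut);
    cudaMemcpy(&hOut, dOut, sizeof(long), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return hOut;
}

// Host helper: launch case 2 with one block of 128 threads.
long runShared(int size) {
    long *dOut = nullptr;
    long hOut = -1;
    cudaMalloc(&dOut, sizeof(long));
    sharedScratch<<<1, 128, size * sizeof(long)>>>(dOut, size);
    cudaMemcpy(&hOut, dOut, sizeof(long), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return hOut;
}
```

Both helpers fill the array with 0..size-1 and sum it, so calling `runFixed()` and `runShared(1000)` from `main` should give the same value on a machine with a working CUDA device. Error checking on the CUDA calls is left out to keep the sketch short.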