# CUDA acceleration of Quantum Simulation

For example, a SWAP gate can be applied to a GPU register (`GPUReg`) by defining an `instruct!` method such as:

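```julia
# Assumes the CuYao-era stack; `GPUReg` and `cudiv` are in scope as they are inside CuYao.
using Yao, CuYao, CUDAnative
import Yao: instruct!

function instruct!(reg::GPUReg, ::Val{:SWAP}, locs::Tuple{Int,Int})
    b1, b2 = locs
    state = statevec(reg)
    mask1 = bmask(b1)   # bit mask of qubit b1
    mask2 = bmask(b2)   # bit mask of qubit b2

    # Kernel: thread (x, y) handles basis index b = x - 1 in column c = y.
    function kf(state, mask1, mask2)
        inds = ((blockIdx().x-1) * blockDim().x + threadIdx().x,
                (blockIdx().y-1) * blockDim().y + threadIdx().y)
        b = inds[1]-1
        c = inds[2]
        c <= size(state, 2) || return nothing
        # Swap only if qubit b1 is 0 and qubit b2 is 1; the partner has both bits flipped.
        if b & mask1 == 0 && b & mask2 == mask2
            i = b+1
            i_ = (b ⊻ (mask1 | mask2)) + 1
            temp = state[i, c]
            state[i, c] = state[i_, c]
            state[i_, c] = temp
        end
        nothing
    end

    X, Y = cudiv(size(state)...)   # threads and blocks covering size(state)
    @cuda threads=X blocks=Y kf(state, mask1, mask2)
    reg
end
```
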
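Here, we divide the threads and blocks into a two-dimensional grid with the same shape as the storage of the input `GPUReg` (i.e. $2^a\times 2^r B$, with $a$ active qubits, $r$ remaining qubits and batch size $B$). A thread swaps the amplitude at basis index $b$ with its partner only if the two qubits at `locs` are $0$ and $1$ respectively in $b$; otherwise it does nothing. Since only one of the four possible bit patterns of the two qubits triggers a swap, $3/4$ of the threads are idle, which leaves plenty of room for optimization. Still, this example shows how compactly a CUDA kernel can be written with CUDAnative.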
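The masking logic of the kernel can be checked without a GPU by emulating it on the CPU: a plain loop over every basis index `b` and column `c` stands in for the two-dimensional thread grid. The sketch below is a hypothetical helper (`swap_reference!` is not part of Yao or CuYao). It assumes 1-based qubit indices with qubit 1 as the least significant bit, the convention `bmask` follows, and it returns how many emulated threads actually performed a swap, which makes the $3/4$ idle fraction visible.

```julia
# Hypothetical CPU emulation of the masked-SWAP kernel (not part of Yao/CuYao).
# Each (b, c) pair plays the role of one GPU thread; qubits are 1-based, qubit 1 = lowest bit.
function swap_reference!(state::AbstractMatrix, b1::Int, b2::Int)
    mask1 = 1 << (b1 - 1)          # same value as `bmask(b1)`
    mask2 = 1 << (b2 - 1)          # same value as `bmask(b2)`
    active = 0                     # number of emulated threads that perform a swap
    for c in 1:size(state, 2), b in 0:size(state, 1)-1
        # Work only if qubit b1 is 0 and qubit b2 is 1 in basis index b.
        if b & mask1 == 0 && b & mask2 == mask2
            i = b + 1
            i_ = (b ⊻ (mask1 | mask2)) + 1    # partner index: both qubits flipped
            state[i, c], state[i_, c] = state[i_, c], state[i, c]
            active += 1
        end
    end
    return active
end

state = reshape(collect(1.0:16.0), 8, 2)   # 3 qubits, batch size 2
swap_reference!(state, 1, 2)               # returns 4: only 4 of 16 "threads" do work
state[:, 1]                                # [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0]
```

For the $8\times 2$ matrix above, only basis indices $2$ and $6$ (binary `010` and `110`) in each column satisfy the condition, so amplitudes $2\leftrightarrow 3$ and $6\leftrightarrow 7$ (1-based) are exchanged, exactly the pairs a SWAP of qubits 1 and 2 permutes.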