FFTW is designed to compute many transforms in a single call because simply calling FFTW many times slows down multi-dimensional transforms. It is better to move the loop inside FFTW.
The basic problem is the resolution of the clock: FFTW needs to run for a certain minimum time for the clock measurement to be reliable.
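
The clock-resolution point can be illustrated with a minimal sketch in plain C. It does not use FFTW at all: seconds_per_call and sample_work are hypothetical names introduced here, and the doubling strategy is only one common way to get past a coarse clock, not necessarily what FFTW does internally.

```c
#include <assert.h>
#include <time.h>

/* Stand-in workload for illustration only; a real measurement would
   execute a transform here instead. */
static volatile double sink;
static void sample_work(void)
{
    double acc = 0.0;
    for (int i = 0; i < 1000; ++i)
        acc += (double)i * 0.5;
    sink = acc;  /* volatile store keeps the loop from being optimized away */
}

/* Hypothetical helper (not part of FFTW): estimate the CPU time of one
   call to work() by repeating it until the total elapsed time reaches
   min_seconds, so that the clock's resolution no longer dominates. */
static double seconds_per_call(void (*work)(void), double min_seconds)
{
    long iters = 1;
    assert(work != 0 && min_seconds > 0.0);
    for (;;) {
        clock_t start = clock();
        for (long i = 0; i < iters; ++i)
            work();
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (elapsed >= min_seconds)
            return elapsed / (double)iters;  /* average over the batch */
        iters *= 2;  /* run was too short to trust: double the batch */
    }
}
```

Calling seconds_per_call(sample_work, 0.01), for instance, keeps doubling the batch until at least 10 ms of processor time has accumulated, and only then divides by the number of calls.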
fftwnd actually may use some temporary storage (hidden in the plan),
but this storage space is only the size of the largest dimension of the
array, rather than being as big as the entire array. (Unless you use
fftwnd to perform one-dimensional transforms, in which case the
temporary storage required for in-place transforms is as big as the
entire array.)