FFTW 1.2 User's Manual

(1)

FFTW is designed to compute many transforms in a single call because invoking it separately for each transform slows down multi-dimensional transforms, which consist of many one-dimensional transforms. It is better to move the loop inside FFTW.

(2)

The basic problem is the resolution of the clock: FFTW needs to run for a certain time for the clock to be reliable.
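
The general idea can be sketched in C as follows. This is only an illustration of timing against a coarse clock, not FFTW's own measurement code: the names time_per_call, work_fn and count_call are hypothetical, and the standard clock() function stands in for whatever timer is actually used. The code being timed is repeated, doubling the repetition count, until the whole batch has run for at least a chosen minimum time.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: one run of the code being timed. */
typedef void (*work_fn)(void *arg);

/*
 * Estimate the seconds taken by one call of `work`.  The iteration count
 * is doubled until a whole batch runs for at least `min_seconds`, so that
 * the measured interval is long compared with the clock's resolution.
 */
double time_per_call(work_fn work, void *arg, double min_seconds)
{
    long iters = 1;

    assert(work != NULL && min_seconds > 0.0);
    for (;;) {
        clock_t start = clock();
        double elapsed;
        long i;

        for (i = 0; i < iters; ++i)
            work(arg);
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;

        if (elapsed >= min_seconds)
            return elapsed / (double)iters;
        iters *= 2;   /* batch too short to trust: run twice as many */
    }
}

/* Hypothetical stand-in workload: counts how often it was called.
 * The volatile access keeps the compiler from removing the loop. */
void count_call(void *arg)
{
    volatile long *calls = (volatile long *)arg;
    ++*calls;
}
```

A caller would pass the routine to be measured in place of count_call, together with a minimum time chosen to be well above the resolution of the clock.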

(3)

fftwnd actually may use some temporary storage (hidden in the plan), but this storage space is only the size of the largest dimension of the array, rather than being as big as the entire array. (Unless you use fftwnd to perform one-dimensional transforms, in which case the temporary storage required for in-place transforms is as big as the entire array.)
