This chapter describes the format in which multi-dimensional arrays are stored. We felt that a detailed discussion of this topic was necessary, since it is often a source of confusion among users and several different formats are common.
The multi-dimensional arrays passed to fftwnd
are expected to be
stored as a single contiguous block in row-major order (sometimes
called "C order"). Basically, this means that as you step through
adjacent memory locations, the first dimension's index varies most
slowly and the last dimension's index varies most quickly.
To be more explicit, let us consider an array of rank whose dimensions are n1 x n2 x n3 x ... x nd. Now, we specify a location in the array by a sequence of (zero-based) indices, one for each dimension: (i1, i2, i3,..., id). If the array is stored in row-major order, then this element is located at the position id + nd * (id-1 + nd-1 * (... + n2 * i1)).
Note that each element of the array must be of type FFTW_COMPLEX
;
i.e. a (real, imaginary) pair of (double precision) numbers. Note also
that, in fftwnd
, the expression above is multiplied by the stride
to get the actual array index--this is useful in situations where each
element of the multi-dimensional array is actually a data structure or
another array, and you just want to transform a single field. In most
cases, however, you use a stride of 1.
Readers from the Fortran world are used to arrays stored in column-major order (sometimes called "Fortran order"). This is essentially the exact opposite of row-major order in that, here, the first dimension's index varies most quickly.
If you have an array stored in column-major order and wish to transform
it using fftwnd
, it is quite easy to do. When creating the plan,
simply pass the dimensions of the array to fftwnd_create_plan
in
reverse order. For example, if your array is a rank three
N x M x L
matrix in column-major order, you should pass the
dimensions of the array as if it were an L x M x N
matrix (which
it is, from perspective of fftwnd
).
Multi-dimensional arrays declared statically (that is, at compile time,
not necessarily with the static
keyword) in C are already
in row-major order. You don't have to do anything special to transform
them. (See section Quick Start for Multi-dimensional Transforms, for an
example of this sort of code.)
Often, especially for large arrays, it is desirable to allocate the arrays dynamically, at runtime. This isn't too hard to do, although it is not as straightforward for multi-dimensional arrays as it is for one-dimensional arrays.
Creating the array is simple: using a dynamic-allocation routine like
malloc
, allocate an array big enough to store N FFTW_COMPLEX
values, where N is the product of the sizes of the array dimensions
(i.e. the total number of complex values in the array). For example,
here is code to allocate a 5x12x27 rank 3 array:
FFTW_COMPLEX *an_array; an_array = malloc(5 * 12 * 27 * sizeof(FFTW_COMPLEX));
Accessing the array elements, however, is more tricky--you can't simply
use multiple applications of the `[]' operator like you could for
static arrays. Instead, you have to explicitly compute the offset into
the array using the formula given earlier for row-major arrays. For
example, to reference the -th element of the array
allocated above, you would use the expression an_array[k + 27 * (j
+ 12 * i)]
.
This pain can be alleviated somewhat by defining appropriate macros, or, in C++, creating a class and overloading the `()' operator.
A different method for allocating multi-dimensional arrays in C is often
suggested which, though somewhat more convenient, produces very poor
performance (and, more to the point, is incompatible with
fftwnd
). This method is to create arrays of pointers of arrays
of pointers of...etcetera. For example, the analogue in this method to
the example above is:
int i,j; FFTW_COMPLEX ***a_slow_array; /* another way to make a 5x12x27 array */ a_slow_array = malloc(5 * sizeof(FFTW_COMPLEX **)); for (i = 0; i < 5; ++i) { a_slow_array[i] = malloc(12 * sizeof(FFTW_COMPLEX *)); for (j = 0; j < 12; ++j) a_slow_array[i][j] = malloc(27 * sizeof(FFTW_COMPLEX)); }
As you can see, this sort of array is inconvenient to allocate (and
deallocate). On the other hand, it has the advantage that the
-th element can be referenced simply by
a_slow_array[i][j][k]
.
However, if you care about performance, you should never allocate
multi-dimensional arrays in this way. Expressions like
a_slow_array[i][j][k]
really correspond to
*(*(*(a_slow_array + i) + j) + k)
, whereas expressions like
an_array[k + 27 * (j + 12 * i)]
from before correspond to
*(an_array + k + 27 * (j + 12 * i))
. As you can see, the former
expression substitutes two pointer dereferences for the two
multiplications in the latter expression. However, a pointer dereference
translates into a load from memory, and, as everyone should know, a
memory access is about the most expensive instruction you can possibly
execute--much more expensive than a multiplication, even if the memory
access is in cache.