The above routines are for a transpose of a matrix of numbers (of type
double), using FFTW’s default block sizes. More generally, one
can perform transposes of tuples of numbers, with
user-specified block sizes for the input and output:
     fftw_plan fftw_mpi_plan_many_transpose
                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      ptrdiff_t block0, ptrdiff_t block1,
                      double *in, double *out, MPI_Comm comm, unsigned flags);
In this case, one is transposing an n0 by n1 matrix of
howmany-tuples (e.g. howmany = 2 for complex numbers).
The input is distributed along the n0 dimension with block size
block0, and the n1 by n0 output is distributed
along the n1 dimension with block size block1. If
FFTW_MPI_DEFAULT_BLOCK (0) is passed for a block size, then FFTW
uses its default block size for that dimension. To get the local size
of the data on each process, call fftw_mpi_local_size_many_transposed
with the same block sizes.
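
To illustrate how a block size determines each process's portion, here is a minimal sketch in plain C. It is not FFTW code. It assumes a block distribution in which process r owns rows r*block up to (but not including) min((r+1)*block, n), and that the default block size is n divided by the number of processes, rounded up. Both are assumptions made for illustration, and the helper names effective_block and local_range are hypothetical. In a real program, rely on the values returned by fftw_mpi_local_size_many_transposed instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: resolve a block size of 0 (FFTW_MPI_DEFAULT_BLOCK)
   to an ASSUMED default of ceil(n / n_procs). */
ptrdiff_t effective_block(ptrdiff_t n, ptrdiff_t block, int n_procs)
{
    assert(n >= 0 && block >= 0 && n_procs > 0);
    return block == 0 ? (n + n_procs - 1) / n_procs : block;
}

/* Hypothetical helper: rows owned by `rank` along a dimension of length n,
   under the assumed block distribution described above. */
void local_range(ptrdiff_t n, ptrdiff_t block, int n_procs, int rank,
                 ptrdiff_t *local_n, ptrdiff_t *local_start)
{
    assert(rank >= 0 && rank < n_procs);
    ptrdiff_t b = effective_block(n, block, n_procs);

    /* Sanity check for this sketch: the blocks must cover all n rows. */
    assert(b * n_procs >= n);

    ptrdiff_t start = (ptrdiff_t)rank * b;
    if (start >= n) {
        /* This process holds no rows; the start value of 0 is an
           arbitrary choice made by this sketch. */
        *local_n = 0;
        *local_start = 0;
        return;
    }

    ptrdiff_t remaining = n - start;
    *local_n = remaining < b ? remaining : b; /* last block may be short */
    *local_start = start;
}
```

For example, with n0 = 10 on 4 processes and the assumed default block size of 3, this sketch gives ranks 0 through 2 three rows each and rank 3 one row. With block0 = 5, ranks 0 and 1 get five rows each and ranks 2 and 3 get none, which shows how a user-specified block size can leave some processes idle.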