When working with a large matrix M, we can distinguish between algorithms that require access to arbitrary elements of M and algorithms that only require the ability to multiply by M. The latter are called matrix-free algorithms; they work with M as a linear operator rather than with its representation as a matrix.
The advantage of matrix-free algorithms is that for specific M there may be much faster ways to compute Mx than by matrix multiplication. As one extreme example, consider the centering operator x ↦ x − x̄ on a length-n vector, which can be computed in linear time from its definition but would take time quadratic in n if the operator were represented as multiplication by an n × n matrix. As another, consider multiplying by a diagonal matrix: the matrix can be represented in linear space and the multiplication performed in linear time by just ignoring all the zero off-diagonal elements.
One important application of bigQF
is the SKAT test.
This involves the eigenvalues of a matrix that is the product of a
sparse matrix and a projection matrix. Multiplying by the sparse
component is fast for essentially the same reasons that multiplying by a
diagonal matrix is fast. Multiplying by the projection component is fast
for essentially the same reasons that centering is fast.
Both the stochastic SVD and Lanczos-type algorithms have matrix-free
implementations, and the package provides an object-oriented mechanism
to use these implementations to compute the distribution of a quadratic
form. The ssvd
function also accepts these objects.
An object of class matrix-free
is a list with the
following components
mult
Function to multiply by Mtmult
Function to multiply by MTtrace
numeric, the trace of MTM
(needed for pQF
but not ssvd
)ncol
integer, the number of columns of Mnrow
integer, the number of rows of MAs a simple example, suppose M
is a sparse matrix stored
using the Matrix package. We can define (see
sparse.matrixfree
)
rval <- list(
mult = function(X) M %*% X,
tmult = function(X) crossprod(M, X),
trace = sum(M^2),
ncol = ncol(M),
nrow = nrow(M)
)
class(rval) <- "matrixfree"
The computations for trace
, ncol
, and
nrow
are done at the time the object is constructed. The
mult
and tmult
functions will be efficient
because they use the sparse-matrix algorithms in the Matrix package.
The SKAT.matrixfree
objects have a more complicated
implementation. The matrix is of the form $M=\Pi G W/\sqrt{2}$, where W is a diagonal matrix of weights,
G is a sparse genotype matrix,
and Π is the projection matrix
on to the residual space of a linear regression model. Since
M
is not sparse, it is not sufficient just to use the
sparse-matrix code of the previous example. Instead we specify the
multiplication function as
function(X) {
base::qr.resid(qr, as.matrix(spG %*% X))/sqrt(2)
}
where spG
is a sparse Matrix object containing GW and qr
is
the QR decomposition of the design matrix from the linear model. The
transpose multiplication function is
function(X) {
crossprod(spG, qr.resid(qr, X))/sqrt(2)
}
Multiplying by the sparse component takes time proportional to the number of non-zero entries of GW and the projection takes time proportional to np2 where n is the number of observations and p is the number of predictors in the linear model. When the genotype matrix is sparse and p2 ≪ n, the matrix-free algorithm will be fast.