When working with a large matrix \(M\), we can distinguish between algorithms that require access to arbitrary elements of \(M\) and algorithms that only require the ability to multiply by \(M\). The latter are called matrix-free algorithms; they work with \(M\) as a linear operator rather than with its representation as a matrix.
The advantage of matrix-free algorithms is that for specific \(M\) there may be much faster ways to compute \(Mx\) than by matrix multiplication. As one extreme example, consider the centering operator \(x\mapsto x-\bar x\) on a length-\(n\) vector, which can be computed in linear time from its definition but would take time quadratic in \(n\) if the operator were represented as multiplication by an \(n\times n\) matrix. As another, consider multiplying by a diagonal matrix: the matrix can be represented in linear space and the multiplication performed in linear time by just ignoring all the zero off-diagonal elements.
One important application of bigQF is the SKAT test.
This involves the eigenvalues of a matrix that is the product of a
sparse matrix and a projection matrix. Multiplying by the sparse
component is fast for essentially the same reasons that multiplying by a
diagonal matrix is fast. Multiplying by the projection component is fast
for essentially the same reasons that centering is fast.
Both the stochastic SVD and Lanczos-type algorithms have matrix-free
implementations, and the package provides an object-oriented mechanism
to use these implementations to compute the distribution of a quadratic
form. The ssvd function also accepts these objects.
An object of class matrix-free is a list with the
following components
mult Function to multiply by \(M\)tmult Function to multiply by \(M^T\)trace numeric, the trace of \(M^TM\) (needed for pQF but not
ssvd)ncol integer, the number of columns of \(M\)nrow integer, the number of rows of \(M\)As a simple example, suppose M is a sparse matrix stored
using the Matrix package. We can define (see
sparse.matrixfree)
rval <- list(
mult = function(X) M %*% X,
tmult = function(X) crossprod(M, X),
trace = sum(M^2),
ncol = ncol(M),
nrow = nrow(M)
)
class(rval) <- "matrixfree"
The computations for trace, ncol, and
nrow are done at the time the object is constructed. The
mult and tmult functions will be efficient
because they use the sparse-matrix algorithms in the Matrix package.
The SKAT.matrixfree objects have a more complicated
implementation. The matrix is of the form \(M=\Pi G W/\sqrt{2}\), where \(W\) is a diagonal matrix of weights, \(G\) is a sparse genotype matrix, and \(\Pi\) is the projection matrix on to the
residual space of a linear regression model. Since M is not
sparse, it is not sufficient just to use the sparse-matrix code of the
previous example. Instead we specify the multiplication function as
function(X) {
base::qr.resid(qr, as.matrix(spG %*% X))/sqrt(2)
}
where spG is a sparse Matrix object containing \(GW\) and qr is the QR
decomposition of the design matrix from the linear model. The transpose
multiplication function is
function(X) {
crossprod(spG, qr.resid(qr, X))/sqrt(2)
}
Multiplying by the sparse component takes time proportional to the number of non-zero entries of \(GW\) and the projection takes time proportional to \(np^2\) where \(n\) is the number of observations and \(p\) is the number of predictors in the linear model. When the genotype matrix is sparse and \(p^2\ll n\), the matrix-free algorithm will be fast.