SKAT, weights, and projections

The original SKAT statistic for linear and generalised linear models is Q = (y − μ̂)′GW2G′(y − μ̂) = (y − μ̂)′K(y − μ̂) where G is N × M genotype matrix, and W is a weight matrix that in practice is diagonal. I’ve changed the original notation from W to W2, because everyone basically does. The Harvard group has a factor of 1/2 somewhere in here, the BU/CHARGE group doesn’t.

When the adjustment model isn’t ordinary linear regression, there is a second weight matrix, which I’ll write Σ, giving the metric that makes y ↦ y − μ̂ the projection orthogonal to the range of X. That is μ̂ = (Σ−1/2X(XTΣ−1X)−1XTΣ−1/2)Y Note that both (Σ−1/2X(XTΣ−1X)−1XTΣ−1/2) and I − (Σ−1/2X(XTΣ−1X)−1XTΣ−1/2) are projections.

The matrix whose eigenvalues are needed for SKAT is H = P01/2KP01/2 (or K1/2P0K1/2) where P0 = V1/2[I − (Σ−1/2X(XTΣ−1X)−1XTΣ−1/2)]V1/2 is the covariance matrix of the the residuals, with V = var [Y]. Usually V = Σ, but that’s not necessary.

famSKAT has test statistic Q = (y − μ̂)′V−1GW2GV−1(y − μ̂) = (y − μ̂)′V−1KV−1(y − μ̂) so the matrix H is H = P01/2V−1KV−1P01/2.

When we want to take a square root of P0 it helps a lot that the central piece is a projection, and so is idempotent: we can define Π0 = [I − (Σ−1/2X(XTΣ−1X)−1XTΣ−1/2)] and write P0 = V1/2Π0V1/2 = V1/2Π0Π0V1/2.

Now consider , with H = T. We can take  = WGV−1V1/2Π0 = WGV−1/2Π0 where G is sparse, W is diagonal. The projection Π0 was needed to fit the adjustment model, so it will be fast. In family data where V = Σ is based on expected relatedness from a pedigree, the Cholesky square root R = V1/2 = Σ1/2 will be sparse.

Let f be the size of the largest pedigree. We should still be able to multiply a vector by in O(MNα + Nf2) time where α ≪ 1 depends on the sparseness of G. If so, we can compute the leading-eigenvalue approximation in O(MNkα + Nkf2) time. (In fact, we can replace f2 by the average of the squares of number of relatives for everyone in the sample)

The relevant bits of the code, the functions that multiply by and T, look like

CholSigma<-t(chol(SIGMA))
Z<-nullmodel$x
qr<-qr(as.matrix(solve(CholSigma,Z)))
rval <- list(
    mult = function(X) {
      base::qr.resid(qr,as.matrix(solve(CholSigma,(spG %*% X))))
        }, 
    tmult = function(X) {
      crossprod(spG, solve(t(CholSigma), base::qr.resid(qr,X)))
    })