James Lee and I have a new paper out: Lee and Chow, Conditions for the validity of SNP-based heritability estimation, Human Genetics, 2014. As I summarized earlier (e.g. see here and here), heritability is a measure of the proportion of the variance of some trait (like height or cholesterol levels) due to genetic factors. The classical way to estimate heritability is to regress standardized (mean zero, standard deviation one) phenotypes of close relatives against each other. In 2010, Jian Yang, Peter Visscher and colleagues developed a method, sometimes called GREML, to estimate heritability directly from the data obtained in genome-wide association studies (GWAS). Shashaank Vattikuti and I quickly adopted this method and computed the heritability of metabolic syndrome traits as well as the genetic correlations between the traits (link here). Unfortunately, our methods section has a lot of typos, but the corrected Methods with the Matlab code can be found here. However, I was puzzled by the derivation of the method provided in the Yang et al. paper. This paper is our resolution. The technical details are below the fold.
The additive genetic contribution to a trait can be quantified with the linear model

where










where the cross terms are zero. Here, I use statistical notation where prime means transpose. Technically, I should take the expectation value over the distribution of







to the data via restricted maximum likelihood. Here,






Equation (**) is a mixed linear model, which is fit to the data using restricted maximum likelihood (REML). However, if we assume that the phenotypes are standardized and there are no other covariates, then this reduces to simple maximum likelihood with log-likelihood:

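To make the maximum likelihood fit for standardized phenotypes with no covariates concrete, here is a minimal sketch in Python on simulated data. It is not the Matlab code mentioned above, and its notation is my own assumption rather than that of the paper: the phenotype covariance is taken to be `h2*A + (1-h2)*I`, where `A` is a relatedness matrix built from standardized genotypes and the total phenotypic variance is fixed at one. The sample size, number of SNPs, true heritability, and all variable names are illustrative, and a crude grid search stands in for the iterative schemes normally used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (sizes and the true heritability are illustrative assumptions).
n_people, n_snps, h2_true = 600, 300, 0.5

# Genotypes: allele counts 0/1/2 drawn independently for each SNP.
freqs = rng.uniform(0.1, 0.9, size=n_snps)
genotypes = rng.binomial(2, freqs, size=(n_people, n_snps)).astype(float)

# Standardize each SNP column to mean zero and standard deviation one.
Z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)

# Genetic relatedness matrix built from the standardized genotypes.
A = Z @ Z.T / n_snps

# Phenotype: additive SNP effects plus independent noise, then standardized.
effects = rng.normal(0.0, np.sqrt(h2_true / n_snps), size=n_snps)
noise = rng.normal(0.0, np.sqrt(1.0 - h2_true), size=n_people)
y = Z @ effects + noise
y = (y - y.mean()) / y.std()

# Rotate into the eigenbasis of A so that V = h2*A + (1-h2)*I is diagonal.
eigvals, eigvecs = np.linalg.eigh(A)
eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
y_rot = eigvecs.T @ y


def log_likelihood(h2):
    """Gaussian log-likelihood of y (up to a constant) for a given h2."""
    v = h2 * eigvals + (1.0 - h2)
    return -0.5 * np.sum(np.log(v)) - 0.5 * np.sum(y_rot**2 / v)


# Crude grid search in place of an iterative solver.
grid = np.linspace(0.0, 0.99, 100)
h2_hat = grid[np.argmax([log_likelihood(h) for h in grid])]
print(f"true h2 = {h2_true}, estimated h2 = {h2_hat:.2f}")
```

Working in the eigenbasis of `A` means the determinant and inverse of the covariance reduce to sums over eigenvalues, so each likelihood evaluation is cheap once the decomposition is done. In this simulation every SNP is causal and independent of the others, which is the idealized setting; real data with linkage disequilibrium need not behave this way.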
Differentiating (**) by




These equations are usually solved numerically with some sort of iteration scheme. However, if we consider the limit where


Substituting this back into (***) and solving for



We thus find that a necessary condition for the maximum likelihood (***) to yield the correct estimate for



If we take (****) term by term, we find that




which is assumed to be small compared to

