Meta-imputation

At this year’s annual meeting of the American Society of Human Genetics (ASHG), October 22-26 in Boston, I gave a talk on my current project: Meta-imputation. You can find the abstract here.

This new method aims to combine most of the genetic information present in multiple reference panels for use in imputation. Imputation methods use a sequence-based reference panel (haplotype data) to estimate missing sites in a study sample (phased genotype data) for downstream analysis in a genome-wide association study (GWAS), which increases the power to find significant genotype/phenotype associations. Although some methods, e.g. IMPUTE2, can handle a second reference panel to fill in some of the gaps, the reference information that goes into imputation is otherwise limited to the single panel that is used. Yet more and more sequence data sets have been or are being produced, and this breadth of available information places a new demand on imputation: to include much more of it by using multiple reference panels (or as many sequence data sets as one deems appropriate). Meta-imputation tries to address this.
In meta-imputation, the same study sample is imputed independently from each of a range of selected reference panels. This generates one genotype probability matrix per reference panel; the matrices contain the same imputed samples, but the imputed variants differ in part between them. Meta-imputation, quite simply, merges those imputed genotype matrices to create a single matrix that can be used directly for downstream association testing. Because much more information goes into the overall estimation, I hope to increase both the imputation accuracy and the power to detect association.
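To make the merging step a bit more concrete, here is a minimal sketch in Python. It is not the actual meta-imputation tool, and the merging rule is only one of many possible choices: I assume here that each imputation run gives, per variant, a matrix of genotype probabilities (samples by the three genotypes) plus a positive quality weight (for example an info-like score), and that overlapping variants are combined by a weighted average. The function name `merge_imputed` and the dictionary layout are hypothetical.

```python
import numpy as np

def merge_imputed(panels):
    """Merge per-panel imputation results into one set of probability matrices.

    panels: list of dicts, one per reference panel (hypothetical layout),
            mapping variant ID -> (probs, weight), where probs has shape
            (n_samples, 3) and weight is a positive quality score.
    Returns a dict mapping variant ID -> merged (n_samples, 3) matrix.
    """
    merged = {}
    # Union of all variants imputed from at least one reference panel.
    variant_ids = sorted(set().union(*(p.keys() for p in panels)))
    for vid in variant_ids:
        hits = [p[vid] for p in panels if vid in p]
        probs = np.stack([np.asarray(pr, dtype=float) for pr, _ in hits])
        weights = np.array([w for _, w in hits], dtype=float)
        # Assumed rule: quality-weighted average across panels.
        avg = np.tensordot(weights, probs, axes=1) / weights.sum()
        # Renormalise so each sample's three probabilities sum to one.
        merged[vid] = avg / avg.sum(axis=1, keepdims=True)
    return merged
```

A variant that was imputed from only one panel is simply carried over, while a variant present in several panels is pulled towards the panel with the higher weight. Other rules, such as taking the best-scoring panel per variant, would fit into the same loop.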

While my talk at ASHG already showed that the method performs well in terms of imputation accuracy, the more interesting part, i.e. how well a causal signal can be picked up, has not yet been evaluated. At the moment, I am therefore working on an exhaustive simulation experiment in which I define the causal variant in a simulated study sample and then try to pick it up in association tests after meta-imputation, in comparison to regular imputation.
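The core of such a simulation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not my actual framework: the phenotype is quantitative with a purely additive effect of one causal variant plus standard normal noise, imputed genotypes enter the test as expected dosages, and the test statistic is the sample size times the squared correlation (approximately chi-square distributed with one degree of freedom under the null). All function names are hypothetical.

```python
import numpy as np

def expected_dosage(probs):
    """Expected alternate-allele count from an (n_samples, 3) probability matrix."""
    probs = np.asarray(probs, dtype=float)
    return probs[:, 1] + 2.0 * probs[:, 2]

def simulate_phenotype(causal_genotypes, effect_size, rng):
    """Additive effect of the chosen causal variant plus standard normal noise."""
    g = np.asarray(causal_genotypes, dtype=float)
    return effect_size * g + rng.standard_normal(g.shape[0])

def association_statistic(dosage, phenotype):
    """n * r^2 between dosage and phenotype (assumed test, see text above)."""
    r = np.corrcoef(dosage, phenotype)[0, 1]
    return len(phenotype) * r ** 2
```

The idea is then to compute the statistic at the causal variant once from the regularly imputed matrix and once from the meta-imputed matrix, over many replicates, and to compare how often each one exceeds the significance threshold.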
Since my talk at ASHG, I have been approached by several researchers who are interested in testing meta-imputation. Although I am very glad that this new method seems to be something many researchers want to use, I do not want to release a premature version of the command line tool for performing meta-imputation. Some more evaluation is necessary, but the simulation framework I am currently working on will hopefully answer most of the questions that can be asked about the method. Also, because there are many different ways in which the imputed matrices can be merged, I am testing which one is most appropriate for the task.

Let’s see how it goes, and I hope to publish the method soon enough.