Cell-type-specific cis-eQTLs in eight human brain cell types identify novel risk genes for psychiatric and neurological disorders

To date, most expression quantitative trait loci (eQTL) studies, which investigate how genetic variants contribute to gene expression, have been performed in heterogeneous brain tissues rather than specific cell types. In this study, we performed an eQTL analysis using single-nuclei RNA sequencing from 192 individuals in eight brain cell types derived from the prefrontal cortex, temporal cortex and white matter. We identified 7,607 eGenes, a substantial fraction (46%, 3,537/7,607) of which show cell-type-specific effects, with strongest effects in microglia. Cell-type-level eQTLs affected more constrained genes and had larger effect sizes than tissue-level eQTLs. Integration of brain cell type eQTLs with genome-wide association studies (GWAS) revealed novel relationships between expression and disease risk for neuropsychiatric and neurodegenerative diseases. For most GWAS loci, a single gene co-localized in a single cell type, providing new clues into disease etiology. Our findings demonstrate substantial contrast in genetic regulation of gene expression among brain cell types and reveal potential mechanisms by which disease risk genes influence brain disorders. Bryois et al. mapped genetic variants regulating gene expression in eight major brain cell types. They found a large number of cell-type-specific genetic effects and leveraged their results to identify novel putative risk genes for brain disorders.

M ost genetic associations from GWASs of brain disorders lie within non-coding regions of the genome, which poses a challenge for the identification of risk genes 1 and brain cell types in which these risk variants regulate gene expression. eQTLs (that is, genomic regions that explain variation in gene expression levels) have become a powerful tool to uncover the molecular underpinnings of variants associated with complex traits and diseases in non-coding regions of the genome [2][3][4] .
Importantly, eQTL relationships are highly dependent upon cell type 5 , cell states and developmental stage of the human brain 6 , consistent with recent transcriptomic studies that show prominent temporal and cell-type-specific changes in expression 7 . However, most prior human brain eQTL studies were done using bulk tissues and have been only partially successful in prioritizing disease risk genes by integrating GWAS results with tissue-level eQTLs. A few recent studies have investigated eQTLs in specific brain cell types-for example, cis-eQTLs in dopaminergic neurons derived from induced pluripotent stem cells (iPSCs) 8 and in sorted primary human microglia 9,10 .
In the current study, to dissect the functional genetic variation of adult-onset neuropsychiatric and neurological diseases, we leveraged single-nuclei gene expression data from 192 adult human brain tissues (from the prefrontal cortex, temporal cortex and deep white matter) to perform a systematic eQTL analysis in all major adult human brain cell types. Here we show substantial cell-type-specific effects in the genetic control of gene expression. Integration of cell type eQTLs with GWAS shows that, at most GWAS loci, a single gene co-localizes in a single brain cell type, thereby not only providing insights into disease-relevant genes but also identifying new putative cellular mechanisms of risk genes that have been missed by bulk tissue-level analyses.

Results
Identification of brain cell type cis-eQTLs. Our goal was to identify genetic variants regulating gene expression in brain cell types. Therefore, we performed single-nuclei RNA sequencing (RNA-seq) and genotyping in 270 human brain samples from 148 independent individuals (Supplementary Table 1). We integrated our dataset with 102 human brain samples from published single-nuclei transcriptomic studies that were previously genotyped [11][12][13] , resulting in a total of 373 brain samples from 215 individuals. After quality control, normalization, sample integration and genotype imputation (Methods), we obtained gene expression data for 6,940-14,595 genes (protein coding and non-coding) and genotypes for 5.3 million single-nucleotide polymorphisms (SNPs) in 144-192 individuals for eight major brain cell types ( Fig. 1) expressing clear canonical markers of cell type identity ( Supplementary Fig. 1). To date, most expression quantitative trait loci (eQTL) studies, which investigate how genetic variants contribute to gene expression, have been performed in heterogeneous brain tissues rather than specific cell types. In this study, we performed an eQTL analysis using single-nuclei RNA sequencing from 192 individuals in eight brain cell types derived from the prefrontal cortex, temporal cortex and white matter. We identified 7,607 eGenes, a substantial fraction (46%, 3,537/7,607) of which show cell-type-specific effects, with strongest effects in microglia. Cell-type-level eQTLs affected more constrained genes and had larger effect sizes than tissue-level eQTLs. Integration of brain cell type eQTLs with genome-wide association studies (GWAS) revealed novel relationships between expression and disease risk for neuropsychiatric and neurodegenerative diseases. For most GWAS loci, a single gene co-localized in a single cell type, providing new clues into disease etiology. Our findings demonstrate substantial contrast in genetic regulation of gene expression among brain cell types and reveal potential mechanisms by which disease risk genes influence brain disorders.

Cell-type-specific cis-eQTLs in eight human brain cell types identify novel risk genes for psychiatric and neurological disorders
We identified cis-eQTLs by testing all SNPs within a 1-megabase (Mb) window surrounding the transcription start site (TSS) of each expressed gene while adjusting for known covariates (study and disease status) and inferred covariates (genotype first principal components (PCs) and expression PCs) (Methods). We discovered 7,607 genes with a cis-eQTL at a 5% false discovery rate (FDR) across the eight different brain cell types (4,799 unique genes) (Fig. 2a  and Supplementary Table 2). This number represents only a small fraction of the potential cis-eQTL discoveries, as we estimate that at least 10-48% of the tested genes have an eQTL across the different cell types (Supplementary Fig. 2 and Methods). The number of detected eQTLs varied substantially between cell types (Fig. 2a) (for example, 2,734 eQTLs in excitatory neurons but only 16 in pericytes) and showed high correlation with the total number of nuclei belonging to the cell type (Fig. 2b). This suggests that more nuclei allow for a better quantification of gene expression and, ultimately, the discovery of more eQTLs.
Most cis-eQTLs replicated in a large tissue-level cortical eQTL study (Metabrain 14 ) with 77-100% of the SNP-gene pairs replicating at 5% FDR (91-100% of those with the same direction of effect) (Supplementary Fig. 3a). cis-eQTLs that did not replicate (FDR > 5%) affected more constrained genes ( Supplementary  Fig. 3b) and were located farther from the TSS than replicating cis-eQTLs ( Supplementary Fig. 3c). Neuronal cis-eQTLs replicated at a slightly higher rate (82% at 5% FDR, pi1 = 86%) than glial cis-eQTL (79% at 5% FDR, pi1 = 86%) (Fig. 2c), possibly because glial cell types are less prevalent than neurons in the cortex. The effect sizes of our cis-eQTLs were strongly correlated (R b 15 = 0.74-0.94) with the effect sizes in the Metabrain study 14 ( Supplementary Fig. 4), with higher correlations for neurons than glia (R b 15 = 0.85 for neurons, R b 15 = 0.78 for glia). Interestingly, the effect sizes in our study were larger than in the bulk Metabrain study 14 (mean absolute beta = 0.44 versus 0.34). To confirm this finding, we generated 'tissue-like' expression data from our dataset by aggregating all reads from all single nuclei for each individual and mapped cis-eQTLs using the same strategy. The 'tissue-like' eQTL analysis discovered substantially fewer eQTLs than the cell-type-level analysis (3,058 at 5% FDR) ( Supplementary Fig. 5a). In addition, the effect sizes across all genes were significantly lower in the 'tissue-like' analysis than in our cell-type-level eQTL analysis ( Supplementary Fig. 5b,c). These results suggest that genetic effects on gene expression are typically larger in brain cell types than previously estimated using bulk brain RNA-seq data.
As expected, cis-eQTLs were enriched around the TSS (Fig. 2d) and more frequently found upstream of the gene or within the gene body than downstream of the gene (Supplementary Fig. 6) (3,027 in gene body, 2,907 upstream and 1,673 downstream). In addition, we found that 47-56% of the top cis-eQTL SNPs affected the closest gene (depending on the cell type) (Supplementary Fig. 7 and Methods). Genes with an eQTL in glial classes had, on average, higher expression levels than genes without an eQTL (Wilcoxon P < 2.2 × 10 −16 ), whereas the opposite was true for neurons (Wilcoxon P = 0.021) ( Supplementary Fig. 8). As expected, genes with a cis-eQTL were generally found to be less constrained than genes without a cis-eQTL ( Supplementary Fig. 9a). However, genes for which we discovered a cis-eQTL in a glial cell type but not in the 'tissue-like' eQTL analysis were as constrained as genes for which we did not detect any eQTL ( Fig. 2e and Supplementary  Fig. 9b). cis-eQTL genes detected only in neuronal cell types were less constrained than genes without eQTLs but more constrained than genes detected in our 'tissue-like' analysis (Fig. 2e). These results suggest that eQTLs detected at a cell type level affect more constrained genes than eQTLs detected at a tissue level, suggesting that they might be more disease relevant. We performed single-nuclei RNA-seq on brain samples from 192 genotyped donors. We mapped cis-eQTLs for eight major brain cell types and identified a total of 7,607 cis-eQTL genes. We identified cell-type-specific genetic effects and leveraged our results to identify risk genes for brain disorders.
We next investigated whether any of the genes with a cis-eQTL had additional independent cis-eQTLs. We found that 165 genes had a secondary cis-eQTL, with most being discovered in excitatory neurons and oligodendrocytes (81 genes and 58 genes, respectively) ( Supplementary Fig. 10a,b). In addition, three genes had three independent cis-eQTLs. Independent cis-eQTLs were enriched around the TSS of their target gene but located at a larger distance from the TSS than the main cis-eQTLs (mean distance = 279 kilobases (kb) versus 101 kb, Wilcoxon P < 2.2 × 10 −16 ) ( Supplementary Fig. 10c,d).
Larger sample sizes will be necessary to identify more independent cis-eQTLs for cell types from the brain.
We also performed fine-mapping to identify putative causal SNPs for our cis-eQTLs (Supplementary Table 3 and Methods). As expected, fine-mapped SNPs with higher probability of being causal were more likely to overlap epigenomic marks 16,17 ( Supplementary   Fig. 11a). Overall, we fine-mapped 591 cis-eQTLs (causal probability >50%) ( Supplementary Fig. 11b). Examples of fine-mapped SNPs with high causal probabilities include rs2207000 as the likely causal SNP for the inhibitory neuron AHI1 eQTL (probability = 0.87) (Fig. 2g), a gene associated with abnormal brain development 18 , and rs17096421 as the likely causal SNP for the GLUD1 excitatory neuron eQTL (probability = 0.53) (Fig. 2g), a gene with an important role in inhibitory synapse formation 19 .
We reasoned that brain cell type eQTLs could overlap with brain cell-type-specific regulatory regions defined by single-nucleus assay for transposase-accessible chromatin using sequencing (snATAC-seq) 17 and sought to test this as an external validation ( Fig. 3a and Methods). We found that ATAC-seq peaks specific to glial cell types (astrocytes, microglia, oligodendrocytes and oligodendrocyte precursor cells/committed oligodendrocyte precursors      14 . The proportion of replicating SNP-gene pairs at 5% FDR is displayed as well as the pi1 statistic and the proportion of replicating eQTL with the same direction of effect. d, Enrichment of the discovered cis-eQTLs around the TSS. e, LOEUF scores 51 for (1) genes with no detected eQTL (at either the cell type level or the 'tissue-like' level); (2) genes with an eQTL detected at the cell type level but not at the 'tissue-like' level; (3) genes with an eQTL detected at the 'tissue-like' level but not at the cell type level; and (4) genes with eQTLs detected both at the cell type level and the 'tissue-like' level. All eQTLs detected in neuronal cell types or glial cell types were aggregated here (individual cell types are shown in Supplementary Fig. 9) (more constrained genes have lower LOEUF scores). The black horizontal bars indicate the medians. P values were obtained using a two-sided Wilcoxon test and are not adjusted for multiple testing. f, Examples of cis-eQTLs in astrocytes and microglia (n = 189/187 biologically independent individuals, respectively). The two-sided P value was obtained using fastQTL and is corrected for multiple testing (Methods). g, Examples of cis-eQTLs with fine-mapped SNPs (causal probability of 0.87 for AHI1 and 0.53 for GLUD1) (n = 178/187 biologically independent individuals, respectively). The two-sided P value was obtained using fastQTL and is corrected for multiple testing (Methods). Box plots show the median and first and third quartiles, and whiskers extend up to 1.5 times the interquartile range. LoF, loss of function.
(OPCs/COPs)) were enriched around the cis-eQTLs discovered in the same cell types but not in the other cell types, suggesting that the discovered glial eQTLs fall in regions of the genome functionally relevant to cell-type-specific gene regulation. Surprisingly, we did not observe enrichment of neuronal cis-eQTLs in neuron-specific ATAC-seq peaks, neuron-specific chromatin immunoprecipitation followed by sequencing (ChIP-seq) marks 16 (H3K4me3 and H3K27ac) ( Supplementary Fig. 12) and sorted nuclei bulk ATAC-seq 20 (Supplementary Fig. 13). The lack of enrichment of neuronal-specific regulatory regions around neuronal cis-eQTLs was robust to multiple attempts at falsification (Methods), suggesting that the discovered neuronal cis-eQTLs might be less cell type specific than glial cis-eQTLs.
Cell-type-specific cis-eQTL effects. We used a negative binomial mixed model to investigate how many of the 7,607 cis-eQTLs  Total eQTLs (converged) Cell type specific (at least one) Cell type specific (aggregate) Cell type specific (all) Fig. 3 | Cell-type-specific genetic effects on gene expression. a, Enrichment of cell-type-specific ATAC-seq peaks (derived from snATAC-seq 17 ) around our cis-eQTLs. b, Number of significant cell-type-specific genetic effects (5% FDR). 'Cell type specific (at least one)' shows the number of cis-eQTLs that have a different genetic effect for the same SNP-gene pair in at least one cell type. 'Cell type specific (all)' shows the number of cis-eQTLs that have a different genetic effect for the same SNP-gene pair in all cell types, whereas 'Cell type specific (aggregate)' shows the number of genes that have a different genetic effect in the discovered cell types than the aggregate genetic effect of all other cell types. c, Estimates of the proportions of cis-eQTLs that have a different genetic effect in another cell type. Estimates were computed on the interaction P value distributions using the pi1 statistic 25 . d, Example of a cell-type-specific cis-eQTL for CHRM5 in oligodendrocytes (n = 192 biologically independent individuals); e, RNF150 in microglia (n = 187 biologically independent individuals); and f, FBN2 in microglia (n = 187 biologically independent individuals). Each dot represents an individual. The displayed two-sided P values (Wald z-statistics) are the interaction P values testing whether the genetic effect in the discovered cell type (top-left cell type) is different from the genetic effects in all other cell types adjusted for multiple testing (Methods). Box plots show the median and first and third quartiles, and whiskers extend up to 1.5 times the interquartile range. had a significantly different effect size in the discovered cell type compared to (1) the average effect size across the seven other cell types, (2) at least one of the seven cell types and (3) all other seven cell types (Methods). We found 2,661, 3,537 and 185 genes (5% FDR), respectively, for the above-mentioned comparisons ( Fig. 3b and Supplementary Table 4). Specific cis-eQTLs from (1) and (2) were closer to the TSS than non-cell-type-specific eQTLs ( Supplementary Fig. 14). Among all cell-type-specific eQTLs, neuron-specific eGenes were significantly less constrained than the shared neuronal eQTL genes ( Supplementary Fig. 15a,b). Interestingly, the 185 cell-type-specific eGenes whose effect sizes were different than all other cell types were more constrained than non-cell-type-specific eGenes ( Supplementary Fig. 15c), with the strongest evidence for microglia-specific cis-eQTL genes. Examples of these genes include CHRM5, AVEN and TMPRSS5 in oligodendrocytes ( Fig. 3d and Supplementary Fig. 16) or RNF150 and FBN2 in microglia (Fig. 3e,f).
We used the pi1 statistic 21 to estimate the proportion of eQTLs with cell-type-specific effects for all pairwise comparisons across cell types. We found that cell-type-specific genetic effects ranged from 9% to 88% depending on the cell types compared. For example, all eQTLs detected in excitatory neurons have relatively similar effect sizes in inhibitory neurons (pi1 = 0.09), whereas 88% of the microglia eQTLs have a different effect size in excitatory neurons (pi1 = 0.88) ( Fig. 3c and Supplementary Fig. 17). Oligodendrocytes, OPCs/COPs and astrocytes clustered together based on the estimated proportion of genetic effects that are cell type specific (Fig. 3c). Similarly, neurons clustered together with an estimate that only 9-41% of the eQTLs have a different genetic effect in the other neuronal type. Microglia showed the strongest evidence for cell-type-specific genetic effects with an estimate that 55-88% of the discovered cis-eQTLs have a different genetic effect in the other cell types, reflecting their unique developmental origin. Altogether, these results suggest that there are substantial differences in the genetic regulation of gene expression across brain cell types, which might be relevant for brain disorders.
Brain cell type eQTLs mediate brain disease associations. We used Coloc 3 to investigate whether genetic variants associated with risk of Alzheimer's disease (AD) 4 , Parkinson's disease (PD) 22,23 , schizophrenia (SCZ) 24 and multiple sclerosis (MS) 25 potentially act through brain cell type cis-eQTLs (Supplementary Table 5). Co-localization analysis identified risk genes in brain cell types at 15-46% of the GWAS loci (at a posterior probability (PP) > 0.7) across the four different disorders ( Supplementary Fig. 18). Notably, we found that 61-90% of co-localized loci contained a single co-localized gene ( Fig. 4a) that usually occurred in a single cell type (Fig. 4a,b). This suggests that disease risk at a given GWAS locus is usually mediated by a single gene acting in a specific cell type.
For AD (Fig. 4c), we found most co-localization signals in microglia (BIN1, CASS4, CD2AP, FCER1G, INPP5D, PICALM, RAPEB1, RIN3, TREM2, USP6NL and ZYX), which is consistent with the known pathophysiological role of microglia in AD 26 . Several of these genes encode proteins that localize to the endolysosomal network (BIN1, RIN3, CD2AP, PICALM and ZYX) 27 , and some are known to interact (for example, CD2AP and RIN3 (ref. 28 ) or RIN3 and BIN1 (ref. 29 )). APH1B (which co-localized in both excitatory neurons and oligodendrocytes) encodes a protein that is part of the γ-secretase complex, which is known to cleave APP, resulting in the production of amyloid beta (Aβ), the main component of amyloid plaques (which characterize AD). ADAMTS4, CR1, NDUFS2 and SPPL2A co-localized exclusively in oligodendrocytes. ADAMTS4 was shown to play an important role in the production of Aβ4-x (highly aggregation prone Aβ peptides) in oligodendrocytes 30 , whereas SPPL2A was shown to cleave TMEM106B 31 , a key protein in frontotemporal lobar degeneration 32 . In addition, we replicated the co-localization of ZNF226 at the APOE locus 33 in OPCs/COPs, suggesting that additional AD risk exists at this locus beyond the APOE-coding mutation. We note that co-localization of a gene in a specific cell type is not influenced by its expression levels. For example, APH1B is expressed at similar levels in all cell types (Supplementary Fig. 19) but only co-localizes with AD in excitatory neurons and oligodendrocytes. We then leveraged allelic information from both our eQTL analysis and GWAS data to predict the effect of increasing gene expression on disease risk (Supplementary Table 5). We found that increased expression of several genes was associated with an increase in AD risk (for example, ADAMTS4, APH1B, BIN1, INPP5D or USP6NL) (Fig. 4c), whereas an increase in expression of other genes leads to a decrease in AD risk (for example, CASS4, PICALM or TREM2).
PD had a more complex pattern of co-localization signals (Fig. 4d). Some genes co-localized in multiple cell types (for example, BIN3, CAMLG or GPNMB), and some loci had co-localization signals with multiple genes (for example, WNT3 locus with co-localization signal for MAPT, LRRC37A, LRRC37A2, KANSL1, ARL17B and AC0056702.2). We note that the co-localization signals at the WNT3 locus are likely due to a common inversion in the European population that is associated with PD 34 . However, for most of the loci, a single gene co-localized in a single cell type. For example, SPTSSB, a gene downregulated in a GBA knockout mouse 35 , was found to co-localize exclusively in excitatory neurons, whereas TMEM163, a gene regulating zinc homeostasis 36 , co-localized exclusively in microglia. For both genes, our results suggest that increased expression leads to an increase in PD risk (Fig. 4d). GPNMB, a gene upregulated in PD and after lysosomal stress 37 , co-localized in both OPCs/COPs and microglia. Finally, although not reaching our co-localization threshold, we found evidence that LRRK2, a gene associated with familial forms of the disorder, co-localized in microglia (PP = 0.47, increased expression leading to an increase in disease risk). Our results suggest that genetic risk for PD involves a complex interaction of multiple genes acting in multiple brain cell types.
For MS (Fig. 4e), we found co-localization signals at 45 loci. CLECL1 (recently co-localized using a large cortical eQTL dataset 14 ), AC008764.1, IQCB1, MCM9, TYMP, RPS6KA4, VEGFB and MYO19 all co-localized in microglia. In addition, we observed co-localization signals in multiple other cell types (for example, NR1H3 in astrocytes). As MS is primarily a disorder driven by the immune system, we caution that brain cell type co-localization signals could be driven by pleiotropy (that is, a single variant or linked variants regulating different genes in different cell types). Indeed, when we tested MS genetic enrichment in cell-type-specific genes (derived from our control samples), we found the strongest enrichment for brain-infiltrating immune cells, followed by microglia ( Supplementary Fig. 20). Therefore, to assess the extent of pleiotropy at MS loci, we performed co-localization analysis with two immune tissues from GTEx 2 (blood and spleen) and 15 immune cell types from DICE 38 (Supplementary Fig. 21 and Supplementary Table 6). We observed that pleiotropy between brain cell types and immune cell types is common at MS loci. Of the 45 loci with co-localization signals in brain cell types, we also found co-localization in immune cell types at 34 loci. The 11 loci that were exclusively co-localized in brain cell types were the CD40 locus (SLC12A5 in excitatory and inhibitory neurons), the TNFAIP8 locus (HSD17B4 in inhibitory neurons), the JADE2 locus (H2AFY in inhibitory neurons), the RUNX3 locus (MTFR1L in inhibitory neurons), the SPRED2 locus (AC012370.2 in OPCs/COPs), the RMI2 locus (CLEC16A in oligodendrocytes), the FOXP1 locus (AC097634.1 in oligodendrocytes), the TGFBR3 locus (TGFBR3 in oligodendrocytes), the BATF locus (YLPM1 in inhibitory neurons), the LCK locus (BSDC1 and KPNA6 in oligodendrocytes and OPCs/COPs) and the SAE1 locus (CCDC9 in excitatory neurons). We caution that CD40 was previously co-localized with rheumatoid arthritis at the CD40 locus in immune B cells and could be a false negative in our analysis 39 . For the 34 loci with co-localization in both brain cell types and immune cells, we found consistent co-localized genes at 18 loci (for example, AHI1, CLEC2D, CLECL1, IQCB1, STAT4, TRAF3, TYMP and VEGFB) with, typically, additional co-localized genes in immune cell types (for example, only IQCB1 co-localizes at the IQCB1 locus in brain cell types, whereas IQCB1, EAF2 and SLC15A2 co-localize in immune cell types). In summary, we identified putative risk genes for MS in brain cell types, but we caution that these results should be interpreted in light of co-localization results in immune cell types.
SCZ had the most co-localization signals ( Fig. 4f) with at least one co-localized gene at 102 loci. Most co-localization signals were observed in excitatory neurons, which is consistent with the genetic enrichment of excitatory neuron-specific genes in SCZ 40 ( Supplementary Fig. 20). For 71 loci, we found a single co-localized gene, such as FURIN and ZNF823 in excitatory neurons (previously co-localized using bulk cortical eQTL data 41 ). Other interesting co-localized genes include CUL3 (in astrocytes), which encodes a protein that was shown to play an important role in excitation-inhibition balance 42 ; ADAM10 (in oligodendrocytes), which encodes a protein that cleaves APP and plays an important role in synaptic functions 43 ; or TRANK1 (in excitatory neurons), which was previously shown to co-localize with bipolar disorder 44 .
Mapping of risk variants to cell type regulatory elements. We next assessed whether GWAS SNPs (r 2 > 0.8 with index SNP) overlapped regulatory regions 16,17 in proximity of the co-localized gene (<100 kb from the TSS) in the co-localized cell types. We identified putative risk SNPs at 30% of the co-localized loci (67/220; Supplementary  Table 7). Notably, two AD GWAS SNPs (rs10792832 and rs3851179) overlapped a microglia-specific enhancer connected to PICALM through proximity ligation-assisted ChIP sequencing (PLAC-seq) 16 ( Fig. 5a,g)      located in a microglia-specific enhancer connected to INPP5D through PLAC-seq (Fig. 5b,h). Interestingly, INPP5D encodes a phosphatase that hydrolyses phosphatidylinositol-3,4,5-trisphosphate into phosphatidylinositol 3,4-diphosphate, which specifically binds to PLEKHA1 (ref. 45 ), a gene recently associated with AD through proteome-wide association study 46 . In addition, an AD GWAS SNP (rs117618017) overlapped the APH1B promoter (Fig. 5c), whereas another (rs7515905) was located in an oligodendrocyte enhancer located within the CR1 gene body (Fig. 5d). Similarly, we found that two MS GWAS SNPs (rs361725 and rs140522) overlapped a microglia enhancer close to the TSS of TYMP as well as its promoter region (Supplementary Table 7); that six PD GWAS SNPs overlapped an enhancer close to the TSS of GPNMB (4-27 kb) (Fig. 5e); and that five MS GWAS SNPs overlapped two astrocytic enhancers located downstream of NR1H3 (Fig. 5f).
We then used SNP2TFBS 47 to identify fine-mapped SNPs that could disrupt the binding of specific transcription factors (Supplementary Table 7). Of 651 fine-mapped SNPs, we predict that 210 disrupt transcription factor binding sites. Notably, rs4663105, which is located in a microglia enhancer 24 kb upstream of BIN1 (co-localized with AD), is predicted to disrupt the binding of KLF4, a transcription factor that was shown to regulate microglia activation 48 , whereas rs77892763, which is located in a microglia enhancer connected to USP6NL through PLAC-seq (also co-localized with AD), is predicted to disrupt the binding of KLF5. Another example is rs2905435, the SCZ GWAS index SNP at the GATAD2A locus, which is located in the promoter region of GATAD2A, and predicted to disrupt the binding site of REST, a key transcriptional factor in neurogenesis 49 .
In summary, our integrative analysis allowed us to fine-map disease-associated genetic variants to cell-type-specific regulatory elements, highlighting potential functional mechanisms of action for disease-associated variants.

Discussion
Here we present, to our knowledge, the first eQTL study of all major cell types in the adult human brain by leveraging single-nuclei RNA-seq data from a large number of individuals. Integration of eQTLs with GWAS and cell-type-specific regulatory elements allowed us to identify high-confidence risk genes, the cell types in which these risk genes are active and putative genetic variants underlying risk loci for four major brain disorders. Furthermore, we leveraged allelic effects to estimate whether an increase in gene expression is associated with an increase or decrease in disease risk-information that is crucial for drug development and often not reported. One striking result of this study is that, at most GWAS loci, a single gene co-localized in a single cell type, allowing us to substantially refine the mechanistic hypothesis for the etiology of the different brain disorders. For example, in AD, we found most co-localization signals in microglia (for example, BIN1 (refs. 9,10 ), PICALM 9,10 and USP6NL 10 ), as expected. However, we also found that APH1B, which encodes a key protein from the γ-secretase complex, ADAMTS4, NDUFS2, SPPL2A and CR1 co-localized in oligodendrocytes, suggesting that oligodendrocytes play an important under-recognized role in AD etiology.
The co-localization results appeared simplest-that is, greater number of risk genes prominently acting in few cell types-for AD compared to PD, SCZ and MS, which showed more complex co-localization patterns (for example, GPNMB locus in PD, where we find co-localization with GPNMB in microglia and OPCs/ COPs, and NUPL2 in oligodendrocytes). The simplicity of AD's co-localization patterns is consistent with the oligogenic nature of AD 50 and contrasts with the more polygenic architecture of SCZ and PD 50 .
For MS, we often found co-localization signals with different genes between brain cell types and immune cell types, suggesting that disease-associated haplotypes regulate different genes in different cellular contexts. Future high-resolution maps of functional regulatory elements of brain cell types from MS brain tissues might help resolve the context specificity of genetic regulation of MS-associated risk genes.
This study has several limitations. (1) The sample size is relatively small compared to eQTL studies from bulk brain tissues 2,14 , limiting the statistical power to discover cis-eQTLs. (2) We did not measure splicing, which might play an important role in disease, potentially missing a subset of co-localization signals. (3) The number of eQTL discoveries is limited for rare cell types (for example, 16 cis-eQTL genes in pericytes), suggesting that targeted enrichments may be essential to increase the number of cis-eQTLs discoveries in rare cell populations. (4) The GWASs used were all related to disease risk and not disease progression; hence, co-localized genes might not be therapeutically relevant at the time of diagnosis.
In summary, our study provides a systematic investigation of eQTLs in cell types of the adult human brain, defines a reference data set of brain cell-type-specific eQTLs and provides a foundational resource of high-confidence co-localized genes in disease-relevant cell types for robust future functional studies of psychiatric and neurological disease mechanisms using appropriate iPSC-based human cell models.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/ s41593-022-01128-z.

Methods
Human tissue samples were obtained from the Netherlands Brain Bank (NBB), the MS UK Tissue Bank (UKTB) and the Edinburgh Brain Bank (EBB) via donor schemes with full ethical approval from respective brain banks (METc/2009/148 from the Medical Ethical Committee of the Amsterdam UMC and MREC/02/2/39 from the UK Ethics Committee) and individual material transfer agreements between Roche and NBB, UKTB and EBB. We complied with all relevant ethical regulations regarding the use of human postmortem tissue samples.
Nuclei isolation and single-nuclei RNA-seq. We generated brain single-nuclei RNA-seq data for three datasets (Roche_MS, Roche_AD and Columbia_AD (white matter), described below) that we complemented with raw sequencing data from three published datasets (Columbia_AD (grey matter) 13 , Mathys_AD 11 and Zhou_ AD 12 ) (Supplementary Table 1). Samples originated from the prefrontal cortex (Roche_MS, Columbia_AD 13 , Mathys_AD 11 and Zhou_AD 12 ), temporal cortex (Roche_AD) and deep white matter (Roche_MS, Roche_AD and Columbia_ AD). A description of the nuclei isolation and single-nuclei RNA-seq for the unpublished datasets is provided below.
The Roche MS dataset (Roche_MS) consists of 166 cortical gray matter and white matter samples from 83 unique individuals (29 controls and 54 cases; median age = 62 years, 25th percentile = 47 years, 75th percentile = 71 years, 43 females and 40 males). Snap-frozen tissue blocks from donors with gray matter lesions were provided by UKTB to Roche. Subpial gray matter lesions were determined using MBP (abcam-ab6263) by neuropathologists at Roche and confirmed by independent experts (Anna Williams and Roberta Magliozzi). Pathological staging of white matter lesions from EBB and NBB donor samples was done at the respective brain banks. In the white matter, de-myelinated and re-myelinated lesions were identified by Luxol Fast Blue staining and immunohistochemistry for proteolipid protein, and de-myelinated lesions were grouped into active, chronic active and chronic inactive lesions with Oil Red O and/or HLA-DR staining to determine microglial activity. Brain tissue specimens from the respective white matter regions were shipped on dry ice to Roche and directly processed.
The Roche AD dataset (Roche_AD) consists of 80 samples from 40 unique individuals (one sample from the temporal cortex and one from deep white matter for each individual, 12 cases, 25 controls, three dementia; median age = 70 years, 25th percentile = 57 years, 75th percentile = 87 years, 22 females and 11 males). Snap-frozen temporal cortex tissue sections from donors were provided by the Amsterdam Brain Bank to Roche. Pathological Braak staging was determined at the Amsterdam Brain Bank and confirmed at Roche, where levels of b-amyloid plaques and p-Tau were assessed by immunohistochemistry using beta Amyloid (MOAB-2) (Novus Biologicals) and Phospho-Tau (Ser202, Thr205) (AT8) (Thermo Fisher Scientific) antibodies on sections cut before and after the ones used for single-nuclei RNA-seq, from the same tissue block The Columbia AD dataset (white matter) (Columbia_AD) consists of 24 white matter samples from the same individuals for which gray matter single-nuclei RNA-seq was recently published 13 (12 controls, 12 cases; median age = 86 years, 25th percentile = 85 years, 75th percentile = 92 years, 12 females and 12 males). Snap-frozen tissue blocks from donors were shipped on dry ice to Roche and directly processed. Snap-frozen temporal cortex tissue sections from donors were provided by Columbia University to Roche. Pathological Braak staging was determined at Columbia University and confirmed at Roche, where levels of b-amyloid plaques and p-Tau were assessed by immunohistochemistry using beta Amyloid (MOAB-2) (Novus Biologicals) and Phospho-Tau (Ser202, Thr205) (AT8) (Thermo Fisher Scientific) antibodies on sections cut before and after the ones used for single-nuclei RNA-seq, from the same tissue block For the three newly sequenced datasets, nuclei were isolated from fresh-frozen 10-μm sections using Nuclei PURE Prep Nuclei Isolation Kit (Sigma-Aldrich) with the following modifications. The regions of interest were macro-dissected with a scalpel blade, lysed in Nuclei Pure Lysis Solution with 0.1% Triton X, 1 mM DTT and 0.4 U µl −1 of SUPERase-In RNase Inhibitor (Thermo Fisher Scientific) freshly added before use and homogenized with, first, a 23-gauge syringe and then a 29-gauge syringe. Cold 1.8 M Sucrose Cushion Solution, prepared immediately before use with the addition of 1 mM DTT and 0.4 U µl −1 of SUPERase-In RNase Inhibitor, was added to the suspensions before they were filtered through a 30-μm strainer. The lysates were then carefully and slowly layered on top of 1.8 M Sucrose Cushion Solution previously added in new Eppendorf tubes. Samples were centrifuged for 45 minutes at 16,000g at 4 °C. Pellets were re-suspended in Nuclei Storage Buffer with 0.4 U µl −1 of SUPERase-In RNase Inhibitor, transferred in new Eppendorf tubes and centrifuged for 5 minutes at 500g at 4 °C. Pellets were again re-suspended in Nuclei Storage Buffer with 0.4 U µl −1 of SUPERase-In RNase Inhibitor and centrifuged for 5 minutes at 500g at 4 °C. Finally, purified nuclei were re-suspended in Nuclei Storage Buffer with 0.4 U µl −1 of SUPERase-In RNase Inhibitor, stained with trypan blue and counted using Countess II (Life Technologies). A total of 12,000 estimated cells from each sample were loaded on the 10x Single Cell Next GEM G Chip. cDNA libraries were prepared using the Chromium Single Cell 3′ Library and Gel Bead v3 kit according to the manufacturer's instructions. cDNA libraries were sequenced using the Illumina NovaSeq 6000 system and NovaSeq 6000 S2 Reagent Kit version 1.5 (100 cycles), aiming at a sequencing depth of a minimum of 30,000 reads per nucleus.
Single-nuclei RNA-seq analysis. All samples from the Roche_MS datasets were processed with Cell Ranger (version 3.1.0), using the GRCh38 reference human genome and the Ensembl Homo_sapiens GRCh38.96 reference annotation (modified to count intronic reads). Gene expression quantifications for each nucleus were obtained from the 'filtered_feature_bc_matrix' Cell Ranger (version 3.1.0) output folder. We identified doublets using scDblFinder 52 (version 1.4.0), applied to each sample separately. After removing doublets, we also removed nuclei with fewer than 300 features and 500 counts as well as samples with fewer than 500 cells. We did quality control with SampleQC (version 0.4.5) with default parameters 53 . SampleQC allowed us to identify a subset of cells in many samples with both high splice ratios and high mitochondrial proportions (90% of reads being spliced), which were excluded. After excluding outliers, we again excluded any samples with fewer than 500 cells remaining, resulting in 750,614 nuclei across 166 samples passing quality control. Samples were integrated using Conos 54 (50 PCs,resolution = 8), and broad clusters were annotated based on the expression of canonical markers (SLC17A7 for excitatory neurons, GAD2 for inhibitory neurons, AQP4 for astrocytes, MOG for oligodendrocytes, PDGFRA for OPCs/COPs, C1QA for microglia, CLDN5 for endothelial cells and RGS5 for pericytes).
All samples from the AD datasets (Roche_AD, Columbia_AD, Mathys_AD 11 and Zhou_AD 12 ) were processed with Cell Ranger (version 3.1.0), using the GRCh38 reference human genome and the Ensembl Homo_sapiens GRCh38.91 reference annotation (modified to count intronic reads). Nuclei were defined as barcodes with at least 500 unique molecular identifiers (UMIs) (excluding mitochondrial RNA) and less than 5% of mitochondrial RNA. If a sample had more than 10,000 nuclei, we kept the 10,000 nuclei with the highest number of UMIs. Doublets were identified using scDblFinder 52  labeled independently using the canonical markers described above. A subset of small clusters from the Conos integration expressed multiple canonical markers and were labeled as 'mixed' . Nuclei that were not labeled as 'mixed' were annotated with the Conos cell type labels. The remaining nuclei were labeled with the Harmony cell type label.
Pseudo-bulk gene expression matrices were then generated by summing all counts for each gene in each individual in each cell type (across multiple samples if more than one was available for an individual) and normalized using TMM 56 .
Genotyping quality control. We genotyped samples from the Roche_AD and Roche_MS datasets using the GSAv3 Illumina ChIP. Genotypes were then imputed using the Haplotype Reference Consortium reference panel (version r1.1) 57 and lifted over to GRCh38. Genotype processing and quality control was performed using Plink version 1.9 (ref. 58 ). SNPs with imputation score <0.4 or with missingness greater than 5% were excluded as well as individuals with more than 2% of missing genotypes. In addition, we obtained whole-genome sequencing results for the ROSMAP datasets (Columbia_AD 13 , Mathys_AD 11 and Zhou_AD 12 ) 59 . Similarly, SNPs with missingness greater than 5% were excluded as well as individuals with more than 2% of missing genotypes. Genetic variants that were common to imputed genotypes and whole-genome sequencing were then merged. After merging, SNPs with minor allele frequency (MAF) <5% or deviating from Hardy-Weinberg equilibrium (P < 1 × 10 −6 ) were excluded. We identified related individuals (pi_hat > 0.2) and kept only one individual from related pairs. In addition, individuals deviating from more than three standard deviations from 1000 Genomes European populations 60 on the first and second components of a multidimensional scaling (MDS) reduction were excluded ( Supplementary Fig. 22). Finally, only individuals with both genotype and single-nuclei RNA-seq were retained for the eQTL analysis. After quality control, we obtained high-quality genotypes for ~5.3 million SNPs (MAF > 5%) in 192 individuals.

Sample swap identification.
We used MBV 61 to identify sample swap (as implemented in QTLtools (version 1.3.1)). In brief, MBV takes as input a VCF file containing the genotype data of the samples as well as BAM files containing the mapped single-nuclei sequencing reads. MBV then reports the proportion of heterozygous and homozygous genotypes (for each individual in the VCF file) for which both alleles are captured by the sequencing reads in all BAM files. Correct samples can then be identified, as they should have a high proportion of concordant heterozygous and homozygous sites between the genotype data and the mapped sequencing reads.
Ambient RNA estimation. For each sample and each gene, we summed UMIs across all empty droplets (defined here as droplets with fewer than 100 total UMIs, excluding mitochondrial RNA), resulting in an ambient RNA profile for each sample. We then used the DropletUtils (version 1.14.1) R package 62 and its 'maximumAmbience' function to estimate the maximum contribution of the ambient RNA profile to the observed pseudo-bulk expression profile (for each cell type from each sample). In brief, the 'maximumAmbience function' scales the ambient RNA profile by some factor. A P value is then computed for each gene based on the probability of observing a UMI count in the scaled ambient RNA profile equal to or below the UMI count in the pseudo-bulk expression profile (using the lower tail of a negative binomial distribution). P values are then combined across all genes using the Simes method, testing the joint null hypothesis that all gene-level null hypotheses are true. The largest scaling factor that fails to reject this joint null hypothesis at a P-value threshold of 0.1 is then selected. The scaled ambient RNA profile can then be interpreted as the maximum number of reads in the pseudo-bulk expression profile that can be attributed to ambient RNA. eQTL mapping. We mapped cis-eQTLs within a 1-Mb window of the TSS of each gene using fastQTL (version 2.0) 63 , as described previously 2 . In brief, for each gene, fastQTL performs permutations of the expression data and records the best P value for any SNP in the cis window after each permutation. As the distribution of the best P values follows a beta distribution under the null hypothesis 63 , fastQTL uses maximum likelihood to estimate the parameters of the beta distribution of each gene (which depends on the LD structure of the cis region). An adjusted gene-level P value (bpval) is then computed based on the beta distribution for each gene. We corrected for multiple testing across all genes (per cell type) using the q-value R package 21 on the adjusted gene-level P values (bpval). We only mapped eQTL for genes with an average expression level of at least 1 count per million (CPM) (computed within each cell type) and for which we estimated that the proportion of reads originating from ambient RNA was, on average, below 10% (computed within each cell type), resulting in a total of 6,940-14,595 genes tested across the eight cell types. Individuals who had fewer than ten single nuclei within a particular cell type were dropped from the eQTL analysis. We used the following covariates: three first genotyping PCs, disease status (MS, AD, control or other), study (Roche_MS, Roche_AD, Columbia_AD 13 , Mathys_AD 11 and Zhou_AD 12 ) and the 70 first expression PCs (which were significantly associated with multiple technical covariates (for example, study, tissue, genotype PCs, diagnosis, age and number of single cells) (Supplementary Fig. 23)). This number of expression PCs maximized the number of cis-eQTLs detected ( Supplementary  Fig. 24). Although the number of expression PCs is high in comparison to bulk RNA-seq eQTL studies, the results appear robust for multiple reasons: eQTLs (1) strongly replicated in the Metabrain 14 bulk RNA-seq eQTL study ( Fig. 2c and Supplementary Figs. 3a and 4); (2) were enriched close to the TSS (Fig. 2d); 3) affected less constrained genes ( Supplementary Fig. 9); and (4) were enriched around cell-type-specific epigenomic marks (Fig. 3a). Furthermore, the P-value distribution was sensible ( Supplementary Fig. 2), and random permutation of the gene expression labels lead to only two genes with a significant eQTL (5% FDR). Estimates of the proportion of true alternative hypothesis (that is, proportion of genes with a cis-eQTL; Supplementary Fig. 2) was performed using the pi1 statistic from the q-value R package (version 2.26.0) 21 . Independent cis-eQTL mapping was done using QTLtools (version 1.3.1) 64 , with the same covariates and window size as the original cis-eQTL analysis.
Sharing of brain cell type eQTL with Metabrain cortex eQTL. We obtained P values and effect sizes from cortical samples (n = 2,970) from the Metabrain study 14 corresponding to the most significant SNP-gene pairs for each cis-eQTL gene (n = 7,607 at 5% FDR). Replication was assessed only for SNP-gene pairs that were tested in both studies (6,386 SNP-gene pairs). Sharing of eQTLs was computed using multiple statistics: (1) using the pi1 statistic from the q-value R package 21 ; (2) as the number of matched P values that are significant at 5% FDR (Benjamini-Hochberg 65 ); and (3) using the R b statistic 15 , which measures the correlation of the effect sizes between matched SNP-gene pairs. Epigenome enrichments. We used QTLtools (version 1.3.1) 64 (fdensity) to test whether cell-type-specific regulatory elements were enriched around our cell type cis-eQTL (n = 7,607). For each 10-kb bin in a 2-Mb window around the cis-eQTL, fdensity computes the number of functional elements overlapping the bin. The epigenomic data were obtained from three different studies (Nott et al. 16 , Corces et al. 17 and the dorsolateral prefrontal cortex region for Fullard et al. 20 ) and consists of ATAC-seq (bulk 20 or single nuclei 17 ) and ChIP-seq 16 (H3K4me3 and H3K27ac) from diverse brain cell types. For each dataset, we defined cell-type-specific epigenomic marks as epigenomic marks observed in a single cell type. Epigenomic regions were lifted over to hg38 (if not provided for this genome build by the authors). We repeated the enrichment analysis (fdensity) for neuronal cis-eQTLs using different filters. None of the filters (listed below) showed an enrichment of neuron-specific regulatory elements around neuron cis-eQTLs: (1) restricting to most significant eQTLs in neurons (0.1% FDR); (2) eQTLs detected in neurons with 20 PCs (instead of 70) (5% FDR); (3) eQTLs for genes expressed in at least 20% of the nuclei; (4) eQTLs only significant in neurons (5% FDR; (5) cell-type-specific eQTLs using our negative binomial model (5% FDR); and (6) eQTLs obtained using stringent quality control criteria at nuclei level (at least 1,200 UMIs or at least 1,500 UMIs).
Interaction model. We tested whether the effect sizes of our cis-eQTLs (n = 7,607 at 5% FDR) were cell type specific using a negative binomial mixed model (as implemented in the R package glmmTMB (version 1.1.2.3) 66 ) in three different ways. First, we tested the hypothesis that the effect size of each of the 7,607 eQTLs in the discovery cell type (for example, excitatory neuron) was different from the 'average effect size in the other 7 cell types' . To test this hypothesis, we used the following model: raw counts (each gene) ~ genotype (0,1,2) + cell_ type + genotype:cell_type_eqtl (cell type in which the eQTL was discovered, coded as 0,1) + PC 1-3 genotype matrix + PC 1-5 expression matrix + study + disease_ status + (1 | individual) (random effect), using as offset the log of the TMM-normalized library size 56 . Coding the interaction term with ':' prevents R from expanding the interaction term (that is, with genotype × cell_type_eqtl, R would automatically expand the interaction term to genotype + cell_type_ eqtl + genotype × cell_type_eqtl). We think that our model is a better fit to the data than a model with the interaction term coded as 'genotype × cell_type_eqtl' . Both models, 'genotype:cell_type_eqtl model' and 'genotype × cell_type_eqtl' , give one interaction P value for each gene. However, the model 'genotype:cell_type_eqtl' correctly accounts for the average expression of the gene in each of the seven individual cell types (due to the covariate 'cell_type'), whereas the model 'genotype × cell_type_eqtl' would estimate a single average expression value for all nuclei that do not belong to the discovery cell type (as the automatically expanded covariate would be 'cell_type_eqtl' , which is a binary variable). The P values obtained for the interaction term (one per discovered eQTL, from the 'genotype:cell_type_eqtl' term) were then corrected for multiple testing using the Benjamini-Hochberg procedure 65 (n = 7,607 P values).
Second, we tested the hypothesis that the effect size of each of the 7,607 eQTLs in the discovery cell type was different in 'at least one of the other 7 cell types' . Here, we used a similar model as described above, except that the specified interaction term was 'genotype × cell_type' instead of 'genotype:cell_type_eqtl' . Using the 'genotype × cell_type' model, we get seven interaction P values, one for each cell type, highlighting the difference in eQTL effect size in each cell type compared to the reference cell type in which the eQTL was discovered. For each gene, we used Bonferroni correction across the seven interaction P values and retained the minimum corrected P value. Finally, the Bonferroni-corrected minimum P values of each gene were corrected for multiple testing using the Benjamini-Hochberg procedure across all 7,607 eQTLs 65 . The minimum gene P values that survive multiple testing corrections provide evidence that the eQTL effect size in the discovery cell type was significantly different in 'at least one of the other 7 cell types' .
Third, we tested the hypothesis that the effect size of each of the 7,607 eQTLs in the discovery cell type was different from 'all other 7 cell types' . We used the same model, 'genotype × cell_type' as described above, but, instead of retaining the minimum corrected P value, we retained the maximum P value from the seven interaction P values (for each gene). These P values were then corrected for multiple testing with the Benjamini-Hochberg procedure across all tests 65 (n = 7,607 P values). The maximum gene P values that survive multiple testing corrections provide evidence that the eQTL effect size in the discovery cell type was significantly different from 'all other 7 cell types' .
Fine-mapping of cis-eQTLs. We used CaVEMaN (version 1.01) 67 to fine-map our cis-eQTLs. In brief, CaVEMaN performs 10,000 eQTL analyses after bootstrapping the data (that is, new datasets are created with randomly picked individuals with replacement). CaVEMaN then records the proportion of times that a given SNP is ranked among the top ten most associated SNPs. These proportions are used to compute the CaVEMaN score, which is then calibrated to estimate the causal probability for each SNP.
GWAS summary statistics. We obtained publicly available GWAS summary statistics for AD 4 , MS 25 and SCZ 24 . For PD, we performed an inverse variance-weighted meta-analysis 68 using summary statistics from Nalls et al. 22 and Nalls et al. 23 .
Gene set genetic enrichment. We used the normalized (CPM) pseudo-bulk gene expression matrix from control individuals from our MS_Roche dataset for this analysis (n = 83). For each gene, we computed the average CPM across samples, resulting in one expression value for each gene in each cell type. We retained only genes with a mean CPM value greater than 1 and genes for which we had a gene-level genetic enrichment estimate from MAGMA (version 1.08) 69 . We then computed the proportion of the total expression of each gene across the different cell types (that is, mean_cpm / sum(mean_cpm)). This captures how specific a gene is to a given cell type (for example, 92% of AQP4 expression was observed in astrocytes). Finally, we tested the top 1,000 most specific genes (for each cell type) for enrichment in GWAS associations using MAGMA (version 1.08) 69 .

Co-localization.
For the AD 4 , PD 22,23 and MS 25 GWASs, we defined loci as the coordinates of the most extreme SNPs in LD (r 2 > 0.1 in 1000 Genomes 60 European individual) with the reported index SNPs using LDlinkR (version 1.1.2) 70 . For the SCZ GWAS 24 , we used the reported loci coordinates (r 2 > 0.1 with index SNP). For each loci, we then tested co-localization between the GWAS and eQTL signals for genes with at least ten SNPs using the 'coloc.abf ' function of the Coloc R package (version 5.1.0) 3 (with default prior, using beta coefficients from the GWAS and eQTL analysis and setting the sdY parameter to 1 for the brain and GTEx eQTL datasets (as gene expression data were inverse normalized)). For DICE 38 , we used the MAF of our eQTL study to estimate sdY (in addition to the sample size of the DICE study). Allelic directions were derived from the effect sizes of the SNP with the most significant eQTL P value.