To identify new genetic risk factors for cervical cancer, we conducted a genome-wide association study in the Han Chinese population. The initial discovery set included 1,364 individuals with cervical cancer (cases) and 3,028 female controls, and we selected a ‘stringently matched samples’ subset (829 cases and 990 controls) from the discovery set on the basis of principal component analysis; the follow-up stages included two independent sample sets (1,824 cases and 3,808 controls for follow-up 1 and 2,343 cases and 3,388 controls for follow-up 2). We identified strong evidence of associations between cervical cancer and two new loci: 4q12 (rs13117307, Pcombined, stringently matched = 9.69 × 10⁻⁹, per-allele odds ratio (OR)stringently matched = 1.26) and 17q12 (rs8067378, Pcombined, stringently matched = 2.00 × 10⁻⁸, per-allele ORstringently matched = 1.18). We additionally replicated an association between HLA-DPB1 and HLA-DPB2 (HLA-DPB1/2) at 6p21.32 and cervical cancer (rs4282438, Pcombined, stringently matched = 4.52 × 10⁻²⁷, per-allele ORstringently matched = 0.75). Our findings provide new insights into the genetic etiology of cervical cancer.
Cervical cancer is the third leading cause of cancer-related mortality among women worldwide, and approximately 80% of the diagnoses of cervical cancer have occurred in the developing world. Abundant epidemiological and clinical evidence indicates that persistent infection with high-risk human papillomavirus (HPV) is the major risk factor and is a requirement for the development of cervical cancer; HPV infection is detected in 99.7% of cervical cancer cases. Cervical cancer has therefore been traditionally recognized by the World Health Organization as entirely attributable to HPV infection.
Nevertheless, high-risk HPV infection alone has been found to be insufficient to induce tumor progression. HPV infection is common enough that a majority of sexually active women have been infected more than once during their lifetimes. However, most infections are transient and cleared spontaneously by the immune response; even persistent infections may clear, and premalignant lesions can also regress. Moreover, the majority of infected women never develop cancer. Fewer than 4% of individuals infected with HPV develop persistent infections and premalignant lesions (cervical intraepithelial neoplasia (CIN)) and even fewer develop invasive cancer, indicating a complex relationship between host genetics and the virus.
Cervical cancer is caused by a combination of genetic heritability and definite external environmental contributions. In Swedish studies, genetic heritability was shown to account for 27% of the effects of factors underlying cervical cancer development, and the estimate of heritability for cervical cancer was substantially higher than those for colorectal and lung cancer. Therefore, efforts to identify the genetic risk factors of cervical cancer are of great importance, as they will contribute to an overall etiological understanding and provide general insight into host-virus interactions. Previously, common variants in the MHC region were identified as being associated with cervical cancer in Swedish populations, but there was not enough independent validation of these positive loci. How genetic susceptibility is linked mechanistically to the progression of cervical cancer is still poorly understood. Therefore, further investigations are needed to understand the possible mechanisms of genetic susceptibility to cervical cancer.
We conducted a genome-wide association study (GWAS) of cervical cancer in the Han Chinese population. The initial discovery set for the GWAS included 1,364 cases and 3,028 female controls, and from this initial set we selected a stringently matched samples subset (829 cases and 990 controls) on the basis of principal components analysis (PCA; Online Methods). The follow-up stages involved two independent sample sets (1,824 cases and 3,808 controls for follow-up 1 and 2,343 cases and 3,388 controls for follow-up 2). We also collected 729 samples from individuals with CIN. The characteristics of the samples are listed in Table 1, and additional details are provided in the Online Methods.
In the discovery stage, we genotyped 1,374 cases and 3,135 controls using the Affymetrix Axiom Genome-Wide CHB1 Array. After standard quality control (Online Methods), we subjected a total of 563,339 SNPs in 1,364 cases and 3,028 controls to statistical analysis. We used PCA to evaluate the population structure (Supplementary Figs. 1 and 2a). To minimize the potential for population stratification bias, we matched the samples on the basis of the PCA (Online Methods). We generated the stringently matched sample set according to a strict criterion (Online Methods and Supplementary Fig. 2b). The quantile-quantile plots showed some evidence for inflation due to population stratification (genomic inflation factor (λ) = 1.066 and λ standardized to a sample set of 1,000 (λ1,000) = 1.035 for the initial discovery sample set; λ = 1.023 for the stringently matched samples; Online Methods and Supplementary Fig. 3). We performed a GWAS analysis using logistic regression with PCA-based correction for both the initial discovery sample set and the stringently matched samples (Online Methods); we selected SNPs that were consistently significant in both sample sets (P ≤ 5 × 10⁻⁵ in the initial discovery set and P ≤ 10⁻⁴ in the matched samples; Online Methods) for follow-up 1 (Fig. 1 and Supplementary Figs. 4 and 5).
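The relationship between the two inflation factors quoted above can be checked numerically. The following is a minimal sketch, assuming the commonly used rescaling of λ to an equivalent study of 1,000 cases and 1,000 controls; the function names are illustrative and are not taken from the Online Methods, and in a real analysis the χ² statistics would come from the association tests themselves.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Genomic inflation factor: observed median chi-square (1 d.f.)
    divided by the median expected under the null hypothesis."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to an equivalent study of 1,000 cases and 1,000 controls.

    Assumed standardization; the source does not spell out the formula.
    """
    scale = (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
    return 1 + (lam - 1) * scale


# Lambda and sample sizes reported for the initial discovery set.
print(round(lambda_1000(1.066, 1364, 3028), 3))
```

Under this assumption, λ = 1.066 with 1,364 cases and 3,028 controls rescales to 1.035, which agrees with the λ1,000 value reported for the initial discovery set.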
Figure 1 Genome-wide association results of cervical cancer in Han Chinese individuals. Scatter plot of P values on the −log₁₀ scale for 563,339 SNPs in the matched discovery set (1,305 cases and 1,444 controls; Online Methods). The red line represents P = 5.0 × 10⁻⁸, and the blue line represents P = 1.0 × 10⁻⁴. Chr, chromosome.
In total, 41 SNPs showed consistently significant associations in the analysis of the discovery-stage sets (Online Methods). Among these SNPs, we selected 22 representative SNPs for follow-up 1 and set aside the other 19 because of high linkage disequilibrium (LD; r² ≥ 0.8) with at least 1 of the 22 representative SNPs (Online Methods and Supplementary Table 1). In follow-up 1 (1,824 cases and 3,808 controls), we performed a meta-analysis to combine the results from the northern Han, central Han and southern Han Chinese data sets, and 13 of the 22 SNPs were nominally statistically significant (Pmeta-analysis, follow-up 1 < 5.0 × 10⁻²; Online Methods and Supplementary Table 2). We therefore genotyped these 13 SNPs in follow-up 2 with another 2,343 cases and 3,388 controls (Supplementary Table 3). By combining the results from all three stages, we achieved genome-wide significant associations (P < 5.0 × 10⁻⁸) for 11 SNPs, including 1 SNP at 4q12, 1 SNP at 17q12 and 9 SNPs at 6p21.32 (Supplementary Tables 4 and 5).
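To illustrate how per-stage or per-population results can be pooled into a single estimate, the following is a minimal sketch of a fixed-effect, inverse-variance meta-analysis of per-allele log odds ratios. The choice of a fixed-effect model is an assumption made for illustration (the text above does not state which model was used), and the function name and input values are hypothetical.

```python
import math


def fixed_effect_meta(estimates):
    """Pool (log_odds_ratio, standard_error) pairs by inverse-variance weighting.

    Returns the pooled log-OR, its standard error, the pooled OR and a
    two-sided P value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return pooled_beta, pooled_se, math.exp(pooled_beta), p_value


# Hypothetical per-stage estimates (log-OR, SE); these are not values from the study.
stages = [(0.25, 0.08), (0.20, 0.06), (0.22, 0.05)]
beta, se, pooled_or, p = fixed_effect_meta(stages)
print(f"pooled OR = {pooled_or:.2f}, P = {p:.2e}")
```

Because each stage is weighted by the inverse of its variance, larger sample sets contribute more to the pooled estimate, which is why a signal that is only nominally significant in one stage can reach P < 5.0 × 10⁻⁸ once all stages are combined.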
The associated SNP at 4q12, rs13117307 (Pcombined, stringently matched = 9.69 × 10⁻⁹, per-allele ORstringently matched = 1.26), lies in an intronic region of EXOC1 (Table 2, Fig. 2a and Supplementary Table 6). Controlling for rs13117307, stepwise logistic regression analysis revealed that there were no additional association signals in this region (Supplementary Table 7). The protein product of EXOC1 combines with seven other subunits (the products of EXOC2–EXOC8) to make up the exocyst complex, which facilitates the regulated exocytosis of membrane activity, vesicle transport machinery and cellular migration and secretion. The exocyst complex is also associated with the host innate immune response against DNA antigens of viral infection. Several lines of evidence have suggested that the CD8+ T cell–mediated immune response is important in HPV infection and virus-associated neoplasia. Association of the exocyst complex with the NEF protein probably has an important role in downregulating the gene encoding MHC-I and modulating T-cell signaling pathways. It is probable that the exocyst complex proteins are key effectors of NEF-mediated enhancement of nanotube formation and microvesicle secretion. Moreover, fusion of a NEF mutation with the HPV type-16 protein E7 induces an anti-E7 CD8+ cytotoxic T-lymphocyte response that correlates with protection against HPV-related tumors. The interruption of the balance between the exocyst complex and T-cell signaling pathways may be important for the progression of cervical cancer.
Figure 2 Regional plots of the four identified marker SNPs. (a–d) Plots for rs13117307 at 4q12 (a), rs8067378 at 17q12 (b) and rs4282438 (c) and rs9277952 (d), both at 6p21.32. Results (−log₁₀ P) are shown for SNPs in the regions flanking 150 kb on either side of the marker SNPs. The marker SNPs are shown in purple, and the r² values for the rest of the SNPs are shown in different colors. The genes within the region of interest are annotated and indicated by arrows. The association results of both genotyped (circles) and imputed (Xs) SNPs (Online Methods) in the matched discovery set (1,305 cases and 1,444 controls) and the combined results of four loci in the stringently matched subset are also shown.
The most significant SNP in 17q12, rs8067378 (Pcombined, stringently matched = 2.00 × 10⁻⁸, per-allele ORstringently matched = 1.18), is located 9.5 kb downstream of GSDMB (Table 2, Fig. 2b and Supplementary Table 6). Controlling for rs8067378, stepwise logistic regression analysis revealed no additional association signals (Supplementary Table 7). It has been previously reported that human GSDM-family genes may be involved in cancer development and progression. Among these gene family members, GSDMB (encoding the cancer-associated gasdermin-like protein (GSDML)) is expressed in human cancer tissues, including gastric, hepatic and cervical cancers. GSDML is expressed in the nuclei at higher levels in cervical cancer than in adjacent cancer and corresponding non-neoplastic tissues. Moreover, ectopic expression of GSDML increased the growth of cervical cancer cells in vitro, whereas inhibition of its endogenous expression decreased cell proliferation, suggesting that GSDML can promote the proliferation of cervical cancer cells and may be correlated with the development of cervical cancer. Furthermore, GSDM, which is regulated by TGF-β signaling, shows apoptotic activity and is expressed in the pit cells of human epithelium tissue, suggesting that GSDM and TGF-β signaling form a regulatory pathway that directs noncancerous cells to apoptose.
On 6p21.32, within the MHC region, nine significant SNPs (4.52 × 10⁻²⁷ ≤ Pcombined, stringently matched ≤ 2.31 × 10⁻⁹) are located within a 180-kb region that includes HLA-DPB1/2 and HLA-DPA1. The HLA-DPs belong to the HLA class-II molecules that form heterodimers on the cell surface and present antigens to CD4+ T lymphocytes. HLA-DPs are highly polymorphic, especially in exon 2, which encodes antigen-binding sites. Stepwise logistic regression identified two independent associations at rs4282438 (Pcombined, stringently matched = 4.52 × 10⁻²⁷, per-allele ORstringently matched = 0.74; Table 2 and Fig. 2c) and rs9277952 (Pcombined, stringently matched = 2.31 × 10⁻⁹, per-allele ORstringently matched = 0.85; Table 2 and Fig. 2d). After conditioning on these two SNPs, the rest of the SNPs in this region were not significant (P > 0.05; Supplementary Table 7). We investigated the pairwise LD between these two SNPs and tagging SNPs for HLA alleles (HLA-A, HLA-B, HLA-C, HLA-DQ and HLA-DR) in the HapMap CHB population (Online Methods and Supplementary Table 8) and found that rs4282438 was in strong LD with one tag SNP (rs6937034) for HLA-DQB*0402 (r² = 0.924) and rs9277952 was in moderate LD with tag SNPs for HLA-DQB*0402 (rs6937034) and HLA-DRB*0410 (rs3130267; r² > 0.2 for both). As researchers in a previous study did not include the HLA-DP genes in their investigation, we genotyped HLA-DPA1 and HLA-DPB1 alleles by directly sequencing exon 2 (Online Methods). Our analysis revealed that HLA-DPA1*0103, HLA-DPA1*0401, HLA-DPB1*03:01 and HLA-DPB1*04:01 were associated with susceptibility to cervical cancer (P = 2.72 × 10⁻³, OR = 1.18; P = 6.35 × 10⁻⁴, OR = 1.78; P = 2.91 × 10⁻², OR = 1.29; and P = 9.57 × 10⁻³, OR = 1.29, respectively), whereas HLA-DPA1*0202 and HLA-DPB1*05:01 showed protective effects (P = 8.01 × 10⁻⁶, OR = 0.793; and P = 2.38 × 10⁻⁹, OR = 0.714, respectively; Supplementary Table 9). These findings indicate that HLA alleles are probably associated with the tumorigenesis of cervical cancer.
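For readers less familiar with the pairwise LD measure used for the tag SNPs above, the following is a minimal sketch of how r² between two biallelic SNPs can be computed from phased haplotype counts. It assumes the haplotype counts are already known (in practice, phase usually has to be estimated from genotypes); the function name and the example counts are hypothetical and are not data from the study.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Squared correlation (r^2) between two biallelic SNPs from haplotype counts.

    n_AB is the number of haplotypes carrying allele A at the first SNP and
    allele B at the second; the other arguments count the remaining combinations.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_AB + n_Ab) / total           # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total           # frequency of allele B at SNP 2
    d = n_AB / total - p_A * p_B          # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical haplotype counts; these are not data from the study.
print(ld_r2(40, 10, 10, 40))
```

An r² close to 1, such as the 0.924 reported for rs4282438 and rs6937034, means that the alleles at the two SNPs almost always travel together on the same haplotype, so an association at one is difficult to separate statistically from an association at the other.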
The expression of the candidate genes in cases and controls is listed in Supplementary Table 10, and the results of expression quantitative trait locus (eQTL) analysis between cervical cancer susceptibility alleles and the expression levels of the candidate genes are shown in Supplementary Table 11.
By genotyping the additional 729 samples from individuals with CIN, we found that the 11 genome-wide significant SNPs in 4q12, 17q12 and 6p21.32 were also significantly associated with CIN, as well as with CIN plus cervical cancer (P < 0.05; Supplementary Table 12). This suggests that these regions probably confer a similar risk of CIN and of cervical cancer.
Additionally, several previous reports have described the association of a series of candidate genes with the progression of cervical cancer (Supplementary Table 13), including HLA-DQB1, HLA-DRB1, HLA-DPB1, TP53, TNFA and FASL, among others. Some of these genes are regarded as essential factors of carcinogenesis. Of 1,445 variants identified in these previous studies, we found that 103 SNPs had PCA-adjusted P < 0.05. However, none of these results has ever indicated a possible involvement of the susceptibility loci at 4q12 and 17q12 (Supplementary Table 13).
In summary, our GWAS of cervical cancer in the Han Chinese population identified two new cervical cancer susceptibility loci at 4q12 and 17q12. We also confirmed the previously reported association between susceptibility loci at 6p21.32 and cervical cancer. The identification of susceptibility loci in the EXOC1 and GSDMB regions, as well as in the HLA-DP alleles, suggests an essential role for T cell–mediated immune responses or tumor-cell proliferation, strengthening the hypothesis that inherited immunological and carcinogenic factors are prominent in determining the risk for cervical cancer, probably by affecting the mechanisms involved in the persistent infection and integration of HPV.