Promoter polymorphisms in the chitinase 3-like 1 gene influence the serum concentration of YKL-40 in Danish patients with rheumatoid arthritis and in healthy subjects

Introduction The present study investigates the association between single nucleotide polymorphisms (SNPs) in the chitinase 3-like 1 (CHI3L1) gene and serum concentrations of YKL-40 in Danish patients with rheumatoid arthritis (RA) and healthy controls as well as the association with RA in the Danish population. The CHI3L1 gene is located on chromosome 1q32.1 and encodes the YKL-40 glycoprotein. YKL-40 concentrations are elevated in the serum of patients with RA compared to healthy subjects, and YKL-40 has been suggested to be an auto-antigen and may play a role in development of RA and in inflammation. Methods Eight SNPs in the CHI3L1 gene and promoter were genotyped in 308 patients with RA and 605 controls (healthy blood donors) using TaqMan allele discrimination assays. Serum concentrations of YKL-40 were determined by an enzyme-linked immunosorbent assay (ELISA). Results We found significant association between the serum concentrations of YKL-40 and polymorphisms in the CHI3L1 gene among both patients with RA and controls. The g.-131(C > G) polymorphism (rs4950928) was most strongly associated with age-adjusted serum concentrations of YKL-40 in patients with RA (P < 2.4e-8) and controls (P < 2.2e-16). No significant allelic or genotypic association with RA was found in this Danish cohort. Conclusions We suggest that the g.-131(C > G) promoter polymorphism has a substantial impact on serum concentrations of YKL-40 in patients with RA and healthy subjects. However, the polymorphism does not seem to confer risk of RA itself. The effect of CHI3L1 polymorphisms on clinical outcome or the response to treatment in patients with RA remains to be investigated.

A high serum concentration of YKL-40 is emerging as a new biomarker of severe disease activity and poor prognosis in patients with diseases characterized by inflammation and ongoing tissue remodelling such as RA, inflammatory bowel disease, asthma and cancer [8,10,[16][17][18][19][20][21][22][23][24][25][26]. The exact biological function of the YKL-40 protein is still largely elusive. YKL-40 is a transmembrane protein in which cleaved components bind to an unidentified receptor, and the expression of YKL-40 is regulated by various inflammatory cytokines and hormones [27][28][29][30]. It is suggested that YKL-40 plays a role in cell proliferation, differentiation and protection against apoptotic signals, and has an effect on extracellular tissue remodelling [31,32]. Two recent studies have explored the effect of YKL-40 as a stimulator of angiogenesis in tumours, suggesting that anti-YKL-40 antibodies could have a place in cancer treatment [33,34].
The proximal promoter region of the CHI3L1 gene contains a highly polymorphic area, suggesting a possibility for several functional variants of the gene. Rehli et al. [35] demonstrated that binding of the SP1 transcription factor to the most proximal part of the CHI3L1 gene affected gene transcription. This finding was supported by Zhao et al. [36], who reported functional variants based on the binding of the MYC/MAX transcription factors to the proximal promoter region. The relationships between CHI3L1 polymorphisms and YKL-40 production have been studied in a small number of patients with various inflammatory disorders, such as sarcoidosis, asthma, hepatitis, schizophrenia and diabetes [37][38][39][40][41][42][43][44]. These studies suggest that serum concentrations of YKL-40 are, at least partly, regulated by polymorphisms in the proximal promoter region. The findings have been somewhat contradictory and the exact position of the regulatory site or sites remains to be demonstrated. Allele frequencies differ significantly between Caucasian, African and Asian populations, and possibly even within these populations, thereby making direct comparison of the reported studies difficult [45].
Only one small study has evaluated CHI3L1 polymorphisms in patients with RA [46]. In 182 Hungarian patients with RA and 194 healthy controls there were no significant differences in genotype frequencies for the g.-131(C > G) or the g.-329(C > T) polymorphisms between the two groups. This study did not evaluate the functional properties of these polymorphisms. Several questions remain unanswered, namely the relationship between CHI3L1 polymorphisms and serum concentrations of YKL-40 in patients with RA, the association of CHI3L1 promoter genotypes with risk of RA and the linkage disequilibrium (LD) properties in different populations.
We aimed to investigate these questions in a cohort of well-defined Danish patients with RA and a group of healthy Danish controls. Our hypothesis was that polymorphisms in the proximal promoter region of CHI3L1, most likely the g.-131(C > G) polymorphism (rs4950928), are associated with serum concentrations of YKL-40 in both patients with RA and healthy controls. Moreover, we hypothesized that these polymorphisms could be associated with the risk of developing RA and possibly also with IgM rheumatoid factor (RF), since YKL-40 seems to play a role in the pathogenesis and immunomodulation in RA.

Patients with rheumatoid arthritis
Three hundred and eight patients with RA treated at the Department of Rheumatology, Hvidovre Hospital, Hvidovre, Denmark were included in the study. The patients had RA according to the ACR 1987 criteria [47]. The patients with available blood samples were identified in the DANBIO Registry (The Copenhagen Cohort). DANBIO is a Danish nationwide registry that prospectively collects clinical data on patients with rheumatic diseases receiving medical treatment [48]. The blood samples (serum and whole blood) were collected at the time of diagnosis or at the time of starting treatment with TNFα inhibitors. All patients provided informed consent for inclusion in the study population. The study was approved by the local ethics committee. Table 1 summarizes the demographic data for the patients with RA and the controls.

Healthy controls
Six hundred and five healthy blood donors from the Aalborg Hospital Blood Bank, Aalborg, Denmark were included in the study. The donors were known not to take any medication and were clinically healthy at the time of blood drawing. The over-representation of female controls was a random phenomenon. The samples were handled anonymously and all donors gave consent to the blood being used for this purpose. The sampling was approved by the local ethics committee.

Handling of blood samples
Ethylenediaminetetraacetic acid (EDTA)-stabilised whole blood and blood samples without anticoagulants were drawn from the patients with RA and the blood donors. Serum was isolated from coagulated whole blood within three hours and stored at -80°C until analysis of YKL-40 and IgM-RF was performed. Genomic DNA was prepared from EDTA-stabilised blood samples using a Maxwell 16 blood DNA purification kit (Promega, Madison, WI, USA).

Biochemical analysis
Serum concentration of YKL-40 was measured by a commercial two-site sandwich-type ELISA (Quidel, Mountain View, CA, USA) [49]. The detection limit was 10 ng/ml. The intra-assay coefficient of variation (CV) was 5% and the inter-assay CV was < 6%. IgM-RF was measured using an ELIA fluorescence immunoassay on a Unicap250 system (Phadia AB, Uppsala, Sweden). A validated diagnostic cut-off (< 17 kIU/l) was used to classify patients as IgM-RF negative or IgM-RF positive.

Genotyping
A total of eight SNPs located within the promoter or coding regions of the CHI3L1 gene were analysed. Genotyping was performed using real-time polymerase chain reaction (rt-PCR) with TaqMan

Statistical analysis
The genotype distribution among patients with RA and controls was tested for deviation from Hardy-Weinberg equilibrium and haplotypes were estimated using the Helix Tree SNP analysis software package (Golden Helix Software, Bozeman, MT, USA). The degree of LD between the SNPs was determined using the SHEsis software (Bio-X Center, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai 200030, China) [50]. Serum concentrations of YKL-40 were log-normally distributed and were therefore log-transformed before analysis. Statistical analysis was performed using the statistical software system R, version 2.12.1 [51]. The initial nonlinear association between serum concentrations of YKL-40 and age was modelled by a restricted cubic spline function, using the user-contributed package design [52] integrated in R. Analysis of variance based on multiple linear regression models was used to investigate the association between age, gender, case-control status, genotypes and serum YKL-40. Prior to SNP-wise association analysis with serum YKL-40, all serum concentrations of YKL-40 were age adjusted to 44.4 years (mean age for the total sample of controls and cases age 65 years and below) using a linear model. Genotypic associations with age-adjusted serum concentrations of YKL-40 were analysed for cases (age 65 years and below) and controls separately using a multiple linear regression model. For association analysis with RA, allelic and genotypic association was tested using Fisher's exact test including all patients (n = 308) and controls (n = 605) and using a significance level of 0.05.
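The age adjustment described above can be illustrated with a minimal sketch. This is not the authors' R code: it assumes a simple least-squares line fitted to log-transformed YKL-40 against age, and the function names `fit_line` and `age_adjust` as well as the example values are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def age_adjust(ages, ykl40, ref_age=44.4):
    """Shift each log(YKL-40) value along the fitted age slope to ref_age.

    ref_age = 44.4 years is the reference age stated in the text; the
    adjustment itself is an assumed reading of "age adjusted using a
    linear model".
    """
    log_y = [math.log(v) for v in ykl40]
    _, slope = fit_line(ages, log_y)
    return [math.exp(ly - slope * (age - ref_age))
            for age, ly in zip(ages, log_y)]

# Made-up illustrative values (ng/ml), not study data.
ages = [25, 35, 45, 55, 65]
ykl40 = [30.0, 36.0, 43.0, 52.0, 62.0]
adjusted = age_adjust(ages, ykl40)
```

Working on the log scale keeps the adjusted values positive and matches the log-normal distribution reported for serum YKL-40.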

Results
No deviations from Hardy-Weinberg equilibrium were found for any of the eight SNPs in the patient or control group. Age stratification into one-year age groups did not reveal deviations from Hardy-Weinberg equilibrium in any of the age groups.
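As an illustration of what a Hardy-Weinberg check involves, the sketch below applies the standard one-degree-of-freedom chi-square goodness-of-fit test to genotype counts for a biallelic SNP. The study used the Helix Tree software, whose exact test is not stated in the text, so this is an assumed textbook variant; `hwe_chi_square` and the example counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Arguments are the observed counts of the two homozygous genotypes
    (n_aa, n_bb) and the heterozygous genotype (n_ab).
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: the HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Made-up genotype counts, not study data.
chi2, p_value = hwe_chi_square(180, 100, 20)
```

For rare genotypes with small expected counts, an exact test is usually preferred over the chi-square approximation.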
Prior to the SNP association analysis, the effect of age and case-control status on serum YKL-40 was tested using a multiple linear regression model, with serum YKL-40 as dependent variable and case-control status and a non-linear function of age included as covariates. Strong significant association of the serum concentration of YKL-40 with age (P < 2.0e-16) and case-control status (P < 2.0e-16) was observed (Figure 1). Moreover, an apparent increase in serum YKL-40 with age was found for the older patients in the case group. To avoid a potential bias due to the high influence of individuals older than 65 years in the RA group, we excluded patients with RA older than 65 years from all further analyses.
To test the effect of genotypes on serum concentrations of YKL-40 in the RA group (age 65 and below) and control group, a multiple linear regression model including serum YKL-40 as dependent variable and a non-linear function of age, case-control status, genotypes and gender as well as the interaction between case-control status and genotype with age as independent variables was applied (Table 2). From this analysis a strong association was observed with case-control status (P < 2.0e-16) (as before), age (P < 2.0e-16) (as before) and genotype (P < 2.0e-16).
Regarding the age-dependent increase in the serum concentrations of YKL-40, no significant difference was found between a non-linear and a linear model for the age-dependence in both the case group (age 65 and below) and control group (P = 0.19), suggesting that the linear model can be used for age adjustment of the serum concentrations of YKL-40 in both groups. The linear model was fitted and is depicted in Figure 2.
Serum concentrations of YKL-40 were not associated with gender (P = 0.16). There were no interaction effects between case-control status or genotype and age (P = 0.89) and no association between serum YKL-40 and the interaction effect between genotype and case-control status (P = 0.16) (Table 2). This suggests that age, case-control status and genotypes are all strong independent factors affecting serum concentrations of YKL-40. To test the association of each SNP with age-adjusted serum YKL-40 in the RA group (age 65 and below) and control group, a linear age-adjustment was applied and genotypes were included one-by-one as independent variables in a multiple linear regression analysis. The g.-131(C > G) genotype was found to be most strongly associated with age-adjusted serum concentrations of YKL-40 in both the patient (P = 2.4e-08) and control group (P < 2.2e-16) (Table 3). Consistently within both groups, the rare GG genotype was associated with low serum YKL-40, the CG genotype with intermediate serum concentrations of YKL-40, and the common CC genotype with high serum YKL-40 (Figure 3). With respect to genotypes, the RA patients had significantly higher serum YKL-40 than controls for both the CC and CG group. For the rare GG group, the difference was not significant, most likely because of low statistical power due to the limited number of individuals in the GG groups. When the g.-131 C/G polymorphism was used as a covariate to determine the influence of the remaining seven SNPs on serum concentrations of YKL-40, none of the other SNPs contributed significantly to the association, supporting the isolated highly significant effect of the g.-131 C/G polymorphism on serum concentrations of YKL-40 (Table 4).
Haplotype analysis did not add further information as all the haplotypes associated with low serum concentrations of YKL-40 carried the g.-131G allele and no further increase in association was seen with any of the haplotypes (data not shown). LD analysis of the eight genotyped SNPs revealed that both the proximal promoter and the distal part of the gene contained blocks of high or moderate LD (Figure 4), explaining the effect of all the included polymorphisms on serum YKL-40 when analysed individually. In particular, the g.-131 C/G polymorphism displayed moderate LD with g.-329C/T (R² = 0.78), indicating that the effect on serum concentrations of YKL-40 with g.-329C/T is caused by LD. These findings are in line with CEU HapMap data (Figure 5).
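For readers unfamiliar with the LD measure quoted above, the squared correlation between two biallelic loci can be computed from allele and haplotype frequencies. The sketch below assumes phased haplotype frequencies are already available (the study estimated them with dedicated software); `ld_r_squared` and the example frequencies are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared correlation (r^2) between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele a at locus 1
          and allele b at locus 2
    p_a:  frequency of allele a at locus 1
    p_b:  frequency of allele b at locus 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))

# Made-up frequencies, not study data.
r2 = ld_r_squared(p_ab=0.18, p_a=0.20, p_b=0.22)
```

A value of 1 means the two alleles always occur together on the same haplotype, and 0 means they are statistically independent.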
To investigate the association of the eight SNPs with case-control status, alleles and genotypes were tested for association with RA using Fisher's exact test. No association was found with alleles or genotypes for any of the eight SNPs (Table 5), indicating that these SNPs do not confer risk of developing RA.
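The allelic test can be sketched as a two-sided Fisher's exact test on a 2 x 2 table of allele counts (cases versus controls, allele C versus allele G). This is an illustrative standard-library implementation, not the authors' R call; the two-sided rule used here (summing all tables no more probable than the observed one) is a common convention and is assumed, and `fisher_exact_two_sided` and the example counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    p_value = sum(prob(x) for x in range(lo, hi + 1)
                  if prob(x) <= p_obs * (1.0 + 1e-9))
    return min(1.0, p_value)

# Made-up allele counts (rows: cases, controls; columns: allele C, allele G).
p_value = fisher_exact_two_sided(480, 136, 950, 260)
```

The genotypic test in the text uses a 2 x 3 table (three genotypes), which requires the generalised form of the test rather than this 2 x 2 sketch.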
The high producer genotypes were not more frequent in the IgM-RF positive subgroup and no difference was found in genotype or phenotype distribution between seropositive and seronegative patients with RA (data not shown).

Discussion
This study aimed to investigate eight polymorphic sites in the CHI3L1 gene with possible functional properties in both patients with RA and healthy individuals. We focused on the g.-131(C > G) allele and closely related polymorphisms described in Caucasian populations [26,[36][37][38][39]43,44,46]. The g.1219(G > A) polymorphism was also included as one study reported an individual functional property of this polymorphism [43]. Serum concentrations of YKL-40 were strongly associated with age and case-control status. After adjustment of the serum concentrations of YKL-40 for these two variables, serum YKL-40 was found to be significantly associated with SNPs in the CHI3L1 gene. The strongest association was observed for the g.-131(C > G) polymorphism. Several other studies have suggested that g.-131(C > G) is a strong candidate for a functional promoter polymorphism influencing the serum concentrations of YKL-40 [36,[42][43][44][45]. The promoter SNP g.-131(C > G) in the CHI3L1 gene was associated with elevated serum YKL-40, asthma, bronchial hyperresponsiveness and pulmonary function [44,45], and with elevated serum YKL-40 and the severity of hepatitis C virus-induced liver fibrosis [43]. This indicates a functional role of YKL-40 in these diseases. An association has also been found between schizophrenia and haplotypes within the promoter region of the CHI3L1 gene, suggesting that polymorphisms in an area starting from base pair position -180 could have functional properties [36,42]. Our findings support these earlier studies.
Zhao et al. [36] investigated Chinese patients with schizophrenia and found lower activity of the transcription factor MYC/MAX and decreased CHI3L1 gene expression related to the low frequency G allele for the g.-131(C > G) SNP. Ober et al. [44] studied 443 patients with asthma and 491 healthy controls from a genetically     children and 180 healthy controls from a Korean population. They concluded that this polymorphism was responsible for most of the genetic effects on YKL-40 production, and that the g.-131C allele was associated with low promoter activity. These results are complicated by the fact that the g.-131(C > G) and g.-247(G > A) polymorphisms showed no LD in the Asian populations, in contrast to our findings, which suggest a high degree of LD in this part of the proximal promoter. In the Danish population the region on chromosome 1 bearing the g.-131(C > G) polymorphism was in strong LD, illustrated by the occurrence of just eight frequent haplotypes (f > 1%). The g.-131(C > G) allele was found to be in LD with several other loci in the CHI3L1 gene, and three haplotypes could be defined as low producer haplotypes, all including the g.-131G allele. It must be emphasized that ethnicity seems to play an important role in the genetic regulation of YKL-40 production, and results from our and similar studies can only be considered valid in ethnically similar populations. Further studies in different populations are awaited.
Serum concentrations of YKL-40 increased with increasing age in both healthy controls and patients with RA, making age a possible confounding variable. The cause of this remains unknown, but the phenomenon has been explained by a higher level of general inflammation and apoptosis in the elderly, which is well known for other inflammatory mediators [53]. Similar increases in plasma YKL-40 with age have recently been described in a large group of 8,899 subjects from the general Danish population [20]. We initially decided to fit a non-linear model to explain the effect of age on serum concentrations of YKL-40. Our control group did not include any persons above the age of 65, but below this age we were able to fit a linear model explaining the relationship between age and serum concentrations of YKL-40. This supports the findings by Kruit et al. [38], who suggested a linear relationship between age and serum YKL-40. In the patients with RA, serum concentrations of YKL-40 appear to rise more rapidly with age above age 65, indicating that elevated serum YKL-40 in this age group needs careful interpretation. It is possible that high serum YKL-40 is associated with comorbidity or a latent malignant disease [10,[20][21][22][23][24][25][26]53,54]. It remains unknown whether high serum YKL-40 affects a person's risk of autoimmune disease in the long term.
YKL-40 expression is stimulated by the inflammatory cytokines TNF-α, IL-6 [30] and IL-1β, whereas YKL-40 inhibits cellular responses induced by IL-1 and TNF-α, suggesting an autocrine feed-back mechanism [9,28]. YKL-40 is strongly expressed by macrophages in the synovial membrane of RA patients possibly activated by a pro-inflammatory IFNγ-mediated immune response, and elevated YKL-40 can stimulate local production of anti-inflammatory IL-10 [32]. In inflammatory diseases such as RA, the excessive YKL-40 production may also have the opposite effect stimulating a continuous pro-inflammatory state and stimulation of VEGF and angiogenesis [32][33][34].

Conclusions
In conclusion, this study reports a strong association between the g.-131(C > G) allele and serum concentrations of YKL-40 in both patients with RA and healthy controls. Our findings indicate that the g.-131(C > G) polymorphism is the main contributor to the inter-individual variation of serum YKL-40 in Caucasian patients with RA, and that the effect of other polymorphic sites in this region is related to a high degree of LD in this area of the genome.