In the CGC application presented in this paper, all known rat genes within a selected QTL, along with all human genes within the homologous interval, are retrieved from a table that has the same name as the selected QTL and are then displayed. A list of 49 selectable arthritis-related keywords is presented together with their respective keyword values. Up to 10 additional keywords can be added, and their keyword values are calculated automatically. When a search is performed, the textual information for each human gene stored in the table 'OMIMdata' is scanned for all selected keywords. The genes and all keywords found in the accompanying text are displayed, together with the sum of all matching keyword values.
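The scoring step described above can be illustrated with a minimal sketch. It assumes case-insensitive substring matching and that each keyword contributes its value once per gene, however often it occurs in the text. The function names, keyword values and gene texts are hypothetical and are not taken from the CGC implementation.

```python
def score_gene(text, keyword_values):
    """Return (matched keywords, summed keyword values) for one gene text.

    Assumption: a keyword counts once per gene, and matching is a
    case-insensitive substring test.
    """
    lowered = text.lower()
    matched = [kw for kw in keyword_values if kw.lower() in lowered]
    total = sum(keyword_values[kw] for kw in matched)
    return matched, total


def rank_genes(gene_texts, keyword_values):
    """Score every gene and list those with at least one hit, best first."""
    scored = []
    for gene, text in gene_texts.items():
        matched, total = score_gene(text, keyword_values)
        if matched:
            scored.append((gene, total, matched))
    scored.sort(key=lambda entry: entry[1], reverse=True)
    return scored


# Hypothetical keyword values and gene descriptions, for illustration only.
example_keywords = {"arthritis": 5.0, "cytokine": 2.0, "inflam": 1.5}
example_texts = {
    "GENE_A": "A cytokine linked to arthritis.",
    "GENE_B": "Inflammatory marker.",
    "GENE_C": "Ribosomal structure.",
}

for gene, total, matched in rank_genes(example_texts, example_keywords):
    print(gene, total, matched)
```

Truncated keywords such as 'inflam' or 'infecti' work under substring matching because they match every word that contains the stem.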
To estimate whether the CGC application was able to rank candidate genes in a fashion similar to human evaluations, gene descriptions for four randomly selected QTL regions (Cia4, Cia10, Cia14 and Cia17) were surveyed manually. For all genes within the selected QTL regions, we compared the outcome of the CGC gene ranking with our own manual evaluation of each OMIM text. The manual rating was made without knowledge of the CGC ranking. To put the application and the manual inspection on a similar footing, we tried to base our evaluation on the written OMIM texts only, without taking other information into account. In the manual inspection the OMIM texts were divided into five different classes: (1) obvious gene candidate, (2) likely gene candidate, (3) possible gene candidate, (4) unlikely gene candidate and (5) gene without relevance.
In addition, the genes that were ranked highly by the CGC application were scrutinised further in an extensive analysis of related papers not found in the OMIM reference lists. Finally, the NCF1 gene was studied in detail.
Cia4
In total, 12 genes were ranked by the CGC tool. IFNG was rated as the top candidate by the CGC application and it was also considered to be the most appropriate gene candidate for collagen-induced arthritis within this QTL according to the manual inspection. IL22 was considered the next highest gene candidate both by the CGC application and the manual inspection.
IFNG (interferon-γ), CGC points 291.1, CGC ranking 1, manual rating 1
IFNG was identified by the CGC application on the basis of 10 different keywords: 'rheumatoid', 'HLA', 'sjogren', 'T cell', 'mhc', 'lymphocyte', 'antigen', 'cytokine', 'arthritis' and 'infecti'. IFNG has been shown to be closely associated with RA. In a study of 99 patients with RA of different severity, susceptibility to, and severity of, RA was shown to be related to a microsatellite polymorphism within the first intron of the gene encoding interferon-γ [12].
IL22 (interleukin-22), CGC points 14.1, CGC ranking 2, manual rating 2
IL22 was selected by the keywords 'inflam', 'T cell', 'lymphocyte' and 'cytokine'. IL22 activates three different STAT genes: STAT1, STAT3 and STAT5 [13]. RA synovial fibroblasts are relatively resistant to apoptosis and exhibit dysregulated growth. Retrovirus-mediated gene transfer of dominant-negative mutant STAT3 genes blocks the endogenous STAT3 expression in synovial fibroblasts from patients with RA, leading to failure of growth in the cell culture and apoptosis [14].
A middle group of two genes was selected with the CGC application: MYC (CGC points 10.9, CGC ranking 3, manual rating 3) and HMGIC (CGC points 10.5, CGC ranking 4, manual rating 4).
Cia10
In total, 35 genes were ranked by the CGC tool. NFKB1 and RPL7 were ranked as the two top candidates by the CGC application. These two genes were also manually considered to be the most appropriate gene candidates for collagen-induced arthritis within this QTL.
NFKB1 (nuclear factor κB 1), CGC points 219.7, CGC ranking 1, manual rating 1
The very high score that NFKB1 obtained from the keyword query was due in part to the word 'arthritis' appearing in the corresponding OMIM text. Twelve other keywords also made a substantial contribution. According to the OMIM record, NFKB1 is a very strong gene candidate because the inappropriate activation of NFKB1 is known to be linked to inflammatory events associated with autoimmune arthritis [15].
RPL7 (ribosomal protein L7), CGC points 37.3, CGC ranking 2, manual rating 1
The RPL7 gene was rated second by the CGC application mainly because of the keywords 'autoimmune', 'lupus' and 'erythematosus'. The RPL7 protein is reported to be a major autoantigen in systemic autoimmune arthritis [16].
A middle group of five genes was rated as relatively high by the CGC application: COL6A3 (CGC points 24.2, CGC ranking 3, manual rating 3), CSF1 (CGC points 17.4, CGC ranking 4, manual rating 3), EDG1 (CGC points 12.5, CGC ranking 5, manual rating 5), VCAM1 (CGC points 11.3, CGC ranking 6, manual rating 2) and PAPSS1 (CGC points 9.3, CGC ranking 7, manual rating 3). Among these genes, CSF1 is a possible gene candidate because recent studies have shown that synovial tissue in RA joints secretes CSF1 together with several other cytokines, which increases the osteoclast activity [17]. VCAM1 might also be a potential gene candidate because it is expressed in endothelial cells of the blood vessels, facilitating the adhesion of leucocytes [18]. EDG1 was a false prediction because the term 'HLA' matched an author (Hla T. Maciag T. J Biol Chem 1990;265:9308-13) and the term 'T cell' matched 'mutant cell'.
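The two false matches reported for EDG1 are what plain case-insensitive substring matching would be expected to produce. The sketch below reproduces them and shows one possible mitigation, which is an assumption rather than part of the CGC application: anchor each keyword at a word boundary, and optionally match acronyms case-sensitively. The function names are hypothetical.

```python
import re


def naive_match(keyword, text):
    """Case-insensitive substring test, which gives the false hits above."""
    return keyword.lower() in text.lower()


def boundary_match(keyword, text, case_sensitive=False):
    """Match the keyword only where it starts at a word boundary.

    Only the start is anchored, so truncated keywords such as 'inflam'
    still match 'inflammatory'.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = r"\b" + re.escape(keyword)
    return re.search(pattern, text, flags) is not None


# 'T cell' is found inside 'mutant cell' by the naive test only.
print(naive_match("T cell", "a mutant cell line"))
print(boundary_match("T cell", "a mutant cell line"))

# 'HLA' hits the author name 'Hla T' unless the acronym is case-sensitive.
print(naive_match("HLA", "Hla T, Maciag T."))
print(boundary_match("HLA", "Hla T, Maciag T.", case_sensitive=True))
```

Case-sensitive matching would have to be limited to keywords stored in their canonical upper-case form, because a keyword entered as 'mhc' would otherwise no longer match 'MHC' in the text.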
Cia14
In total, 16 genes were ranked by the CGC tool. The two top ranked genes according to the CGC application (IL15 and HMOX1) were also the highest-rated genes in the manual inspection.
IL15 (interleukin-15), CGC points 27.3, CGC ranking 1, manual rating 1
IL15 was ranked in first place by the CGC application. In the corresponding OMIM text, IL15 is associated with the keywords 'autoimmun', 'inflam', 'T cell', 'lymphocyte', 'antigen', 'cytokine' and 'infecti', but not 'arthritis'. In a recent paper it was shown that increased serum levels of IL15 are found in patients with long-term RA [19].
HMOX1 (haem oxygenase 1), CGC points 13.5, CGC ranking 2, manual rating 1
HMOX1 was ranked second by the CGC application with the keywords 'anemia', 'hemolytic', 'inflam' and 'T cell'. HMOX1 has been shown to be involved in the treatment of RA with gold(I)-containing compounds. Gold(I) drugs selectively activate a transcription factor (Nrf2/small Maf heterodimer), which induces the transcription of anti-oxidative stress genes, including HMOX1, and inhibits inflammation [20].
A middle group of four genes was rated as relatively high by the CGC application: ITK (CGC points 9.7, CGC ranking 3, manual rating 2), NFATC3 (CGC points 9.7, CGC ranking 3, manual rating 3), AARS (CGC points 9.2, CGC ranking 5, manual rating 3) and KARS (CGC points 9.2, CGC ranking 5, manual rating 3).
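The shared rankings in Cia14 (two genes at rank 3, two at rank 5) are consistent with standard competition ranking, in which tied scores share a rank and the following rank is skipped. The sketch below applies that rule to the Cia14 scores quoted above. Whether the CGC application assigns ranks in exactly this way is an assumption, and the function name is hypothetical.

```python
def competition_rank(scores):
    """Rank genes by descending score; tied genes share the lower rank number."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_score = None
    current_rank = 0
    for position, (gene, score) in enumerate(ordered, start=1):
        if score != previous_score:
            # A new score value takes its 1-based position as its rank.
            current_rank = position
            previous_score = score
        ranks[gene] = current_rank
    return ranks


# CGC points for the six highest-scoring Cia14 genes, as reported in the text.
cia14_points = {
    "IL15": 27.3,
    "HMOX1": 13.5,
    "ITK": 9.7,
    "NFATC3": 9.7,
    "AARS": 9.2,
    "KARS": 9.2,
}

cia14_ranks = competition_rank(cia14_points)
for gene, rank in cia14_ranks.items():
    print(gene, rank)
```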
Cia17
In total, 30 genes were ranked by the CGC tool (only one member of the PCDH gene family was included). In the manual inspection, no 'obvious' candidate gene was found. However, four genes were considered to be 'likely' gene candidates. One of these, CD74, also received the highest keyword sum in the CGC application. Another gene among the likely gene candidates, SLC26A2, was ranked second by the CGC application.
CD74, CGC points 27.7, CGC ranking 1, manual rating 3
The CD74 gene was ranked in first place by the CGC application because of results from six different keywords: 'antigen', 'HLA', 'immunoglobulin', 'T cell', 'MHC' and 'inflam'. In a recent paper by Leng and colleagues [21], not present in the OMIM text, CD74 is reported to be required for macrophage migration inhibitory factor (MIF)-induced activation of the extracellular signal-regulated kinase-1/2 mitogen-activated protein kinase cascade, cell proliferation, and prostaglandin E2 production. MIF is an upstream activator of monocytes/macrophages and is centrally involved in the pathogenesis of RA and other inflammatory conditions.
SLC26A2 (solute carrier family 26 member 2), CGC points 24.2, CGC ranking 2, manual rating 2
SLC26A2 was associated with the keyword 'joint'. SLC26A2 is an anion transporter responsible for four recessively inherited chondrodysplasias: multiple epiphyseal dysplasia (MED) [22], diastrophic dysplasia (DTD) [23], atelosteogenesis Type II (AO2) [24] and achondrogenesis type IB (ACG1B) [25]. However, although other forms of chondrodysplasias such as progressive pseudorheumatoid chondrodysplasia show symptoms similar to those of RA, no clear link between SLC26A2 and RA can be concluded.
A middle group of four genes was ranked in positions 3 to 6 by the CGC application: NR3C1 (CGC points 16.5, CGC ranking 3, manual rating 2), SPINK5 (CGC points 14.2, CGC ranking 4, manual rating 3), IK (CGC points 14.1, CGC ranking 5, manual rating 3) and CD14 (CGC points 12.8, CGC ranking 6, manual rating 2). Two of these genes might be related to RA. NR3C1 is significantly overexpressed in untreated patients with RA and in several clinical studies of inflammatory conditions, such as RA [26]. CD14 has been reported to be associated with significantly elevated serum levels in patients with RA [27, 28].
NCF1 (neutrophil cytosolic factor 1)
The gene NCF1 is covered by both the Cia12 and Pia4 QTLs and was assigned a total of 238.9 points by the CGC application. This suggests that NCF1 is a strong gene candidate for RA. Indeed, NCF1 has been identified as a gene that has a naturally occurring polymorphism regulating arthritis severity in rats [29]. On looking at the OMIM text for NCF1, it is clear that most of the points come from the part of the text describing these particular findings. To evaluate the ability of the tool to predict genes that are reported to be related to the arthritis phenotype, the OMIM text was used in the form in which it existed before NCF1 was shown to be associated with arthritis; that is, the part of the OMIM text describing the association between NCF1 and arthritis was deleted before running the application. The resulting keyword sum was, as expected, much lower, with a total of 10.8 points. However, these points were still sufficient to rank NCF1 as the top candidate of Cia12 and Pia4. Recently, the OMIM entry for the gene GUSB was updated, resulting in a total of 30.7 points.