Epidemiology and genetics of rheumatoid arthritis
© BioMed Central Ltd 2002
Received: 27 February 2002
Accepted: 13 March 2002
Published: 9 May 2002
The prevalence of rheumatoid arthritis (RA) is relatively constant in many populations, at 0.5–1.0%. However, a high prevalence of RA has been reported in the Pima Indians (5.3%) and in the Chippewa Indians (6.8%). In contrast, low occurrences have been reported in populations from China and Japan. These data support a genetic role in disease risk. Studies have so far shown that the familial recurrence risk in RA is small compared with other autoimmune diseases. The main genetic risk factors for RA are the HLA DRB1 alleles, as has consistently been shown in many populations throughout the world. The strongest susceptibility factor so far has been the HLA DRB1*0404 allele. Tumour necrosis factor alleles have also been linked with RA. However, it is estimated that these genes can explain only 50% of the genetic effect. A number of other non-MHC genes have thus been investigated and linked with RA (e.g. corticotrophin releasing hormone, oestrogen synthase, IFN-γ and other cytokines). Environmental factors have also been studied in relation to RA. Female sex hormones may play a protective role in RA; for example, the use of the oral contraceptive pill and pregnancy are both associated with a decreased risk. However, the postpartum period has been highlighted as a risk period for the development of RA. Furthermore, breastfeeding after a first pregnancy poses the greatest risk. Exposure to infection may act as a trigger for RA, and a number of agents have been implicated (e.g. Epstein–Barr virus, parvovirus and some bacteria such as Proteus and Mycoplasma). However, the epidemiological data so far are inconclusive. There has recently been renewed interest in the link between cigarette smoking and RA, and the data presented so far are consistent with and suggestive of an increased risk.
This chapter reviews recent epidemiological data on the relative contributions of genetic and environmental risk factors for the development of RA. It considers the direct and indirect evidence for the contribution of various risk factors to disease susceptibility. The quality of the evidence varies and, where appropriate, this is highlighted.
Native American-Indian populations have the highest recorded occurrence of RA, with a prevalence of 5.3% noted for the Pima Indians and of 6.8% for the Chippewa Indians. By contrast, there are a number of groups with a very low occurrence. Studies in rural African populations, both in South Africa and in Nigeria, failed to find any RA cases among 500 and 2000 adults, respectively. Studies in populations from Southeast Asia, including China and Japan [16, 17], have similarly shown very low occurrences (0.2–0.3%).
It is clearly difficult from a review of the descriptive epidemiological data to know whether environmental or genetic effects explain the differences between countries. One approach is to consider the occurrence in populations presumed to be of the same genetic origin but living in different environments. Such a situation arises in populations that have moved from one environment to another.
There are a few studies addressing this with respect to RA. A low occurrence of disease was found in one study of a Caribbean population of African origin living in Manchester, UK, suggesting that the protection in this group was indeed genetically determined. Similarly, the investigation of a Chinese population living in an urban environment in Hong Kong showed the same consistent low frequency. Recent data have shown, however, that Pakistanis living in England had a higher prevalence than those in Pakistan, although not as high as the prevalence in ethnic English populations. In general, the data on the geographical occurrence of RA support the view that genetic factors are important in explaining differences in disease risk.
The next stage in accruing evidence for genetic risk is to document an increased occurrence of disease in relatives of probands compared with the background population prevalence, so-called familial recurrence risk. Studies of hospital attendees are subject to bias as there may be a selection process whereby individuals would more probably be referred to hospital if they have an affected family member. Furthermore, several studies rely on family history as elicited by the proband, which again is subject to bias.
Few studies have been performed comparing familial recurrence risk in relatives of cases derived from population samples with those of controls. Indeed, such studies have only shown a modest increased risk [21, 22]. For example, a study from the Norfolk Arthritis Register in England showed only a twofold increased risk. Such an observation does not negate the role of genetic factors, but underscores that their contribution to disease susceptibility may be modest. This is important as the familial recurrence risk is a key factor in determining the power of genetic linkage studies within affected family pairs. Indeed, in contrast to other autoimmune diseases such as insulin-dependent diabetes and multiple sclerosis, the familial recurrence risk in RA is certainly smaller, thereby making it harder for studies to identify new genetic factors.
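The familial recurrence risk described above can be written as a simple ratio of the risk in relatives of cases to the background prevalence. The following is a minimal sketch only; the function name and the input figures are hypothetical and chosen to reproduce a twofold increase, not taken from the Norfolk Arthritis Register data.

```python
def recurrence_risk_ratio(risk_in_relatives: float, population_prevalence: float) -> float:
    """Risk in relatives of probands divided by the background prevalence."""
    if not 0.0 <= risk_in_relatives <= 1.0:
        raise ValueError("risk_in_relatives must lie in [0, 1]")
    if not 0.0 < population_prevalence <= 1.0:
        raise ValueError("population_prevalence must lie in (0, 1]")
    return risk_in_relatives / population_prevalence


# Hypothetical inputs: 1% background prevalence, 2% risk in relatives of cases.
ratio = recurrence_risk_ratio(risk_in_relatives=0.02, population_prevalence=0.01)
print(f"Recurrence risk ratio: {ratio:.1f}")
```

A ratio close to 1 means relatives are at little more risk than the general population, which is why a small ratio limits the power of linkage studies in affected family pairs.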
A variant of studies of familial recurrence risk is the comparison of disease risk in the initially unaffected co-twin of monozygotic probands compared with dizygotic probands. The assumption is that the environmental sharing between these different twin pair types is the same and thus any increased disease concordance in the monozygotic twins confirms the genetic effect. It is important in such studies to ensure that it is only like-sexed dizygotic twin pairs that are compared with the monozygotic twin pairs. However, there may be a greater environmental concordance in monozygotic twins due, for example, to psychological and other factors.
Twin studies have consistently shown a fourfold increased concordance in monozygotic twins compared with dizygotic twins. This increased risk, however, is of little value in attempting to quantify the genetic contribution to disease risk. The concordance between twins is dependent on the prevalence of disease. As the population prevalence approaches 100%, the concordance will increase accordingly, independent of the true genetic effect. The appropriate way of quantifying genetic risk is to assess the heritability based on a series of assumptions of environmental sharing and genetic sharing between twin types. Such a study has recently been attempted using data from both Finnish and English twins. The results suggest that approximately 50–60% of the occurrence of disease in the twins is explained by shared genetic effects.
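The dependence of twin concordance on prevalence can be illustrated with a small simulation. This is a sketch under stated assumptions, not the method used in the twin studies cited: it assumes a liability-threshold model in which each twin has a normally distributed liability, the two liabilities share a fixed correlation (`liability_corr`, a hypothetical parameter), and disease occurs above a threshold set by the prevalence. The prevalence values are illustrative.

```python
import random
from statistics import NormalDist


def probandwise_concordance(prevalence, liability_corr, n_pairs=100_000, seed=0):
    """Simulate twin pairs under a liability-threshold model (an assumption).

    Returns the probandwise concordance: the proportion of affected twins
    whose co-twin is also affected.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie in (0, 1)")
    if not 0.0 <= liability_corr <= 1.0:
        raise ValueError("liability_corr must lie in [0, 1]")
    rng = random.Random(seed)
    # Liability above this threshold means "affected".
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    shared_sd = liability_corr ** 0.5          # component common to both twins
    unique_sd = (1.0 - liability_corr) ** 0.5  # component specific to each twin
    concordant = discordant = 0
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, shared_sd)
        twin_a = shared + rng.gauss(0.0, unique_sd) > threshold
        twin_b = shared + rng.gauss(0.0, unique_sd) > threshold
        if twin_a and twin_b:
            concordant += 1
        elif twin_a or twin_b:
            discordant += 1
    affected = 2 * concordant + discordant
    return 2 * concordant / affected if affected else float("nan")


# Same underlying correlation, different prevalences (illustrative values).
for k in (0.01, 0.05, 0.20):
    print(f"prevalence {k:.2f}: concordance {probandwise_concordance(k, 0.5):.2f}")
```

With the correlation held fixed, the simulated concordance rises as prevalence rises, which is why a raw concordance ratio is a poor measure of the genetic contribution and heritability is estimated instead.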
The role of HLA DRB1 alleles as a risk factor for RA has been known for 25 years. Associations between different HLA DRB1 alleles and RA have been demonstrated in several populations across the world [25–31]. Indeed, there have been few populations where associations have not been demonstrated.
Table: Phenotype frequencies of HLA DRB1. Columns: HLA DRB1 phenotype; Controls (n = 286); Cases (n = 680); Odds ratio (95% confidence interval).
There is some suggestion that the relationship between human leukocyte antigen (HLA) and RA may be more related to the severity of disease, and that the development of arthritis per se is only weakly related. Support for this comes from the Norfolk Arthritis Register population-based study of inflammatory joint disease in England. Data from this study show only a weak relationship between susceptibility to the disease and the HLA DRB1 genotype (Table 1). By contrast, the association is fairly strong in those individuals who satisfy the criteria for RA. The data also show an influence of genotype, with some genotypes having a stronger association than others.
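For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained from case and control counts, the following sketch uses the standard log (Woolf) method. The group sizes match the table header (680 cases, 286 controls), but the carrier counts are hypothetical, since the table's data rows are not reproduced here; the function name is likewise illustrative.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    z = 1.96 gives an approximate 95% interval.
    """
    counts = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if any(c <= 0 for c in counts):
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of the log odds ratio.
    se_log = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical carrier counts: 340 of 680 cases and 100 of 286 controls
# carry a given HLA DRB1 phenotype.
or_, lo, hi = odds_ratio_ci(340, 340, 100, 186)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An interval that excludes 1 indicates an association unlikely to be due to chance alone, although it says nothing about bias or confounding.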
The HLA region on the short arm of chromosome 6 is a gene-rich area including several candidate genes that have an influence on the immune process. One of the most highly investigated is tumour necrosis factor (TNF). Studies have shown associations between TNF alleles and RA [35, 36], although one explanation may be linkage disequilibrium with HLA DRB1. Studies have also suggested, however, that the associations between HLA and TNF-c1 and TNF-b3 are independent of associations between HLA and the shared epitope. Other studies have shown an extended haplotype stretching from HLA through to TNF that has been implicated in disease.
Data from twin studies and from HLA association and sharing studies have been used to estimate that only 50% of the genetic contribution to RA can be explained by HLA. This has sparked a search for non-MHC genes.
The largest effort has been expended on whole genome screens of affected sibling pair families. Four such screens have now been undertaken in Europe, the United States, Japan and the United Kingdom. A number of markers emerge from these studies suggestive of a linkage with RA, although the linkage with HLA is by far the strongest. One problem is that such studies often have only weak power to detect effects. At the same time, because such studies may be simultaneously testing the possibility that any one of 200 regions may be linked with disease, the likelihood of a false-positive result is also very high. It is therefore not surprising that studies often fail to replicate results, both between themselves and on further samples within the same population. It is therefore necessary to undertake further validation studies and more in-depth investigations, using more closely spaced markers.
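The false-positive problem can be made concrete with a short calculation. This is a simplified sketch: it assumes the 200 regions are tested independently at a common per-region significance level, which is an idealisation (neighbouring markers are correlated, and linkage studies typically report LOD scores rather than simple p-values). The function names and the 0.05 level are illustrative assumptions.

```python
def familywise_false_positive_rate(alpha_per_test: float, n_tests: int) -> float:
    """Probability of at least one false positive among independent tests."""
    if not 0.0 <= alpha_per_test <= 1.0:
        raise ValueError("alpha_per_test must lie in [0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha_per_test) ** n_tests


def bonferroni_threshold(alpha_overall: float, n_tests: int) -> float:
    """Per-test level that keeps the overall rate at or below alpha_overall."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha_overall / n_tests


n_regions = 200  # figure taken from the text above
uncorrected = familywise_false_positive_rate(0.05, n_regions)
per_region = bonferroni_threshold(0.05, n_regions)
corrected = familywise_false_positive_rate(per_region, n_regions)
print(f"Uncorrected chance of at least one false positive: {uncorrected:.4f}")
print(f"Per-region level after Bonferroni correction: {per_region:.5f}")
print(f"Chance of at least one false positive after correction: {corrected:.4f}")
```

Under these assumptions, testing every region at the conventional 0.05 level makes at least one spurious linkage nearly certain, whereas the much stricter per-region level needed to control this further reduces the already weak power of each study.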
An alternative approach is to use a candidate gene screen where there is a prior reason for looking at a particular region. Such an approach has been productive, and evidence has shown that corticotrophin releasing hormone, CYP19 (oestrogen synthase), IFN-γ [44–46] and other cytokines [47–49] are linked to RA. Other approaches have addressed the possibility that genetic regions linked to other autoimmune diseases, such as insulin-dependent diabetes, may also be linked to RA. Indeed, linkage to a locus on chromosome X was shown in one study. A further strategy has been to use results from genome screens on animal models of arthritis to see whether syntenic regions are also linked to RA in humans. Such studies have suggested linkage to 17q22.
Whether any of these positive findings will result in the identification of a true disease susceptibility mutation remains to be seen. However, one clear problem is that RA itself is probably heterogeneous, and studies that fail to take account of this heterogeneity may find it impossible to obtain a positive result.
The term 'environment' is frequently used to describe all those susceptibility factors leading to disease that are not explicable on the basis of an identifiable genetic marker. In a strict sense, however, environment could be taken to refer to those factors external to the individual; for example, factors associated with diet, water or air-borne exposures. It is also important to consider factors implicated in disease that are internal to the subject without an obvious genetic basis. An appropriate term for this group of factors is 'nongenetic host factors'.
The increased risk of RA in females has led to considerable effort in examining the role of hormonal and pregnancy factors in disease occurrence. In general, male sex hormones, particularly testosterone, are lower in men who have RA. By contrast, levels of female sex hormones are not different between RA cases and controls.
Pregnancy itself has been investigated as a risk factor in RA development. Studies on the influence of pregnancy on RA have produced conflicting results. A number of studies have suggested that women who are nulliparous are at increased risk of developing the disease, although there is no increased risk in women who are single. It would thus appear that subfertility highlights a group at higher risk.
There have been a number of studies looking at other comorbidities that have an increased frequency in both subjects with RA and in their families. The most widely investigated has been the occurrence of other autoimmune diseases, particularly type 1 or insulin-dependent diabetes and autoimmune thyroid disease. Other diseases, for example schizophrenia, have been shown to be negatively associated with RA development [63–65]. The significance of these findings is unclear.
There have been relatively few studies on anthropometric factors associated with RA, although one recent case–control study suggested that people who were obese were at higher risk. The reason for this is unclear, and it is not certain whether this may represent a confounding factor of another exposure or whether people who are obese have, for example, increased production of oestrogens, which might pose a risk. A more recent case–control study found, however, after adjusting for age, smoking and marital status, that a link with obesity was nonsignificant.
There is much indirect evidence suggesting that exposure to infectious agents may be the trigger for RA. First, epidemiological data come from the observation of a decline in the incidence of RA in several populations [9, 16]. Many studies have indeed shown a halving in incidence over the past 30 years. Given the genetically stable population, the most probable explanation is that of a decline in an infectious trigger. This effect of time on occurrence might also be related to the period of birth as well as to the current year of observations. The Pima Indians, for example, showed a decline in occurrence of disease, and an in-depth study based on analysis of birth cohorts has shown a decline in the population occurrence of rheumatoid factor with increasingly recent birth cohorts.
There have been few studies looking at clustering of RA in time and space, although there have been reports of nonrandom clusters occurring within the Norfolk Arthritis Register population. Other indirect evidence regarding the role of an infectious agent has arisen from case–control studies suggesting that people who have had a blood transfusion, even some years prior to disease onset, may be at an increased risk of disease. Recent practice has been to screen blood for a number of agents such as hepatitis, but the increased reporting of blood transfusion in older cohorts may indeed be explained by the increased likelihood of infection.
A large number of infectious agents have been implicated in RA, including Epstein–Barr virus and parvovirus, as well as bacteria such as Proteus and Mycoplasma. The epidemiological studies supporting or refuting these possible links are reviewed elsewhere but, in general, such studies have been disappointing. One problem for the epidemiologist is that if RA represents the final common pathway of exposure to one of several different potential susceptibility organisms, many of which are also frequently observed in the general (i.e. nonarthritic) population, it becomes more difficult to confirm a relationship with epidemiological studies.
There have been remarkably few studies on factors such as diet, although there is a theoretical basis for investigating the role of omega-3 fatty acids [71, 72]. Randomised trials suggest that diets high in eicosapentaenoic acid have a favourable effect on the outcome of RA [73–75]. This might be because such fatty acids compete with arachidonic acid, which is involved in inflammation. Whether such dietary factors have a role in RA onset is much less clear.
Table: Summary of recent epidemiological studies showing the association between rheumatoid arthritis and cigarette smoking. Column: Odds ratio (95% confidence interval). Exposure category: 41–50 pack years.
There has been considerable recent interest in understanding the epidemiology of RA. There have been several population studies in many different countries around the world, and observations of differential occurrence (with time, between populations and between the genders) have stimulated a number of analytical studies looking for both genetic and environmental risk factors. Future studies will benefit from advances in molecular biology techniques to aid with the identification and characterisation of potential new genes for RA susceptibility. These studies, as already described, have revealed some tantalising clues that will require further follow-up in years to come.
RA presents an epidemiological challenge, and further study is likely to reveal more about both genetic and environmental factors, together with the interactions between them.