Development of a model for predicting the 4-year risk of symptomatic knee osteoarthritis in China: a longitudinal cohort study



We aimed to develop a model for predicting the 4-year risk of knee osteoarthritis (KOA) based on survey data obtained via a random, nationwide sample of Chinese individuals.


Data were analyzed from 8193 middle-aged and older adults included in the China Health and Retirement Longitudinal Study (CHARLS). Incident symptomatic KOA was defined as being free of symptomatic KOA at baseline (CHARLS2011) and diagnosed with symptomatic KOA at the 4-year follow-up (CHARLS2015). The effects of potential predictors on incident KOA were estimated using logistic regression models, and the final model was internally validated using bootstrapping. Model performance was assessed based on discrimination (area under the receiver operating characteristic curve, AUC) and calibration.


A total of 815 incident cases of KOA were identified at the 4-year follow-up, giving a cumulative incidence of approximately 9.95%. The final multivariable model included age, sex, waist circumference, residential area, difficulty with activities of daily living (ADLs)/instrumental activities of daily living (IADLs), history of hip fracture, depressive symptoms, number of chronic comorbidities, self-rated health status, and level of moderate physical activity (MPA). The risk model showed good discrimination with AUC = 0.719 (95% confidence interval [CI] 0.700–0.737) and optimism-corrected AUC = 0.712 after bootstrap validation. Agreement between the observed and predicted probability of incident symptomatic KOA was satisfactory. A simple clinical score model was also developed for quantifying the risk of KOA.


Our prediction model may aid the early identification of individuals at the greatest risk of developing KOA within 4 years.


Knee osteoarthritis (KOA) is among the most common chronic diseases leading to disability worldwide, carrying a substantial and increasing health burden [1, 2]. The prevalence of symptomatic KOA and radiographic KOA in individuals over 60 years of age ranges from 10.0 to 16.0% and from 35.0 to 50.0% [3,4,5,6,7], respectively. Approximately 250 million people have KOA worldwide, and prevalence in the USA has increased twofold in men and threefold in women over the past 20 years [5]; symptomatic KOA affects approximately 15.1 million individuals in the US population [8]. An estimated 37.35 million individuals over 60 years old suffer from symptomatic KOA in China [9]. The years lived with disability (YLDs) caused by osteoarthritis ranked tenth in China in 2016 [10] and fourth in South Korea in 2015 [11]. Osteoarthritis had the fifth greatest relative increase in total YLDs across six Nordic countries from 1990 to 2015 [12]. KOA is the leading form of osteoarthritis, accounting for 87% of osteoarthritis YLDs [13]. The increasing prevalence of KOA has increased the socioeconomic burden on affected individuals and healthcare systems [14].

To date, there are no effective therapeutic strategies for KOA. Prediction models for KOA aim to combine multiple factors to predict the risk of incident disease and may allow for early detection and prevention [15]. The Nottingham KOA model was an early model for predicting 12-year KOA risk in middle-aged adults, using easily obtainable factors such as age, sex, family history, body mass index (BMI), occupational risk, and history of knee injury [16]; however, it was developed using data from only two communities in the UK rather than a random sample of the general population, limiting its validity in other populations [17]. Several studies have developed prediction models based on genomic data [18,19,20,21] or radiographic/clinical biomarkers [22] such as hip α-angle and spinal bone mineral density. However, use of these models is limited by their high cost or complexity [23].

Primary risk factors for incident KOA include advanced age, female sex, overweight/obesity, knee injury, and smoking [24,25,26,27]. Smoking has been associated with a decreased risk of KOA, while the other factors increase the risk. Although physical activity [28, 29], occupational factors [24], ethnicity, and genetics [25] have also been associated with the incidence and/or progression of KOA, previous studies have reported inconsistent results due to methodological differences. Other potential risk factors for the development of KOA include metabolic syndrome [30,31,32], waist circumference [33], and depressive symptoms [24, 34], although findings regarding these factors remain controversial. Previous studies have reported a bidirectional association between osteoarthritis and certain comorbidities (e.g., hypertension, ischemic heart disease, diabetes) [24, 25], suggesting that these comorbidities can influence the incidence and progression of KOA. Existing risk models of KOA have not included these potential risk factors, and there are currently no models for predicting KOA risk in the Chinese population. In this study, we aimed to develop a model for predicting the 4-year risk of KOA based on survey data obtained via a random, nationwide sample of Chinese individuals, taking these potential risk factors into account.


Study design and data source

The present retrospective cohort study relied on 4-year data from the China Health and Retirement Longitudinal Study (CHARLS)—a nationwide study among Chinese adults aged 45 years or older for whom the detailed cohort profile has been published [35]. The national baseline survey for the study was conducted between June 2011 and March 2012 (CHARLS2011), and 17,708 respondents across 150 counties/districts and 450 villages/resident committees were recruited using a multistage sampling strategy. The respondents are followed up every 2 years via face-to-face computer-assisted personal interviews. Detailed information related to demographic background, socioeconomic status, biomedical findings, health status, and functioning was collected at baseline and at each follow-up using a structured questionnaire [35]. Blood samples were also obtained at each time point. The present study included participants recruited in CHARLS2011 and re-examined in CHARLS2015.


In this study, the unit of analysis was the person. Individuals who did not have symptomatic KOA in CHARLS2011 and who had a complete diagnosis of symptomatic KOA in CHARLS2015 were included. Participants without a complete diagnosis of symptomatic KOA in CHARLS2011 or in CHARLS2015 were excluded. We also excluded those for whom over 50% of predictor variables were unavailable.


The primary outcome was incident symptomatic KOA during the 4-year follow-up period, with the participant as the unit of analysis. In accordance with the definition used in a previous study [36], symptomatic KOA was defined as physician-diagnosed arthritis together with concurrent pain in either knee joint. Incident symptomatic KOA was defined as the participant being free of symptomatic KOA in CHARLS2011 and diagnosed with symptomatic KOA in CHARLS2015. The presence of pain in the knee joint was assessed based on responses to the following question: “Are you often troubled by pain in any part of your body?” If the participant answered in the affirmative, the following question was asked: “In what part of your body do you feel pain?”
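The outcome definition can be sketched as a small rule in Python. This is only an illustration: the field names `arthritis_dx` and `knee_pain` are hypothetical and are not CHARLS variable names, and in the study itself participants with symptomatic KOA at baseline were excluded from the cohort rather than coded as non-cases.

```python
from dataclasses import dataclass


@dataclass
class WaveRecord:
    """One participant's answers in one survey wave (hypothetical field names)."""
    arthritis_dx: bool  # physician-diagnosed arthritis reported
    knee_pain: bool     # knee named as a painful body part


def symptomatic_koa(rec: WaveRecord) -> bool:
    # Symptomatic KOA requires BOTH a physician diagnosis and knee pain.
    return rec.arthritis_dx and rec.knee_pain


def incident_koa(baseline: WaveRecord, follow_up: WaveRecord) -> bool:
    # Incident case: free of symptomatic KOA at baseline, present at follow-up.
    return (not symptomatic_koa(baseline)) and symptomatic_koa(follow_up)
```

For example, a participant with knee pain but no arthritis diagnosis in CHARLS2011 who reports both in CHARLS2015 would count as an incident case under this rule.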

Predictor variables

In CHARLS2011, data related to demographic background, socioeconomic status, biomedical findings, and levels of blood biomarkers were extracted. We included the following risk factors highlighted in previous studies and imputed missing values when necessary. The demographic variables included sex, age (years), BMI (categorized as underweight [< 18.5 kg/m²], normal [18.5–24.9 kg/m²], overweight [25.0–29.9 kg/m²], or obese [≥ 30.0 kg/m²]), waist circumference (cm), and residential area (urban vs. rural). Waist circumference was categorized into four groups: < 85/80 cm, < 90/85 cm, < 95/90 cm, and ≥ 95/90 cm in men/women. The first group was referred to as the normal group, and the other three groups were referred to as central obesity based on the diagnostic criteria for central obesity recommended by the Department of Disease Control at the Ministry of Health [37]. The behavioral variables included smoking status and engagement in vigorous/moderate/light physical activity. The physical activity score was calculated by multiplying the code for duration by the code for frequency during 1 week [38]. According to this score, physical activity was divided into three levels (none, 0; low, 1–4; moderate-to-high, ≥ 5). Health-related variables included history of hip fracture, number of other diagnosed comorbidities, metabolic syndrome (MS) in accordance with Chinese Diabetes Society (CDS) criteria [39], depressive symptoms based on the Center for Epidemiologic Studies Depression Scale (CESD-10) score [40], self-rated health status, and self-reported difficulties with activities of daily living (ADLs) [41] or instrumental activities of daily living (IADLs) [42]. The list of potential predictors is presented in Supplementary Box 1, along with detailed information on how each predictor was assessed and the tools used.
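The categorizations described above can be written as three small Python functions. This is a sketch under stated assumptions: BMI values between 24.9 and 25.0 are treated as normal, and `duration_code` and `frequency_code` are hypothetical non-negative integer codes, since the coding scheme itself is given in reference [38] and not in the text.

```python
def bmi_category(bmi: float) -> str:
    """BMI group in kg/m^2 (assumption: values below 25.0 count as normal)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def waist_group(waist_cm: float, male: bool) -> int:
    """Return 0 for the normal group and 1-3 for the central-obesity groups."""
    cutoffs = (85.0, 90.0, 95.0) if male else (80.0, 85.0, 90.0)
    # Count how many sex-specific cut-offs the measurement reaches.
    return sum(waist_cm >= c for c in cutoffs)


def activity_level(duration_code: int, frequency_code: int) -> str:
    """Physical activity level from the product of duration and frequency codes."""
    score = duration_code * frequency_code
    if score == 0:
        return "none"
    if score <= 4:
        return "low"
    return "moderate-to-high"
```

Under these rules a woman with a waist circumference of 86 cm falls in group 2, and an activity with duration code 2 and frequency code 2 (score 4) is classified as low.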

Statistical analysis

Model structure

In CHARLS2011, physical activity measures were available for a randomly selected subset of 3684 participants, while blood samples were available for 11,847 participants. Hence, the scores for vigorous/moderate/light physical activity and MS were the main predictors with missing values. The percentage of missing values across the predictors varied between 0.04% and 57% in this study. We assumed data were missing at random and generated 50 imputed datasets using the multiple imputation by chained equations (MICE) procedure [43]. Under this assumption, MICE can reduce bias from missing data because the reasons for missingness are explained by the observed variables included in the imputation model. We included all the predictor variables in the MICE process, along with the diagnosis of symptomatic KOA in CHARLS2011 and in CHARLS2015, as this information provides a stronger correlation structure among the covariates used as predictors in the imputation model. Continuous variables (including systolic blood pressure, diastolic blood pressure, triacylglycerol, HDL cholesterol, and fasting blood glucose) were imputed using linear regression, and binary and multiple categorical variables (including duration and frequency of physical activity, history of hip fracture, smoking behavior, self-rated health status, CESD-10 items, and ADL/IADLs items) were imputed using logistic regression.

Descriptive statistics (means and standard deviations for continuous data, and counts and percentages for categorical data) were used to report key variables. Univariable and multivariable logistic regression analyses were used to establish a model for predicting the risk of KOA. All candidate variables were first evaluated via an unconditional univariable logistic regression analysis, and we then selected variables according to clinical value combined with statistical significance to conduct multivariable logistic regression analysis. In the multivariable logistic regression analysis, stepwise selection was combined with the Akaike information criterion (AIC) to determine the final model structure. The coefficients, odds ratios (ORs), and 95% CIs were estimated via 1000-replication bootstrapping to obtain stable and unbiased parameters [44]. We combined the estimates using Rubin’s rules [45].
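As an illustration of the pooling step, the sketch below applies Rubin's rules to one coefficient estimated in each imputed dataset and converts the pooled log odds to an odds ratio. It is not the authors' STATA/R code. The function names are hypothetical, and the interval uses a normal approximation (z = 1.96) as a simplifying assumption, whereas the study estimated its CIs by bootstrapping.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean(se ** 2 for se in std_errors)  # mean within-imputation variance
    between = variance(estimates)                # between-imputation variance
    total = within + (1 + 1 / m) * between       # total variance
    return q_bar, math.sqrt(total)


def odds_ratio_ci(beta, se, z=1.96):
    """Log-odds coefficient -> (OR, lower, upper), normal approximation."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Made-up coefficients from three imputed datasets, for illustration only.
beta, se = pool_rubin([0.40, 0.44, 0.42], [0.10, 0.10, 0.10])
print(odds_ratio_ci(beta, se))
```

The pooled standard error is larger than the within-imputation standard error of 0.10 because it also carries the uncertainty that comes from the imputation itself.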

Internal validation

The multivariable models were internally validated using a bootstrap procedure (sampling with replacement for 1000 iterations) to assess bias-corrected estimates of predictive ability.

Model performance

We assessed the predictive performance of the final model using calibration and discrimination measures. Discrimination refers to the ability to distinguish patients experiencing an event from those not experiencing the event and was quantified based on the area under the receiver operating characteristic curve (AUC) in this study. Calibration refers to how closely the predicted risk corresponds with the observed risk and was assessed visually using calibration plots.
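The AUC has a direct interpretation: it is the probability that a randomly chosen participant who developed the event received a higher predicted risk than a randomly chosen participant who did not. A minimal pure-Python sketch of that definition follows. The function name `auc` is illustrative, and the pairwise loop is written for clarity rather than speed.

```python
def auc(y_true, y_score):
    """AUC as the share of case/non-case pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Two non-cases and two cases: three of the four pairs are ranked correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect separation of cases from non-cases.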

Clinical scoring tool

We developed a points-based risk-scoring tool based on the final model for easy clinical use, following a widely used method of clinical scoring [23]. This clinical risk prediction tool can be used to identify individuals who are at high risk of developing KOA during the following 4 years. Continuous factors were categorized based on the results of meta-analyses and clinical practice guidelines. Scores for categorical variables were determined by multiplying the β coefficients (log odds) in the multivariable logistic regression model by ten and rounding to the nearest integer. The total score was calculated by summing the scores of all variables. Sensitivity, specificity, and the AUC were calculated at different cut-off values, and the maximal Youden index was used to identify the optimal cut-off point [46]. The Youden index was calculated as follows: sensitivity + specificity − 1.
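The two steps of the scoring procedure, converting coefficients to points and choosing a cut-off by the Youden index, can be sketched as follows. The function names are hypothetical and no coefficient from Table 3 is reproduced here. Note also that Python's `round` sends exact halves to the nearest even integer, which may differ from the rounding rule the authors applied.

```python
def points(beta: float) -> int:
    """Score for one category: beta (log odds) times ten, rounded to an integer."""
    return round(beta * 10)


def youden_cutoff(scores, outcomes):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A participant is flagged as high risk when score >= cutoff.
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, outcomes) if y == 0 and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy data: scores 4 and above belong to cases, so the best cut-off is 4.
print(youden_cutoff([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # (4, 1.0)
```

This sketch only considers observed score values as candidate cut-offs. A reported cut-off such as a half-integer lies between two adjacent integer scores and gives the same classification.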

The present study was conducted in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines for model development and reporting. All analyses were performed using STATA version 15.1 (STATA Corporation, College Station, TX) and R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria). All statistical tests were two-sided and P values of < 0.05 were considered statistically significant.

Ethics statement

Given that the present study is a secondary analysis of publicly available CHARLS data, the Medical Ethics Board Committee of Peking University granted the study an exemption from review.


In CHARLS2011, physical activity measures were available for 3684 participants, while blood samples were available for 11,847 participants. Complete KOA data were available in CHARLS2011 and CHARLS2015 for 9204 of these participants. Seven participants were excluded because they declined to undergo body measurement assessments, rendering over 50% of their variables (including measurements of weight, height, and waist circumference; assessments of depressive symptoms, physical activity, and ADLs/IADLs; or the blood biomarkers) unavailable. Among them, one participant was diagnosed with KOA in 2011, and one developed KOA in 2015. Among the remaining 9197 participants, an additional 1004 were excluded because they were diagnosed with KOA at baseline (CHARLS2011). Thus, data from a total of 8193 participants were included when developing the model. Among the 8193 included participants, 815 developed symptomatic KOA in the following 4 years. The overall 4-year cumulative incidence of symptomatic KOA was 9.95%, with 7.62% in males and 13.77% in females.
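The participant flow and the overall cumulative incidence can be checked with a few lines of arithmetic. The variable names below are illustrative, and all counts are taken from the paragraph above.

```python
with_complete_koa_data = 9204
excluded_missing_predictors = 7   # over 50% of predictor variables unavailable
excluded_prevalent_koa = 1004     # symptomatic KOA already present at baseline

analysed = (with_complete_koa_data
            - excluded_missing_predictors
            - excluded_prevalent_koa)
incident_cases = 815
cumulative_incidence = 100 * incident_cases / analysed

print(analysed)                        # 8193
print(round(cumulative_incidence, 2))  # 9.95
```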

The mean age was 58.82 years (standard deviation [SD] 9.01 years), and 4251 participants were female (51.89%). At baseline, 23.31% of participants had difficulty with ADLs/IADLs, while 17.08% were diagnosed with metabolic syndrome, and 44.66% reported one or two chronic comorbidities. A history of hip fracture was reported by 252 (3.08%) participants. Other baseline characteristics are summarized in Table 1, along with the number of missing values for each variable.

Table 1 Baseline characteristics and outcomes of the study cohort summarized by their count and fraction (N (%)) for categorical or the mean and standard deviation for continuous variables, respectively

Univariable and multivariable analysis

Table 2 shows the results of the univariable and multivariable analyses based on the imputed datasets. In the univariable analysis, age was identified as a risk factor for KOA, with the biggest difference occurring between the 60–64 and 65–69 age groups. Female sex, rural residence, history of hip fracture, ADL/IADL difficulty, severe depressive symptoms, more chronic comorbidities, poor health status, and higher levels of moderate physical activity (MPA) were significantly associated with an increased risk of developing KOA (all P values ≤ 0.01), while smoking was significantly associated with a decreased risk of developing KOA (P ≤ 0.01). Although high BMI/waist circumference and metabolic syndrome were also positively associated with the incidence of KOA, these associations were not significant (all P values > 0.05). Considering the important effects of metabolism and vigorous physical activity on incident KOA, we included metabolic syndrome and level of vigorous physical activity (VPA) in the multivariable logistic model even though their univariable associations were not significant. As the clinical values of BMI and waist circumference are comparable, we entered waist circumference rather than BMI into the multivariable logistic regression given its relatively smaller P values.

Table 2 Results of logistic regression models for incident KOA generated from 8193 participants in CHARLS 2011–2015

The final prediction model included ten variables: age, sex, waist circumference, residential area, ADLs/IADLs difficulty, history of hip fracture, depressive symptoms, number of chronic comorbidities, health status, and level of MPA.

Model performance

The discrimination and calibration curves for the model are shown as Fig. 1a and b, respectively. The final prediction model achieved acceptable discrimination, AUC = 0.719 (95% CI, 0.700–0.737), with optimism = 0.007 and bias-corrected AUC = 0.712 after bootstrap validation. The apparent observed line was quite close to the ideal line, while the bias-corrected line was slightly further from the ideal line than the observed line after the bootstrap procedure.
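The reported figures are linked by the usual bootstrap optimism correction. The text does not spell out the procedure, so the first formula below is an assumption based on the standard approach: for each of the B bootstrap samples the model is refitted, and its AUC on that bootstrap sample is compared with its AUC on the original data.

```latex
\[
\text{optimism} = \frac{1}{B}\sum_{b=1}^{B}
  \left(\mathrm{AUC}^{\text{boot}}_{b} - \mathrm{AUC}^{\text{orig}}_{b}\right),
\qquad B = 1000
\]
\[
\mathrm{AUC}_{\text{corrected}}
  = \mathrm{AUC}_{\text{apparent}} - \text{optimism}
  = 0.719 - 0.007 = 0.712
\]
```

Here AUC^boot_b is the AUC of the model fitted on bootstrap sample b and evaluated on that same sample, and AUC^orig_b is the AUC of that model evaluated on the original dataset.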

Fig. 1

The discrimination and calibration curves of the final risk model. a ROC curve analysis for predicting symptomatic KOA when using age, sex, waist circumference, residential area, ADL/IADL difficulty, history of hip fracture, depressive symptoms, number of chronic comorbidities, health status, and level of MPA. The AUC was 0.719 (95% CI 0.700–0.737), and the optimism-corrected AUC was 0.712 after bootstrap validation. b The calibration curve. AUC, area under the receiver operating characteristic curve; ROC, receiver operating characteristic curve

Clinical score model

We developed a simple clinical score model based on the ten variables included in the final multivariable model (Table 3). Total scores in this model range from 0 (lowest risk) to 51 (greatest risk). This clinical score model may aid in identifying patients at the greatest risk for developing KOA within the next 4 years. The AUC of the risk score model was 0.713 (95% CI, 0.695–0.731), and the optimal cut-off, where patients with a score ≥ 20.5 were most likely to develop KOA in 4 years, was obtained from the maximal Youden index. At the optimal cut-off, the sensitivity and specificity were 63.3% and 66.0%, respectively. Referring to the previous score model [22], the incident probability of KOA within 4 years was calculated by dividing the total risk score by 51 and multiplying by 100%.
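Applying the score in practice comes down to two small calculations, sketched below with hypothetical function names. The linear conversion to a percentage is the one described in the paragraph above. It is a convenience mapping of the total score and is not the same as the probability produced by the logistic regression model itself.

```python
MAX_SCORE = 51   # highest possible total score reported for the model
CUTOFF = 20.5    # optimal cut-off from the maximal Youden index


def risk_percent(total_score: int) -> float:
    """Approximate 4-year risk as total score / 51 * 100, as described in the text."""
    if not 0 <= total_score <= MAX_SCORE:
        raise ValueError("total score must lie between 0 and 51")
    return 100 * total_score / MAX_SCORE


def high_risk(total_score: int) -> bool:
    """Flag a participant whose total score reaches the optimal cut-off."""
    return total_score >= CUTOFF
```

Because total scores are integers, the 20.5 cut-off flags scores of 21 and above.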

Table 3 Risk score model for predicting incident KOA


We developed and internally validated a model for predicting the 4-year risk of symptomatic KOA among the Chinese population, based on data from the CHARLS cohort. An easy-to-use clinical score model was developed to identify individuals’ risk of developing KOA. The model included ten convenient and accessible variables, including age, sex, and waist circumference, which are the variables most commonly included in previous KOA prediction models. We also included other controversial or new predictors of KOA, which were tested in a KOA risk model for the first time. To our knowledge, this is the first model for predicting KOA risk in the Chinese population, and our results suggest that this model can be used to aid in the prevention of KOA.

Older age was identified as a risk factor for KOA in our study; the most significant increase in risk was observed in the 60–69 years group. The cumulative incidence of symptomatic KOA gradually increased from 45 years of age, increasing rapidly after 55 years of age and peaking at approximately 65 years of age [47]. After 70 years of age, increases in the cumulative incidence of KOA were no longer significant [47]. Our findings, along with previous findings, highlight the need to prevent incident KOA in individuals between 45 and 70 years of age.

Obesity creates an abnormal loading environment for weight-bearing joints and may contribute to the pathogenesis of KOA [48]. Alternatively, the increased risk of KOA may be caused by the positive energy balance and metaflammation associated with obesity [49]. Although BMI has been shown to be an important predictor of KOA [50], Wallace et al. (2019) [48] reported that increased abdomen size is associated with a greater risk of radiographic KOA than high BMI. Further studies are required to determine whether BMI, waist circumference, or metabolic syndrome influences KOA risk through mechaflammation and metaflammation. In this study, we analyzed the effects of BMI, waist circumference, and metabolic syndrome on incident KOA in the Chinese population. None of these three factors was a significant predictor of incident KOA; BMI showed lower statistical significance than waist circumference.

One possible explanation is that damage to joint tissues and pain symptoms attributable to BMI may not reach a significant effect in the short term. Only the 12-year Nottingham KOA model investigated BMI in relation to symptomatic KOA [16], whereas the 9-year Rotterdam model [21] and the 4-year Chingford model [22] predicted radiographic KOA. BMI was also not included in the final 4-year Chingford model. Zheng & Chen [50] concluded that BMI was a significant factor for incident KOA, but KOA was defined as radiographic KOA, severe KOA, or KOA requiring joint replacement in 13 of the 14 included studies. Among the eight studies with a follow-up duration of less than 10 years, none focused on symptomatic KOA. Another possible reason is that abnormal waist circumference is much more prevalent than abnormal BMI in the Chinese population, because body size tends to be smaller in Asian populations than in European or American populations [51]; thus, waist circumference was more strongly associated with incident KOA than BMI in this study. These results imply that the sensitivity of obesity indices might vary with race when evaluating the risk of KOA. Additional studies focusing on KOA risk models are required to verify the association between BMI and incident symptomatic KOA in the Chinese population and other ethnic populations.

Another conventional risk factor for KOA is physical activity. In the present study, no significant association was observed between VPA or light physical activity (LPA) and incident KOA; however, MPA positively predicted incident KOA. Reported associations between physical activity and incident KOA have been inconsistent, owing to variation in assessment methods, activity categories, or populations. Felson et al. [28] reported that walking and other recreational activities did not increase the risk of OA in older adults. Results from the Chingford cohort suggested that physical activities related to work and sports increase the risk of osteophytes, while walking decreases the risk of osteophytes in middle-aged women [52]; however, none of these effects was statistically significant. Findings from the Framingham Heart Study [53] indicate that performing VPA for over 2 h/day increases the risk of symptomatic KOA (OR, 5.3; 95% CI, 1.2–24) and that the association was also significant for radiographic KOA (OR, 1.3 per hour; 95% CI, 1.1–1.6), while the effects of MPA and LPA were not significant. Given the discrepancy between studies, additional studies should aim to verify the influence of different types of physical activity on the risk of KOA. Such studies should seek to determine the most appropriate type, duration, frequency, and intensity of physical activity for preventing KOA in different populations.

Our model includes health-related variables, and our findings provide evidence that these variables contribute substantially to the prediction of symptomatic KOA. Depressive symptoms, comorbidities, and history of hip fracture are psychologically and physiologically objective factors related to KOA. Self-rated health and difficulty with ADLs/IADLs are mainly subjective and reflect the patient’s knowledge of and ability to cope with disease.

Patients with KOA are prone to comorbid depression and other chronic conditions, and chronic diseases often interact with comorbidities in complex ways [54]. Hence, previous studies have proposed that depression and chronic comorbidities may affect incident KOA. Seavey et al. [34] indicated that depressive symptoms were a risk factor for incident arthritis (OR, 1.72; 95% CI, 1.27–2.35). Jinks et al. [55] also reported that depression was a significant predictor of knee pain (OR, 1.4; 95% CI, 1.1–1.8), and pain is the dominant physical symptom among patients with symptomatic KOA. Ours is the first prediction model of symptomatic KOA to include depression as a predictor. Patients with mild or moderate-to-severe depression were two or three times more likely to develop KOA than those without depression. Although a bidirectional causal association has rarely been demonstrated either between arthritis and depression or between any other chronic disease and depression, targeted strategies for addressing depressive symptoms may aid in reducing incident KOA. We also assessed the relationships between 12 main types of comorbidities and incident KOA. KOA and comorbidities may accelerate the progression of one another [24]. Our results showed that patients with comorbidities had a significantly increased risk of developing KOA within 4 years. This finding highlights the effect of comorbidities on the development of KOA in the Chinese population and might be of value for developing KOA prediction models in other ethnic groups.

Related studies [56, 57] have demonstrated that rheumatoid arthritis increases the risk of hip fracture due to bone loss induced by chronic inflammation, use of glucocorticoids, and physical inactivity. However, few studies have examined the association between hip fracture and incident KOA. Given that the knee and hip joints are the two most important weight-bearing joints, we sought to determine whether a history of hip fracture increases the risk of developing KOA. Our findings indicated that a history of hip fracture was associated with a 53% increase in the risk of KOA. Identifying the potential mechanisms underlying this association should help in developing risk models in further studies.

Patient-reported outcomes (PROs) have been emphasized in multiple studies because they may capture important disease-related information prior to the onset of clinical signs or pathophysiological changes [58]. Silverwood et al. [24] noted in an earlier study that poor self-rated health status was a potential risk factor for KOA, although the association was not significant. Our model showed that the likelihood of developing KOA increased as health status worsened and when ADLs/IADLs were impaired. Self-rated health status and difficulty with ADLs/IADLs could therefore be significant predictors of incident symptomatic KOA. Self-ratings of health status comprehensively reflect one’s physical and psychological function, as well as one’s knowledge of and ability to cope with diseases and one’s self-efficacy. Most of the existing potential risk factors were derived from epidemiological analyses or clinicians’ experience. Our findings highlight the need to consider the patient’s perspective, as this may further our understanding of KOA while helping to reduce the incidence of the disease. Symptomatic KOA causes knee pain or stiffness and progressively decreases self-care ability. Our results imply that impairments in ADLs/IADLs prior to KOA onset may represent a predictive signal for KOA. Hence, preventive interventions may be useful in reducing incident KOA in those who have difficulty with ADLs/IADLs. Improving ADLs/IADLs might become a new interventional target for preventing KOA.

Preventing KOA and other chronic diseases in rural areas is the biggest challenge in China because of the large population [36]. Our model included residential area as a predictor of KOA, with the aim of improving the prevention of KOA in the rural population. Multiple factors may contribute to the high prevalence in rural areas, including limited access to knowledge regarding the prevention of KOA and other chronic diseases, a lack of economic resources for timely treatment of chronic diseases, poor ability to manage one’s health, and earlier impairments in physical function due to strenuous farm work. We hope that our results will promote policies and resources directed toward preventing KOA in rural areas of China in the future.

We developed an easy-to-use clinical score model to estimate the risk of symptomatic KOA within 4 years and to identify individuals at high risk. This simple model involved ten commonly available variables, and both the assessment of the variables and the calculation of the risk score are easily understood and handled in practice. Clinicians or patients themselves could use this tool to assess the 4-year risk of KOA. While the score model showed good performance in assessing the risk of KOA (AUC = 0.713; 95% CI 0.695–0.731), its ability to identify individuals at high risk was moderate using a cut-off of ≥ 20.5. Hence, this model should be improved and adjusted when applied in other populations. Adding other clinical biomarkers would provide further insight.

This study has several limitations. First, the data were incomplete; although missing values were handled using multiple imputation and a bootstrap strategy, this may have biased our findings, especially for physical activity, which had a high percentage of missing values. Second, while new variables included in the study were significantly associated with incident KOA, further studies are required to elucidate the mechanisms underlying these associations. Lastly, our model was only internally validated; therefore, external validation in other Chinese populations and different ethnic groups remains necessary.


In the present study, we developed the first model for predicting the 4-year risk of developing symptomatic KOA in China, using a longitudinal cohort. Our simple score model may aid in the early identification of individuals at the greatest risk of developing KOA within 4 years in clinical practice or community settings. Such early identification may allow for improved patient education and modification of certain risk factors, which may in turn decrease the rate of incident KOA.

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.



Abbreviations

ADLs: Activities of daily living
AIC: Akaike information criterion
AUC: Area under the receiver operating characteristic curve
BMI: Body mass index
CESD: Center for Epidemiologic Studies Depression Scale
CHARLS: China Health and Retirement Longitudinal Study
CI: Confidence interval
IADLs: Instrumental activities of daily living
KOA: Knee osteoarthritis
LPA: Light physical activity
MPA: Moderate physical activity
MS: Metabolic syndrome
OR: Odds ratio
PA: Physical activity
PRO: Patient-reported outcome
SD: Standard deviation
TRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis
VPA: Vigorous physical activity


References

1. Prieto-Alhambra D, Judge A, Javaid MK, Cooper C, Diez-Perez A, Arden NK. Incidence and risk factors for clinically diagnosed knee, hip and hand osteoarthritis: influences of age, gender and osteoarthritis affecting other joints. Ann Rheum Dis. 2014;73(9):1659–64.
2. Hunter DJ, Schofield D, Callander E. The individual and socioeconomic impact of osteoarthritis. Nat Rev Rheumatol. 2014;10(7):437–41.
3. Cho HJ, Morey V, Kang JY, Kim KW, Kim TK. Prevalence and risk factors of spine, shoulder, hand, hip, and knee osteoarthritis in community-dwelling Koreans older than age 65 years. Clin Orthop Relat Res. 2015;473(10):3307–14.
4. Lawrence RC, Felson DT, Helmick CG, Arnold LM, Choi H, Deyo RA, et al. Estimates of the prevalence of arthritis and other rheumatic conditions in the United States. Arthritis Rheum. 2008;58(1):26–35.
5. Nguyen U, Zhang YQ, Zhu YY, Niu JB, Zhang B, Felson DT. Increasing prevalence of knee pain and symptomatic knee osteoarthritis: survey and cohort data. Ann Intern Med. 2011;155(11):725–U135.
6. Postler A, Luque Ramos A, Goronzy J, Günther K-P, Lange T, Schmitt J, et al. Prevalence and treatment of hip and knee osteoarthritis in people aged 60 years or older in Germany: an analysis based on health insurance claims data. Clin Interv Aging. 2018;13:2339–49.
7. Guillemin F, Rat AC, Mazieres B, Pouchot J, Fautrel B, Euller-Ziegler L, et al. Prevalence of symptomatic hip and knee osteoarthritis: a two-phase population-based survey. Osteoarthr Cartil. 2011;19(11):1314–22.
8. Deshpande BR, Katz JN, Solomon DH, Yelin EH, Hunter DJ, Messier SP, et al. The number of persons with symptomatic knee osteoarthritis in the United States: impact of race/ethnicity, age, sex, and obesity. Arthritis Care Res. 2016;68(12):1743–50.
9. Wang L, Chen H, Lu H, Yue W, Shang S. Research progress on disease burden and disease risk models of knee osteoarthritis. Chin Nurs Res. 2020;34(20):3642–6.
10. Zeng X, Qi J, Yin P, Wang L, Liu Y, Liu J, et al. 1990 to 2016 disease burden in China and administrative regions. Chin Circ J. 2018;33(12):1147–58.
11. Radnaabaatar M, Kim Y-E, Go D-S, Jung Y, Yoon S-J. Burden of disease in coastal areas of South Korea: an assessment using health insurance claim data. Int J Environ Res Public Health. 2019;16(17):3044.
12. Kiadaliri AA, Lohmander LS, Moradi-Lakeh M, Petersson IF, Englund M. High and rising burden of hip and knee osteoarthritis in the Nordic region, 1990–2015: findings from the global burden of disease study 2015. Acta Orthop. 2018;89(2):177–83.
13. GBD 2017 Disease and Injury Incidence and Prevalence Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1789–858.
14. Abbott JH, Usiskin IM, Wilson R, Hansen P, Losina E. The quality-of-life burden of knee osteoarthritis in New Zealand adults: a model-based evaluation. PLoS One. 2017;12(10):e0185676.
15. Lee Y-h, Bang H, Kim DJ. How to establish clinical prediction models. Endocrinol Metab (Seoul). 2016;31(1):38–44.
16. Zhang W, McWilliams DF, Ingham SL, Doherty SA, Muthuri S, Muir KR, et al. Nottingham knee osteoarthritis risk prediction models. Ann Rheum Dis. 2011;70(9):1599–604.
17. Michl GL, Katz JN, Losina E. Risk and risk perception of knee osteoarthritis in the US: a population-based study. Osteoarthr Cartil. 2016;24(4):593–6.
18. Valdes AM, Doherty M, Spector TD. The additive effect of individual genes in predicting risk of knee osteoarthritis. Ann Rheum Dis. 2008;67(1):124–7.
19. Takahashi H, Nakajima M, Ozaki K, Tanaka T, Kamatani N, Ikegawa S. Prediction model for knee osteoarthritis based on genetic and clinical information. Arthritis Res Ther. 2010;12(5):R187.
20. Blanco FJ, Möller I, Romera M, Rozadilla A, Sánchez-Lázaro JA, Rodríguez A, et al. Improved prediction of knee osteoarthritis progression by genetic polymorphisms: the Arthrotest Study. Rheumatology (Oxford). 2015;54(7):1236–43.
21. Kerkhof HJM, Bierma-Zeinstra SMA, Arden NK, Metrustry S, Castano-Betancourt M, Hart DJ, et al. Prediction model for knee osteoarthritis incidence, including clinical, genetic and biochemical risk factors. Ann Rheum Dis. 2014;73(12):2116–21.
22. Garriga C, Sanchez-Santos MT, Judge A, Hart D, Spector T, Cooper C, et al. Predicting incident radiographic knee osteoarthritis in middle-aged women within 4 years: the importance of knee-level prognostic factors. Arthritis Care Res (Hoboken). 2020;72(1):88–97.
23. Jiang WH, Wang JY, Shen XF, Lu WL, Wang Y, Li W, et al. Establishment and validation of a risk prediction model for early diabetic kidney disease based on a systematic review and meta-analysis of 20 cohorts. Diabetes Care. 2020;43(4):925–33.
24. Silverwood V, Blagojevic-Bucknall M, Jinks C, Jordan JL, Protheroe J, Jordan KP. Current evidence on risk factors for knee osteoarthritis in older adults: a systematic review and meta-analysis. Osteoarthr Cartil. 2015;23(4):507–15.
25. Glyn-Jones S, Palmer AJR, Agricola R, Price AJ, Vincent TL, Weinans H, et al. Osteoarthritis. Lancet. 2015;386(9991):376–87.
26. Kong L, Wang L, Meng F, Cao J, Shen Y. Association between smoking and risk of knee osteoarthritis: a systematic review and meta-analysis. Osteoarthr Cartil. 2017;25(6):809–16.
27. Hui M, Doherty M, Zhang W. Does smoking protect against osteoarthritis? Meta-analysis of observational studies. Ann Rheum Dis. 2011;70(7):1231–7.
28. Felson DT, Niu J, Clancy M, Sack B, Aliabadi P, Zhang Y. Effect of recreational physical activities on the development of knee osteoarthritis in older adults of different weights: the Framingham study. Arthritis Care Res. 2007;57(1):6–12.
29. Regnaux J-P, Lefevre-Colau M-M, Trinquart L, et al. High-intensity versus low-intensity physical activity or exercise in people with hip or knee osteoarthritis. Cochrane Database Syst Rev. 2015;10(10):CD010203.
30. Courties A, Sellam J, Berenbaum F. Metabolic syndrome-associated osteoarthritis. Curr Opin Rheumatol. 2017;29(2):214–22.
31. Li S, Felson DT. What is the evidence to support the association between metabolic syndrome and osteoarthritis? A systematic review. Arthritis Care Res (Hoboken). 2019;71(7):875–84.
32. Maddah S, Mahdizadeh J. Association of metabolic syndrome and its components with knee osteoarthritis. Acta Med Iran. 2015;53(12):743–8.
33. Gill SV, Hicks GE, Zhang Y, Niu J, Apovian CM, White DK. The association of waist circumference with walking difficulty among adults with or at risk of knee osteoarthritis: the osteoarthritis initiative. Osteoarthr Cartil. 2017;25(1):60–6.
34. Seavey WG, Kurata JH, Cohen RD. Risk factors for incident self-reported arthritis in a 20 year followup of the Alameda County study cohort. J Rheumatol. 2003;30(10):2103–11.
35. Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort profile: the China Health and Retirement Longitudinal Study (CHARLS); 2014.
36. Tang X, Wang S, Zhan S, Niu J, Tao K, Zhang Y, et al. The prevalence of symptomatic knee osteoarthritis in China: results from the China Health and Retirement Longitudinal Study. Arthritis Rheumatol. 2016;68(3):648–53.
37. Chen C, Lu FC, Department of Disease Control Ministry of Health PRC. The guidelines for prevention and control of overweight and obesity in Chinese adults. Biomed Environ Sci. 2004;17:1–36.
38. MacInnis RJ, English DR, Hopper JL, Haydon AM, Gertig DM, Giles GG. Body size and composition and colon cancer risk in men. Cancer Epidemiol Biomark Prev. 2004;13(4):553–9.
39. Metabolic Syndrome Research Group of Chinese Medical Association. Suggestions on metabolic syndrome from Chinese Diabetes Society. Chin J Diabetes Mellitus. 2004;12(03):5–10.
40. Cheng S-T, Chan ACM. The Center for Epidemiologic Studies Depression Scale in older Chinese: thresholds for long and short forms. Int J Geriatr Psychiatry. 2005;20(5):465–70.
41. Katz S, Ford AB, Moskowitz RW, Jackson BA, Jaffe MW. Studies of illness in the aged. The index of ADL: a standardized measure of biological and psychosocial function. JAMA. 1963;185:914–9.
42. Lawton MP, Brody EM. Assessment of older people: self-maintaining and instrumental activities of daily living. Gerontologist. 1969;9(3):179–86.
43. Schafer JL. Multiple imputation: a primer. Stat Methods Med Res. 2016;8(1):3–15.
44. Vergouwe Y, Royston P, Moons KGM, Altman DG. Development and validation of a prediction model with missing predictor data: a practical approach. J Clin Epidemiol. 2010;63(2):205–14.
45. White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med. 2011;30(4):377–99.
46. Zhang L, Guo L, Wu H, Gong X, Lv J, Yang Y. Role of physical performance measures for identifying functional disability among Chinese older adults: data from the China Health and Retirement Longitudinal Study. PLoS One. 2019;14(4):e0215693.
47. Losina E, Weinstein AM, Reichmann WM, Burbine SA, Solomon DH, Daigle ME, et al. Lifetime risk and age at diagnosis of symptomatic knee osteoarthritis in the US. Arthritis Care Res (Hoboken). 2013;65(5):703–11.
48. Wallace IJ, Felson DT, Worthington S, Duryea J, Clancy M, Aliabadi P, et al. Knee osteoarthritis risk in non-industrial societies undergoing an energy balance transition: evidence from the indigenous Tarahumara of Mexico. Ann Rheum Dis. 2019;78(12):1693–8.
49. Berenbaum F, Wallace IJ, Lieberman DE, Felson DT. Modern-day environmental factors in the pathogenesis of osteoarthritis. Nat Rev Rheumatol. 2018;14(11):674–81.
50. Zheng H, Chen C. Body mass index and risk of knee osteoarthritis: systematic review and meta-analysis of prospective studies. BMJ Open. 2015;5(12):e007568.
51. Metabolic Syndrome Research Group of Chinese Medical Association. Suggestions on metabolic syndrome from Chinese Diabetes Society. Chin J Diabetes Mellitus. 2004;03:5–10.
52. Hart DJ, Doyle DV, Spector TD. Incidence and risk factors for radiographic knee osteoarthritis in middle-aged women: the Chingford Study. Arthritis Rheum. 1999;42(1):17–24.
53. McAlindon TE, Wilson PWF, Aliabadi P, Weissman B, Felson DT. Level of physical activity and the risk of radiographic and symptomatic knee osteoarthritis in the elderly: the Framingham study. Am J Med. 1999;106(2):151–7.
54. Xu X, Mishra GD, Jones M. Evidence on multimorbidity from definition to intervention: an overview of systematic reviews. Ageing Res Rev. 2017;37:53–68.
55. Jinks C, Jordan KP, Blagojevic M, Croft P. Predictors of onset and progression of knee pain in adults living in the community. A prospective study. Rheumatology (Oxford). 2007;47(3):368–74.
56. Arai K, Suzuki N, Murayama T, Kondo N, Otsuka H, Koizumi M, et al. Age at the time of hip fracture in patients with rheumatoid arthritis is 4 years greater than it was 10 years before, but is still younger than that of the general population. Mod Rheumatol. 2019;30(1):64–9.
57. Yamamoto Y, Turkiewicz A, Wingstrand H, Englund M. Fragility fractures in patients with rheumatoid arthritis and osteoarthritis compared with the general population. J Rheumatol. 2015;42(11):2055–8.
58. Nelson EC, Eftimovska E, Lind C, Hager A, Wasson JH, Lindblad S. Patient reported outcome measures in practice. BMJ. 2015;350:g7818.


We thank the China Health and Retirement Longitudinal Study (CHARLS) team for providing nationally representative data.


This work was supported by the National Natural Science Foundation of China (no. 81972158).

Author information




All authors were involved in revising the article, and all authors approved the final version to be published. Study conception and design: LMW and SMS. Data analysis and interpretation: LMW, HL, HBC, SDJ, and MQW. Draft of the manuscript: LMW, HL, and SMS.

Corresponding author

Correspondence to Shaomei Shang.

Ethics declarations

Ethics approval and consent to participate

The Medical Ethics Board Committee of Peking University granted the study an exemption from review.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Box 1.

Potential predictors and methods of measurement.

Additional file 2: Supplementary Figure 1.

Flowchart of study participants.

Additional file 3: Supplementary Table 1.

Baseline characteristics of excluded and included participants.

Additional file 4: Supplementary Table 2.

Multiple Testing Results of Logistic Regression Models.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Wang, L., Lu, H., Chen, H. et al. Development of a model for predicting the 4-year risk of symptomatic knee osteoarthritis in China: a longitudinal cohort study. Arthritis Res Ther 23, 65 (2021).


  • Knee osteoarthritis
  • Risk
  • Prediction model