  • Research article
  • Open access

A consensus-based framework for conducting and reporting osteoarthritis phenotype research



The concept of osteoarthritis (OA) heterogeneity is evolving and gaining renewed interest. According to this concept, distinct subtypes of OA need to be defined that will likely require recognition in research design and different approaches to clinical management. Although seemingly plausible, a wide range of views exist on how best to operationalize this concept. The current project aimed to provide consensus-based definitions and recommendations that together create a framework for conducting and reporting OA phenotype research.


A panel of 25 members with expertise in OA phenotype research was composed. First, panel members participated in an online Delphi exercise to provide a number of basic definitions and statements relating to OA phenotypes and OA phenotype research. Second, panel members provided input on a set of recommendations for reporting on OA phenotype studies.


Four Delphi rounds were required to achieve sufficient agreement on 11 definitions and statements. OA phenotypes were defined as subtypes of OA that share distinct underlying pathobiological and pain mechanisms and their structural and functional consequences. Reporting recommendations pertaining to the study characteristics, study population, data collection, statistical analysis, and appraisal of OA phenotype studies were provided.


This study provides a number of consensus-based definitions and recommendations relating to OA phenotypes. The resulting framework is intended to facilitate research on OA phenotypes and increase combined efforts to develop effective OA phenotype classification. Success in this endeavor will hopefully translate into more effective, differentiated OA management that will benefit a multitude of OA patients.


There has been longstanding acceptance of the heterogeneity of osteoarthritis (OA), but this topic is attracting increasing interest given an expanding armamentarium for classification (biological, psychosocial, and statistical); new insights into the pathophysiology, prognosis, and patterns of response to new and existing interventions; and a general move towards personalized care to improve efficiency and effectiveness [1]. A more recent development has been to invoke the concept of OA phenotypes. According to this concept, OA is composed of a number of phenotypes that may be present to a varying extent among patients spanning the spectrum of disease [2]. These phenotypes may differ in their compatibility with study designs in research and with diagnostic and therapeutic strategies in clinical care. Although seemingly plausible, a wide range of views exist on how best to operationalize this concept, and similarly, a variety of approaches have been used to explore heterogeneity in empirical studies. Studies of OA phenotypes published to date differ in approach, in the criteria used to distinguish phenotypes, and in their outcomes. Furthermore, different papers have used different and sometimes confusing terminology. A more consistent use of terminology would increase the synergy between studies and facilitate the progress of the field as a whole. In addition, complete reporting of relevant data is important to allow effective comparison between studies and meta-analyses.

In the field of back pain, the pathway from basic research to successful phenotyping has been described as comprising a number of steps [3]. First, there are studies of assessment methods that could potentially provide important data on one or more phenotypes. For example, one may develop a new imaging technique to assess the glycosaminoglycan content of articular cartilage. Second, hypothesis-generating studies aim to determine which characteristics identify people in clinically important subgroups. For example, a biochemical marker level could be higher in a particular subgroup of knee OA patients. Third, hypothesis-testing studies test a priori hypotheses about subgrouping effects in samples of people independent from, but similar to, those people involved in the hypothesis-generating phase. Studies in this phase typically follow a more stringent approach than those at the hypothesis-generating stage. For example, the biochemical marker is now evaluated in a larger, well-characterized patient sample that might include healthy controls. Fourth, relatively narrow validation studies attempt to replicate findings of hypothesis-testing studies in independent samples of people who are similar to those originally studied. Fifth, broader validation studies try to replicate the findings of hypothesis-generating studies in independent samples from broader populations than those originally tested. For example, the biochemical markers for phenotyping OA patients in specialist outpatient clinics would be tested in a primary care setting. Sixth and last, impact-analysis studies examine the capacity of a specific phenotyping method in routine care settings to change clinical decision-making, improve patient outcomes, and/or increase health system efficiency.

The current project aimed to provide a widely supported framework for designing and conducting research along the pathway from basic research to successful clinical application of OA phenotyping, through consensus on a number of definitions and conceptual statements and a set of reporting recommendations. Ultimately, if adopted widely, this should contribute to a more coordinated research effort.


The framework consisted of two main parts. First, a panel of researchers with expertise in OA phenotype research commented on a number of basic definitions and conceptual statements in an online Delphi exercise. Second, the panel provided input on reporting recommendations in a face-to-face meeting.

Panel composition

The panel consisted of 25 members. Panel members were selected to encompass an array of expertise in OA-related topics, career stages, and geographical origins. Each member had demonstrable experience in phenotype research, as evidenced by publications in peer-reviewed scientific journals. The panel was assembled and led by a core group (WEvS, SBZ, LAD, DJH).

Basic definitions and statements

We used the Delphi Decision Aid hosted on the website that was originally developed by J. Scott Armstrong (University of Pennsylvania). It is managed by Kesten C. Green (University of South Australia), and the software is maintained by Saint Louis Integration.

An initial set of statements was developed by the core group. Statements were based on points that became apparent from literature data on OA phenotypes [1, 4] and phenotype studies in other diseases [3, 5]. All panel members were then invited to score every statement on a range from 0 to 100% (0% meaning no agreement and 100% meaning complete agreement) and provide comments to explain and contextualize the scores. They could also suggest additional statements. For each subsequent round, statements were adapted by the core group in response to scores and comments. In this process, some statements were combined and others split up. Rounds were continued until all statements scored ≥ 80% on average.
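The stopping rule described above can be sketched in a few lines. This is a minimal illustration, assuming that each statement's score is the unweighted mean of the individual panel scores; the statement labels, the scores, and the function names are hypothetical and are not taken from the panel's data or from the Delphi Decision Aid software.

```python
from statistics import mean

AGREEMENT_THRESHOLD = 80.0  # mean score (%) that every statement must reach


def mean_scores(round_scores):
    """Return the mean agreement score (0-100%) for each statement."""
    return {statement: mean(scores) for statement, scores in round_scores.items()}


def statements_below_threshold(round_scores, threshold=AGREEMENT_THRESHOLD):
    """Return, in sorted order, the statements whose mean score is below the threshold."""
    means = mean_scores(round_scores)
    return sorted(s for s, m in means.items() if m < threshold)


def consensus_reached(round_scores, threshold=AGREEMENT_THRESHOLD):
    """True when no statement falls below the threshold, i.e. rounds can stop."""
    return not statements_below_threshold(round_scores, threshold)


# Hypothetical scores from four panel members for two statements.
example_round = {
    "statement_A": [90, 85, 100, 75],
    "statement_B": [60, 80, 70, 90],
}

if __name__ == "__main__":
    print(mean_scores(example_round))
    print("Needs revision:", statements_below_threshold(example_round))
    print("Consensus reached:", consensus_reached(example_round))
```

In this hypothetical round, one statement would be returned to the core group for adaptation, and a further round would follow.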

Panel members were provided with a document showing anonymized scores and comments from previous Delphi rounds (Additional file 1). The document also explained how and why statements were adapted between rounds. For any particular round, panel members were not aware of scores and comments from other panel members during the time the round was open.

Reporting recommendations

Based on similar initiatives for research publications in general, or in other fields, a set of reporting recommendations was compiled by the core group [6,7,8]. These recommendations were discussed in a meeting with panel members on 29 April 2018 and adapted using the synthesis of these discussions. Members who could not attend the meeting were given the opportunity to provide input via email.


Basic definitions and statements

Four Delphi rounds were required to achieve sufficient agreement on 11 statements (see Tables 1 and 2). OA phenotypes were defined as subtypes of OA that share distinct underlying pathobiological and pain mechanisms and their structural and functional consequences. This and some of the other concepts in the statements are also summarized in Fig. 1.

Table 1 Overview of the Delphi rounds
Table 2 Final statements on OA phenotypes
Fig. 1 Schematic overview of the general concept behind a number of the statements from the Delphi exercise

Every OA phenotype encompasses a number of typical pain and/or pathobiological mechanisms. People with OA can be assessed for the presence of one or more parameters that reflect these mechanisms. Every person can have characteristics of one or more phenotypes. Every OA phenotype relates to characteristic clinical and structural outcomes and, with that, to the effectiveness of particular interventions.

Reporting recommendations

Twelve panel members attended the face-to-face meeting. All panel members were given the opportunity to provide input via email. Reporting recommendations that followed from the panel meeting are summarized in Table 3 and discussed below. These recommendations are anticipated to be useful for authors, reviewers, and editors in the process of writing, reviewing, and publishing reports. Importantly, these criteria are not to be used as quality markers because, at this point, there are insufficient data to support any such decisions. Likewise, it was decided that there were not enough data to support the weighting of individual items. Finally, it is important to emphasize that these recommendations are in no way intended to restrict researchers in their approach to identifying OA phenotypes or to determining overlaps between them.

Table 3 Reporting recommendations

General study characteristics

Currently, most datasets used for OA phenotype research come from other studies or trials and are used secondarily for phenotyping. Knowledge about the setting and characteristics of the original study is important for proper interpretation of the outcomes of the subsequent phenotype analysis. The original study goals and design will determine the contents of the dataset and the potential outcomes of the phenotype analyses. For example, opportunities for phenotype analysis in a dataset that originates from a clinical trial will differ from those in a dataset from an observational study. In keeping with this concept, some researchers may investigate phenotypes that differ in response to treatment, while others may explore phenotypes that differ in natural disease course.

Study population

The characteristics of the study population are important to take into account when interpreting the validity of the results of the OA phenotype analysis and for comparing results between studies. For example, the potential to identify particular phenotypes might be different between populations with and without OA pain or between patients from general practice and orthopedic clinics. Likewise, non-random subject selection or dropout in observational or interventional studies can affect the study results. For example, a phenotype that is non-responsive to a particular treatment in a clinical trial might show higher dropout rates and thereby provide less follow-up data for subsequent phenotype analysis.

Data collection

It is likely that every phenotype will be characterized by one or more parameters or prognostic factors relating to one or more pathobiological or pain mechanisms that are critical for that particular phenotype. Usually, there are multiple ways to assess and/or monitor OA, and these may differ importantly in their ability to reflect the pathogenetic and/or pain mechanisms of interest and in their technical characteristics (e.g., accuracy, precision, reproducibility). Therefore, it is considered important that the strengths and weaknesses of the available set of parameters for the phenotype analysis are understood and discussed. For example, subchondral bone structure can be assessed by different imaging techniques, and these may or may not be supplemented with biochemical markers of bone metabolism. Biochemical markers might, however, be less specific for the joint tissue of interest or be more subject to noise. The available follow-up time and number of time points may also affect the ability of parameters to differentiate between potential phenotypes. For example, biochemical markers might be more dynamic than imaging markers and require shorter follow-up times to show differences in disease course or treatment response between phenotypes.

Statistical analysis

The panel members considered data-driven approaches valuable for gaining insights into OA phenotypes that extend beyond current knowledge. However, data-driven approaches are often sensitive to selection bias and to changes in particular features of the analysis (e.g., clustering method, number of subgroups, methods to handle missing data). These features may be fine-tuned in an iterative process in which outcomes are compared back and forth between settings. This process may be more or less subjective and/or be performed according to prespecified criteria. Irrespective of the approach, the clinical relevance of the potentially identified phenotypes should be accounted for in the analysis plan. For example, differences in pain course between phenotypes should be clinically meaningful. Sensitivity analyses (e.g., repeated analysis with different cutoffs) and methods to describe consistency and reproducibility (e.g., internal or external validity) are considered particularly important for data-driven techniques. Access to the dataset(s) and syntax(es) for other investigators to repeat and/or extend analyses is encouraged.
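A sensitivity analysis of the kind mentioned above (repeated analysis with different cutoffs) could be sketched as follows. This is an illustration under stated assumptions rather than a method endorsed by the panel: subgroups are formed by a single cutoff on one hypothetical biomarker, and agreement between the resulting subgroupings is summarized with the Rand index, which is only one of several possible consistency measures. The marker values, cutoffs, and function names are all hypothetical.

```python
from itertools import combinations


def assign_subgroups(values, cutoff):
    """Label each participant 'high' or 'low' relative to a cutoff."""
    return ["high" if v >= cutoff else "low" for v in values]


def rand_index(labels_a, labels_b):
    """Fraction of participant pairs on which two subgroupings agree.

    A pair counts as agreement when both subgroupings place the two
    participants together, or both place them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("Both subgroupings must cover the same participants")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("At least two participants are required")
    agreeing = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreeing / len(pairs)


def cutoff_sensitivity(values, cutoffs):
    """Rand index between the subgroupings produced by every pair of cutoffs."""
    partitions = {c: assign_subgroups(values, c) for c in cutoffs}
    return {
        (a, b): rand_index(partitions[a], partitions[b])
        for a, b in combinations(cutoffs, 2)
    }


# Hypothetical biomarker levels for six participants.
marker_levels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
agreement = cutoff_sensitivity(marker_levels, [3.0, 3.5, 4.5])

if __name__ == "__main__":
    for (cutoff_a, cutoff_b), value in agreement.items():
        print(f"cutoffs {cutoff_a} vs {cutoff_b}: Rand index = {value:.2f}")
```

A Rand index close to 1 would indicate that subgroup membership is largely insensitive to the choice of cutoff; lower values would suggest that the identified subgroups depend strongly on that analytical choice and should be reported with corresponding caution.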


For an identified OA subgroup to be considered a distinct phenotype, the extent to which its main underlying pathobiological or pain mechanism(s) can be assumed to differ from others should be made clear. Explaining similarities and differences in relation to the existing literature may highlight consistency of findings across studies and/or point out how observable differences might have occurred. It is also advised to discuss how the identified phenotype(s) might impact future research and practice (e.g., external validity, potential therapeutic consequences).


This OA phenotype framework is intended to facilitate research on OA phenotypes and increase combined efforts to attain effective OA phenotype classification by providing a number of coherent definitions and statements and a set of reporting recommendations that were supported by a panel of researchers with relevant expertise. The framework focuses on distinct pathobiological and/or pain mechanisms. This focus is in line with the ultimate goal of developing phenotype-specific interventions targeted at these distinct mechanisms. A number of studies argue in favor of the actual existence of subgroups with distinct pathobiological and/or pain mechanisms [9,10,11,12]. Further success in this endeavor depends on the adoption of the currently proposed framework in the field. This will hopefully translate into more effective, differentiated OA management that will benefit a multitude of OA patients. Although we aimed to codify a representative set of shared opinions of individuals working in the field of OA phenotypes, we realize that insights will no doubt evolve over time and that updating the framework will likely be required in the future as the field matures and more data become available. The ultimate success of such an initiative will require consistent and wide implementation.


A wide range of views exist on how best to operationalize the concept of OA phenotypes. The current initiative provides consensus-based definitions, statements, and reporting recommendations to the OA phenotype research field, supported by an international panel of researchers with relevant expertise. Implementation of these is considered important to standardize and synergize the wide range of research activities that are currently being deployed in this multidisciplinary field. Success in this endeavor will hopefully translate into the consistent identification of distinct phenotypes and more effective, differentiated OA management.

Availability of data and materials

Requests for data and materials relating to this publication can be submitted to the corresponding author.





  1. Deveza LA, Melo L, Yamato TP, Mills K, Ravi V, Hunter DJ. Knee osteoarthritis phenotypes and their relevance for outcomes: a systematic review. Osteoarthr Cartil. 2017;25(12):1926–41.

  2. Deveza LA, Loeser RF. Is osteoarthritis one disease or a collection of many? Rheumatology. 2018;57(suppl_4):iv34–42.

  3. Kent P, Keating JL, Leboeuf-Yde C. Research methods for subgrouping low back pain. BMC Med Res Methodol. 2010;10:62.

  4. Dell’Isola A, Allan R, Smith SL, Marreiros SS, Steultjens M. Identification of clinical phenotypes in knee osteoarthritis: a systematic review of the literature. BMC Musculoskelet Disord. 2016;17(1):425.

  5. Wenzel SE. Asthma phenotypes: the evolution from clinical to molecular approaches. Nat Med. 2012;18(5):716–25.

  6. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol. 2008;61(4):344–9.

  7. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.

  8. Moher D, Hopewell S, Schulz KF, Montori V, Gotzsche PC, Devereaux PJ, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869.

  9. Dell’Isola A, Steultjens M. Classification of patients with knee osteoarthritis in clinical phenotypes: data from the osteoarthritis initiative. PLoS One. 2018;13(1):e0191045.

  10. Guerard O, Dufort S, Forget Besnard L, Gougeon A, Carlesso L. Comparing the association of widespread pain, multi-joint pain and low back pain with measures of pain sensitization and function in people with knee osteoarthritis. Clin Rheumatol. 2020;39(3):873–9.

  11. Knoop J, van der Leeden M, Thorstensson CA, Roorda LD, Lems WF, Knol DL, et al. Identification of phenotypes with different clinical outcomes in knee osteoarthritis: data from the Osteoarthritis Initiative. Arthritis Care Res (Hoboken). 2011;63(11):1535–42.

  12. Zhang W, Likhodii S, Zhang Y, Aref-Eshghi E, Harper PE, Randell E, et al. Classification of osteoarthritis phenotypes by metabolomics analysis. BMJ Open. 2014;4(11):e006286.




M.J. Thomas is currently supported by an Integrated Clinical Academic Programme Clinical Lectureship from the National Institute for Health Research (NIHR) and Health Education England (HEE) (ICA-CL-2016-02-014). The views expressed in this manuscript are those of the author(s) and not necessarily those of the NHS, the NIHR, HEE, or the Department of Health and Social Care.

R. Christensen works at the Parker Institute, Bispebjerg and Frederiksberg Hospital (RC), which is supported by a core grant from the Oak Foundation (OCAY-13-309).

Author information

Authors and Affiliations



WEVS, SMABZ, LAD, and DJH were all involved in the conception and design of the study. All authors actively participated in the execution of the project. WEVS wrote drafts of the manuscript that were circulated for feedback and editing among all authors. All authors gave their final approval of the version of the manuscript to be submitted.

Corresponding author

Correspondence to W. E. van Spil.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

W.E. van Spil: none.

S.M.A. Bierma-Zeinstra: provides consulting advice for Infirst Healthcare.

N.K. Arden: provides consulting advice to Merck, Flexion, Freshfields Bruckhaus, Regeneron and Pfizer/Lilly.

A.-C. Bay-Jensen: employee and shareholder at Nordic Bioscience. Partner and WP lead in the IMI-APPROACH consortium.

V. Byers Kraus: none.

L. Carlesso: none.

R. Christensen: none.

L.A. Deveza: received partial reimbursement of conference registration cost from Pfizer.

M. Van Der Esch: none.

P. Kent: none.

J. Knoop: none.

C. Ladel: employee of Merck KGaA.

C.B. Little: conducts research funded by a number of pharmaceutical companies through contracts negotiated with and finances controlled by the University of Sydney and/or Northern Sydney Local Health District; provides consulting advice to Galapagos Pharmaceuticas and Merck Serono.

R.F. Loeser: provides consulting advice for Unity Biotechnology and is on the Scientific Advisory Board for Reginosine. He has received research funding from Bioventis.

E. Losina: provided consulting advice for Regeneron. Receives research funding from Genentech.

K. Mills: none.

A. Mobasheri: provided consulting advice for Abbvie, Pfizer Consumer Health (PCH), Galapagos and Servier; received funding from the European Commission through the Structural and Social Funds programmes; received investigator initiated grant support from Merck KGaA.

A.E. Nelson: provides consulting advice for Glaxo Smith Kline, receives royalties from Health Press, Ltd., provided presentations for MedScape and QuantiaMD.

T. Neogi: none.

G.M. Peat: none.

A.-C. Rat: none.

M. Steultjens: none.

M.J. Thomas: none.

A.M. Valdes: none.

D.J. Hunter: provides consulting advice for Merck Serono, Tissuegene, and TLCBio.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Appendix 1.

Overview of the statements and panel scores for every Delphi round.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

van Spil, W.E., Bierma-Zeinstra, S.M.A., Deveza, L.A. et al. A consensus-based framework for conducting and reporting osteoarthritis phenotype research. Arthritis Res Ther 22, 54 (2020).
