Accuracy, patient-perceived usability, and acceptance of two symptom checkers (Ada and Rheport) in rheumatology: interim results from a randomized controlled crossover trial
Arthritis Research & Therapy volume 23, Article number: 112 (2021)
Timely diagnosis and treatment are essential for the effective management of inflammatory rheumatic diseases (IRDs). Symptom checkers (SCs) promise to accelerate diagnosis, reduce misdiagnoses, and guide patients more effectively through the health care system. Although SCs are increasingly used, little supporting evidence exists.
To assess the diagnostic accuracy, patient-perceived usability, and acceptance of two SCs: (1) Ada and (2) Rheport.
Patients newly presenting to a German secondary rheumatology outpatient clinic were randomly assigned in a 1:1 ratio to complete either Ada or Rheport first and then the respective other SC, in a prospective, non-blinded, randomized controlled crossover trial. The primary outcome was the accuracy of the SCs for the diagnosis of an IRD compared with the physicians’ diagnosis as the gold standard. The secondary outcomes were patient-perceived usability, acceptance, and time to complete the SC.
In this interim analysis, the first 164 patients who completed the study were analyzed. Of these, 32.9% (54/164) were diagnosed with an IRD. Rheport showed a sensitivity of 53.7% and a specificity of 51.8% for IRDs. Ada’s top disease suggestion (D1) and top five disease suggestions (D5) showed a sensitivity of 42.6% and 53.7% and a specificity of 63.6% and 54.5% for IRDs, respectively. The correct diagnosis of the IRD patients was among Ada’s D1 and D5 suggestions in 16.7% (9/54) and 25.9% (14/54) of cases, respectively. The median System Usability Scale (SUS) scores of Ada and Rheport were 75.0/100 and 77.5/100, respectively. The median completion times for Ada and Rheport were 7.0 and 8.5 min, respectively. Sixty-four percent and 67.1% of patients would recommend Ada and Rheport, respectively, to friends and other patients.
While SCs are well accepted among patients, their diagnostic accuracy is limited to date.
DRKS.de, DRKS00017642. Registered on 23 July 2019
The European League Against Rheumatism (EULAR) recommendations state that patients with arthritis should be seen as early as possible, ideally within 6 weeks of symptom onset, since an early start of treatment significantly improves patient outcomes. Various strategies have been identified [3, 4] to implement these recommendations; however, the diagnostic delay seems to be increasing despite such strategies [5, 6].
Symptom checkers (SCs) could improve this situation. SCs are patient-centered diagnostic decision support systems (DDSS) designed to offer a scalable, objective, cost-effective, and personalized triage strategy. Based on such triage, SCs should help the right patient obtain an appropriate appointment at the right time, thus empowering patients. Patients with rheumatic and musculoskeletal diseases (RMD) are known to be highly motivated to use SCs and other medical apps. SCs are already widely used: the artificial intelligence-driven Ada, for example, has been used to complete more than 15 million health assessments in 130 countries.
To ensure the safety and efficacy of such apps, EULAR recently published points to consider stating that “self-management apps should be up to date, scientifically justifiable, user-acceptable, and evidence-based where applicable,” and that validation should include people with RMDs.
Therefore, the aim of this study was to generate real-world evidence by evaluating the diagnostic accuracy, usability, acceptance, and completion time of two free, publicly available SCs, Ada (www.ada.com) and Rheport (www.rheport.de).
We present interim results of a randomized controlled crossover multicenter study conducted at three centers in Germany. The study was approved by the ethics committee of the Medical Faculty of the University of Erlangen-Nürnberg, Germany (106_19 Bc), reported to the German Clinical Trials Register (DRKS) (DRKS00017642), and conducted in compliance with the Declaration of Helsinki. All patients provided written informed consent before participating. Patients were randomized 1:1 to group 1 (Ada first, then Rheport) or group 2 (Rheport first, then Ada) by computer-generated block randomization with a block size of 100 patients. SCs were completed before the regular appointment. Assisting personnel were present to help with SC completion if necessary.
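The allocation procedure can be sketched as follows. Each block of 100 contains 50 assignments to each group, shuffled into a random order. This is a minimal illustration only. The function name, seed and group labels are hypothetical, and the text does not specify the randomization software used in the trial.

```python
import random

def block_randomization(n_patients, block_size=100, seed=None):
    """Hypothetical 1:1 block randomization.

    'group 1' = Ada first, then Rheport; 'group 2' = Rheport first, then Ada.
    Every block holds block_size // 2 assignments per group in shuffled order,
    so the arms are exactly balanced at the end of each completed block.
    """
    if block_size % 2:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)  # seeded generator gives a reproducible schedule
    allocation = []
    while len(allocation) < n_patients:
        block = ["group 1"] * (block_size // 2) + ["group 2"] * (block_size // 2)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_patients]  # the last block may be only partly used

schedule = block_randomization(164, block_size=100, seed=2019)
print(schedule.count("group 1"), schedule.count("group 2"))
```

Because 164 is not a multiple of 100, only the first block is guaranteed to be exactly balanced in this sketch. The partly used second block may leave the arms slightly unequal at an interim analysis.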
Adult patients with musculoskeletal symptoms and an unknown diagnosis who newly presented to the first of the three recruiting rheumatology outpatient clinics (University Hospital Erlangen, Germany) were included in this cross-sectional study. Patients with a known diagnosis and patients unwilling or unable to comply with the protocol were excluded. In addition to the app-related data outlined below, we recorded demographic variables, symptom duration, swollen and tender joint counts, DAS28 score, ESR, CRP, anti-CCP antibody and rheumatoid factor status, and the clinical diagnosis based on established classification criteria. This interim analysis is based on patient data recorded between September 2019 and February 2020.
Description of the symptom checkers
Ada is a Conformité Européenne (CE)-certified medical app that is freely available in multiple languages and has been used to complete more than 15 million health assessments in 130 countries. The artificial intelligence-driven chatbot first asks for basic health information (e.g., sex, smoking status) and then for the current leading symptoms. The questions (Fig. 1) are chosen dynamically, and their total number depends on the previous answers. Ada then provides a top disease suggestion (D1) and up to five disease suggestions in total (D5), together with their probabilities and urgency advice. The app is based on continuously updated research findings and is not limited to RMDs.
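Ada's ranked output can be scored against the reference diagnosis at the D1 and D5 levels, as sketched below. All function names and the example suggestion list are hypothetical. One rule is also an assumption, since the text does not describe how suggestions were mapped to an IRD call: a patient is counted as "IRD-positive" when any of the top-k suggestions is an IRD.

```python
def exact_hit(suggestions, reference, k):
    """True if the reference diagnosis is among the first k ranked suggestions."""
    return reference in suggestions[:k]

def ird_positive(suggestions, ird_conditions, k):
    """Hypothetical rule: call an IRD if any of the top-k suggestions is an IRD."""
    return any(s in ird_conditions for s in suggestions[:k])

# Hypothetical output, ranked from most to least likely (not study data)
suggestions = ["osteoarthritis", "psoriatic arthritis", "fibromyalgia",
               "tendinitis", "bursitis"]
irds = {"rheumatoid arthritis", "psoriatic arthritis", "axial spondyloarthritis",
        "polymyalgia rheumatica", "systemic sclerosis"}

print(exact_hit(suggestions, "psoriatic arthritis", k=1))  # D1 level: False
print(exact_hit(suggestions, "psoriatic arthritis", k=5))  # D5 level: True
print(ird_positive(suggestions, irds, k=1), ird_positive(suggestions, irds, k=5))
```

Under this rule, widening k from 1 to 5 can only add IRD calls. Sensitivity can therefore only rise and specificity can only fall, which is the direction of the D1 and D5 figures reported in the Results.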
Rheport is a rheumatology-specific online platform that uses a fixed patient questionnaire (Fig. 1), developed by rheumatologists, covering basic health information and rheumatology-specific questions. A background algorithm calculates the probability of an IRD from a weighted sum score of the questionnaire answers; a sum score ≥ 1.0 was determined to be the threshold for an IRD. The system is already used in clinical routine to triage appointments of new patients according to their IRD probability, and about 3000 appointments have been organized with it to date. For this study, an app-based version of the software was used. Both SCs were tested on three iOS-based tablets.
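The weighted sum-score logic can be sketched as follows. The item names and weights are purely hypothetical placeholders, because Rheport's actual questionnaire items and weights are not given here. Only the ≥ 1.0 threshold comes from the text.

```python
# Hypothetical items and weights (NOT Rheport's real scoring table)
WEIGHTS = {
    "morning_stiffness_over_60_min": 0.4,
    "swollen_joints_reported": 0.5,
    "psoriasis_history": 0.3,
    "elevated_crp_known": 0.3,
    "age_under_45_back_pain": 0.2,
}
IRD_THRESHOLD = 1.0  # from the text: a sum score >= 1.0 indicates an IRD

def sum_score(answers):
    """Weighted sum over items answered 'yes' (True); unknown items are ignored."""
    return sum(WEIGHTS[item] for item, yes in answers.items() if yes and item in WEIGHTS)

def triage_ird(answers):
    """Return True if the weighted sum score reaches the IRD threshold."""
    return sum_score(answers) >= IRD_THRESHOLD

patient = {"morning_stiffness_over_60_min": True, "swollen_joints_reported": True,
           "psoriasis_history": True, "elevated_crp_known": False}
print(round(sum_score(patient), 2), triage_ird(patient))
```

Because the weights are floating-point numbers, a score meant to land exactly on 1.0 can be affected by rounding. An implementation might use integer points or a small tolerance instead.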
The primary outcome was the diagnostic accuracy (sensitivity and specificity) of Ada and Rheport for the diagnosis of an IRD. The SC results were recorded and compared with the gold standard, i.e., the final physician’s diagnosis as reported in the discharge summary and adjudicated by the head of the local rheumatology department.
SC completion time and patient-perceived usability were secondary outcomes of this study. Completion time was measured by the supervising local study personnel. Patients rated SC usability with the System Usability Scale (SUS), which consists of 10 statements rated on 5-point Likert scales ranging from “strongly agree” to “strongly disagree” and yields a score from 0 to 100. Finally, patients were asked whether they would recommend each of the two SCs to friends and other patients.
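The standard SUS scoring rule (Brooke, 1996) converts the ten responses to a 0 to 100 score:

- Odd-numbered (positively worded) items contribute the response minus 1.
- Even-numbered (negatively worded) items contribute 5 minus the response.
- The raw total (0 to 40) is multiplied by 2.5.

Below is a minimal sketch, assuming responses are coded 1 = "strongly disagree" to 5 = "strongly agree".

```python
def sus_score(responses):
    """System Usability Scale score (0-100) from 10 Likert responses.

    responses[0] is item 1; each response is coded 1 (strongly disagree)
    to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # odd items are positively worded, even items negatively worded
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5

print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

Since the raw total is an integer from 0 to 40, individual SUS scores are always multiples of 2.5.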
We performed an interim analysis of the first 164 patients who completed the study. The analysis consisted of (i) a descriptive sample characterization stratified by randomization arm, (ii) an assessment of Ada’s and Rheport’s diagnostic accuracy, and (iii) a descriptive evaluation of the secondary outcome measures specified above for the total sample. Descriptive characteristics for each randomization arm are presented as median (Mdn) and interquartile range (IQR) for interval data and as absolute (n) and relative (percent) frequencies for nominal data. Comparability of demographic and IRD-related characteristics between the randomization groups was assessed using Wilcoxon rank-sum tests and χ² tests. Diagnostic accuracy was evaluated in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy. Secondary outcomes were compared using Wilcoxon signed-rank tests and are presented as Mdn (IQR). The significance level for inferential tests was set at p ≤ 0.05. Statistical analyses were performed in R (version 3.6.3) using RStudio (version 1.2.5033).
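Descriptive summaries of the form Mdn (IQR) can be computed outside R as sketched below. This assumes R's default quantile definition (type 7) was used, which the text does not state. Python's `statistics.quantiles` with `method="inclusive"` uses the same linear interpolation. The completion times are hypothetical.

```python
import statistics

def mdn_iqr(values):
    """Return (median, Q1, Q3) with linear interpolation, as in R's default type 7."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3

# Hypothetical completion times in minutes (not study data)
times = [5, 6, 7, 7, 8, 9, 12]
mdn, q1, q3 = mdn_iqr(times)
print(f"{mdn:.1f} min (IQR {q1:.1f}–{q3:.1f})")  # 7.0 min (IQR 6.5–8.5)
```

Other quantile definitions, such as Python's default `"exclusive"` method, can give slightly different IQR bounds for the same data.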
Sample size determination
A minimum sample size of n = 122 patients was calculated based on the following assumptions: (1) a prevalence of 40%, defined as the proportion of subjects diagnosed with an inflammatory rheumatic disease after presenting to the rheumatologist; (2) an average diagnostic accuracy of previous applications of 50%, using the three most likely diagnoses; (3) a desired diagnostic accuracy of Ada or Rheport, in terms of sensitivity and specificity, of 70%; (4) a type 1 error of 4.4% (discrete value according to Bujang and Adnan); and (5) a type 2 error of 19% (discrete value according to Bujang and Adnan), corresponding to a power of 81%.
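The exact-binomial reasoning behind such "discrete" error rates can be sketched as follows. For a candidate number of patients, the critical count of correct SC calls is chosen so that the exact type 1 error stays at or below 5%. The design is accepted once the exact power reaches 80%. The result is then scaled by the prevalence.

Several choices are assumptions drawn from items (1) to (3):

- a one-sided test;
- H0 = 50% and H1 = 70%, applied to both sensitivity and specificity.

The function names are hypothetical. The sketch is not guaranteed to reproduce the reported n = 122, which depends on the exact procedure and tables of Bujang and Adnan.

```python
from math import comb

def upper_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))

def min_cases(p0, p1, alpha=0.05, power=0.80, n_max=1000):
    """Smallest n for a one-sided exact binomial test of H0: p = p0 vs H1: p = p1 > p0.

    For each n, the critical value is the smallest count whose discrete type 1
    error P(X >= c | p0) does not exceed alpha; n is accepted once the power
    P(X >= c | p1) reaches the target. Returns (n, c, actual alpha, actual power).
    """
    for n in range(1, n_max + 1):
        crit = next((k for k in range(n + 1) if upper_tail(n, k, p0) <= alpha), None)
        if crit is None:
            continue  # n too small to reach the nominal alpha at all
        a, b = upper_tail(n, crit, p0), upper_tail(n, crit, p1)
        if b >= power:
            return n, crit, a, b
    raise ValueError("no design found up to n_max")

def total_needed(n_required, proportion_pct):
    """Patients to enrol so that proportion_pct % of them yields n_required (rounded up)."""
    return -(-n_required * 100 // proportion_pct)

n, crit, a, b = min_cases(0.5, 0.7)  # assumed H0 = 50%, H1 = 70%
# IRD patients are 40% of the sample (sensitivity), non-IRD 60% (specificity)
total = max(total_needed(n, 40), total_needed(n, 60))
print(f"needed per stratum: {n}, critical count: {crit}, "
      f"alpha = {a:.3f}, power = {b:.3f}, total N = {total}")
```

Because the binomial distribution is discrete, the achieved alpha and power land on specific values near the nominal 5% and 80%. This is why the text reports them as 4.4% and 81%.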
A total of 211 consecutive patients were approached, 167 agreed to participate, and 164 were included in the interim analysis presented here (Fig. 2). Based on the physicians’ judgment, 32.9% (54/164) of the presenting patients were diagnosed with an IRD. The final diagnoses and demographic characteristics are summarized in Tables 1 and 2, respectively.
Rheport showed a sensitivity of 53.7% (29/54) and a specificity of 51.8% (57/110). Ada’s D1 and D5 suggestions showed a sensitivity of 42.6% (23/54) and 53.7% (29/54) and a specificity of 63.6% (70/110) and 54.5% (60/110) for IRDs, respectively. All measures of diagnostic accuracy appeared similar in the two randomization arms. Further details on the SCs’ diagnostic accuracy are given in Table 3. The correct diagnosis of the IRD patients was among Ada’s D1 and D5 suggestions in 16.7% (9/54) and 25.9% (14/54) of cases, respectively. The 14 correct Ada D5 suggestions comprised the following diagnoses: 5 PsA, 4 SpA, 3 RA, 2 PMR, and 1 connective tissue disease (systemic sclerosis).
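The reported percentages follow directly from 2 × 2 tables of SC result versus reference diagnosis, with 54 IRD and 110 non-IRD patients. The sketch below rebuilds these tables from the counts given above. Its PPV, NPV and overall accuracy are derived arithmetically from those counts rather than copied from Table 3. The function name is hypothetical.

```python
from fractions import Fraction

def accuracy_metrics(tp, fn, tn, fp):
    """Diagnostic accuracy measures from a 2x2 table against the reference diagnosis."""
    return {
        "sensitivity": Fraction(tp, tp + fn),
        "specificity": Fraction(tn, tn + fp),
        "PPV": Fraction(tp, tp + fp),
        "NPV": Fraction(tn, tn + fn),
        "accuracy": Fraction(tp + tn, tp + fn + tn + fp),
    }

# Counts from the Results: 54 IRD and 110 non-IRD patients
tables = {
    "Rheport": dict(tp=29, fn=54 - 29, tn=57, fp=110 - 57),
    "Ada D1": dict(tp=23, fn=54 - 23, tn=70, fp=110 - 70),
    "Ada D5": dict(tp=29, fn=54 - 29, tn=60, fp=110 - 60),
}
for name, counts in tables.items():
    metrics = accuracy_metrics(**counts)
    print(name, {k: f"{float(v):.1%}" for k, v in metrics.items()})
```

For Rheport, for example, these counts give a PPV of 29/82 (35.4%) and an NPV of 57/82 (69.5%).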
The median completion times for Ada and Rheport were 7.0 min (IQR 5.8–9.0) and 8.5 min (IQR 8.0–10.0), respectively. On a scale of 0 (worst) to 100 (best), the median SUS scores of Ada and Rheport were 75.0 (IQR 62.5–85.0) and 77.5 (IQR 62.5–87.5), respectively. Completion time and usability (SUS scores) did not differ significantly between the two groups. Sixty-four percent and 67.1% of patients would recommend Ada and Rheport, respectively, to friends and other patients.
This prospective real-world study highlights the currently limited diagnostic accuracy of SCs such as Ada and Rheport with respect to IRDs: their overall sensitivity and specificity for IRDs are moderate. SCs offer patients on-demand medical support independent of time and place. An automated SC-based triage, as offered by Rheport, may allow objective, scalable, and transparent decisions. By automating triage decisions, SCs could additionally save costs [12, 14] and shorten the time to correct diagnosis; however, they may also lead to overdiagnosis and overtreatment.
Despite increasing patient use, evidence supporting SC effectiveness is limited to date [12, 17]. The results of this study are in line with previous SC analyses [12, 17, 18]. Research supported by Ada Health GmbH showed that Ada had the highest top-3 diagnostic accuracy (70.5%) among the SCs compared. In an Australian vignette-based assessment, the correct condition was among Ada’s first three suggestions in 83% of cases. Consistent with our results, the majority of patients would recommend Ada (85.3%) to friends or relatives.
The first rheumatology-specific SC study, including 34 patients, showed that only 4 of 21 patients with inflammatory arthritis were given RA or PsA as the first diagnosis. Proft et al. recently showed that a physician-based referral strategy was more effective than an online self-referral tool for the early recognition of axial spondyloarthritis. Nevertheless, these authors recommend using online self-referral tools in addition to traditional referral strategies. Their reason is that the proportion of axial spondyloarthritis among self-referred patients (19.4%) was clearly higher than the assumed 5% prevalence in patients with chronic back pain. Given that only 32.9% of the referred patients in our study were diagnosed with an IRD, complementary SC integration might indeed become part of modern rheumatology.
Rheumatologists achieve high diagnostic accuracy by comprehensively using information from the patient’s history and symptoms as well as laboratory and imaging data. The present comparison of the physicians’ final diagnosis with the SC-suggested diagnosis should therefore be interpreted with caution, as the SC diagnosis is based on substantially less data. Furthermore, patients could discuss SC results with their rheumatologists, possibly influencing the rheumatologist’s diagnosis. The sequential use of both SCs represents a possible bias, as patients might have been influenced by the first SC; however, we did not observe any significant differences related to SC order. The slightly better performance of Ada should also be interpreted with caution. In contrast to Rheport, Ada is supported by artificial intelligence and does not use a fixed questionnaire. Ada covers a wide variety of conditions and is not limited to IRDs, whereas Rheport is intended exclusively for the triage of new patients with suspected IRDs. The study setting was deliberately chosen to be risk-averse, so the use of the SCs had no clinical implications. However, symptom checkers are designed to be used in community settings. There, the probability that a patient has an IRD is much lower than in a rheumatology clinic, and no help with SC completion is available. Furthermore, the exact SC diagnosis might be less important than the SC advice on when to see a doctor, especially in emergencies. Our study setting resulted in a much higher pretest probability of an IRD, as patients had already been “screened” by referring physicians. The high proportion of PsA and AxSpA patients is likely attributable to strong local cooperation with the orthopedic and dermatology departments. Additional data from the other two centers will hopefully help to balance these results. We did not record how often help from assisting personnel was needed for SC completion.
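Bayes' theorem makes concrete why the lower pretest probability in community settings matters. With sensitivity and specificity held fixed, the PPV falls as prevalence falls. The sketch uses Rheport's reported sensitivity (29/54) and specificity (57/110), and the clinic prevalence of 54/164. The 10% and 2% prevalences are hypothetical community values. The sketch also assumes that sensitivity and specificity transfer unchanged to other settings, which may not hold in practice (spectrum effects).

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    """Negative predictive value via Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

sens, spec = 29 / 54, 57 / 110  # Rheport, from the Results
# 54/164 = prevalence in this clinic; 10% and 2% are hypothetical community settings
for prev in (54 / 164, 0.10, 0.02):
    print(f"prevalence {prev:.1%}: PPV {ppv(sens, spec, prev):.1%}, "
          f"NPV {npv(sens, spec, prev):.1%}")
```

The positive likelihood ratio here is only about 1.1 (sensitivity 0.537 divided by 1 − specificity, 0.482). A positive result therefore barely raises the probability of an IRD above the pretest value, whatever the setting.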
To the best of our knowledge, this is the first prospective, real-world, multicenter study evaluating two currently used SCs in rheumatology. Our results may help to guide and inform patients, treating health care professionals (HCPs), and other stakeholders in health care. In conclusion, while SCs are well accepted by patients, their diagnostic accuracy is limited. Continuous improvement of the underlying algorithms might help SCs realize their potential to improve patient care.
Availability of data and materials
Data are available from the corresponding author on reasonable request.
Abbreviations
DDSS: Diagnostic decision support system
EULAR: European League Against Rheumatism
IRD: Inflammatory rheumatic disease
RMD: Rheumatic and musculoskeletal diseases
SUS: System Usability Scale
Combe B, Landewe R, Daien CI, Hua C, Aletaha D, Álvaro-Gracia JM, Bakkers M, Brodin N, Burmester GR, Codreanu C, Conway R, Dougados M, Emery P, Ferraccioli G, Fonseca J, Raza K, Silva-Fernández L, Smolen JS, Skingle D, Szekanecz Z, Kvien TK, van der Helm-van Mil A, van Vollenhoven R. 2016 update of the EULAR recommendations for the management of early arthritis. Ann Rheum Dis. 2017;76(6):948–59. https://doi.org/10.1136/annrheumdis-2016-210602.
Quinn MA, Emery P. Window of opportunity in early rheumatoid arthritis: possibility of altering the disease process with early intervention. Clin Exp Rheumatol. 2003;21:154–7.
Villeneuve E, Nam JL, Bell MJ, Deighton CM, Felson DT, Hazes JM, McInnes IB, Silman AJ, Solomon DH, Thompson AE, White PHP, et al. A systematic literature review of strategies promoting early referral and reducing delays in the diagnosis and management of inflammatory arthritis. Ann Rheum Dis. 2012;72:13–22.
Benesova K, Lorenz HM, Lion V, Voigt A, Krause A, Sander O, Schneider M, Feuchtenberger M, Nigg A, Leipe J, Briem S, Tiessen E, Haas F, Rihl M, Meyer-Olson D, Baraliakos X, Braun J, Schwarting A, Dreher M, Witte T, Assmann G, Hoeper K, Schmidt RE, Bartz-Bazzanella P, Gaubitz M, Specker C. Früh- und Screeningsprechstunden: Ein notwendiger Weg zur besseren Frühversorgung in der internistischen Rheumatologie? Z Rheumatol. 2019;78(8):722–42. https://doi.org/10.1007/s00393-019-0683-y.
Raza K, Stack R, Kumar K, Filer A, Detert J, Bastian H, Burmester GR, Sidiropoulos P, Kteniadaki E, Repa A, Saxne T, Turesson C, Mann H, Vencovsky J, Catrina A, Chatzidionysiou A, Hensvold A, Rantapää-Dahlqvist S, Binder A, Machold K, Kwiakowska B, Ciurea A, Tamborrini G, Kyburz D, Buckley CD. Delays in assessment of patients with rheumatoid arthritis: variations across Europe. Ann Rheum Dis. 2011;70(10):1822–5. https://doi.org/10.1136/ard.2011.151902.
Stack RJ, Nightingale P, Jinks C, Shaw K, Herron-Marx S, Horne R, Deighton C, Kiely P, Mallen C, Raza K. Delays between the onset of symptoms and first rheumatology consultation in patients with rheumatoid arthritis in the UK: an observational study. BMJ Open. 2019;9(3):e024361. https://doi.org/10.1136/bmjopen-2018-024361.
Knitza J, Simon D, Lambrecht A, Raab C, Tascilar K, Hagen M, Kleyer A, Bayat S, Derungs A, Amft O, et al. Mobile health in rheumatology: a patient survey study exploring usage, preferences, barriers and eHealth literacy. JMIR mHealth uHealth. 2020;8(8):e19661.
Ada Health built an AI-driven startup by moving slowly and not breaking things [https://techcrunch.com/2020/03/05/move-slow-and-dont-break-things-how-to-build-an-ai-driven-startup/].
Najm A, Nikiphorou E, Kostine M, Richez C, Pauling JD, Finckh A, Ritschl V, Prior Y, Balážová P, Stones S, Szekanecz Z, Iagnocco A, Ramiro S, Sivera F, Dougados M, Carmona L, Burmester G, Wiek D, Gossec L, Berenbaum F. EULAR points to consider for the development, evaluation and implementation of mobile health applications aiding self-management in people living with rheumatic and musculoskeletal diseases. RMD Open. 2019;5(2):e001014. https://doi.org/10.1136/rmdopen-2019-001014.
Brooke J. SUS - a quick and dirty usability scale. In: Jordan PW, Thomas B, Weerdmeester BA, McClelland AL, editors. Usability evaluation in industry, vol. 194. London: Taylor and Francis; 1996. p. 189–194.
Feuchtenberger M, Nigg AP, Kraus MR, Schafer A. Rate of proven rheumatic diseases in a large collective of referrals to an outpatient rheumatology clinic under routine conditions. Clin Med Insights Arthritis Musculoskelet Disord. 2016;9:181–7.
Semigran HL, Linder JA, Gidengil C, Mehrotra A. Evaluation of symptom checkers for self diagnosis and triage: audit study. BMJ (Clinical research ed). 2015;351:h3480.
Bujang MA, Adnan TH. Requirements for minimum sample size for sensitivity and specificity analysis. J Clin Diagn Res. 2016;10(10):YE01–6. https://doi.org/10.7860/JCDR/2016/18129.8744.
Sutton RT, Pincock D, Baumgart DC, Sadowski DC, Fedorak RN, Kroeker KI. An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ Digit Med. 2020;3(1):17.
Ronicke S, Hirsch MC, Turk E, Larionov K, Tientcheu D, Wagner AD. Can a decision support system accelerate rare disease diagnosis? Evaluating the potential impact of Ada DX in a retrospective study. Orphanet J Rare Dis. 2019;14(1):69. https://doi.org/10.1186/s13023-019-1040-6.
Landewé RBM. Overdiagnosis and overtreatment in rheumatology: a little caution is in order. Ann Rheum Dis. 2018;77(10):1394–6. https://doi.org/10.1136/annrheumdis-2018-213700.
Chambers D, Cantrell AJ, Johnson M, Preston L, Baxter SK, Booth A, Turner J. Digital and online symptom checkers and health assessment/triage services for urgent health problems: systematic review. BMJ Open. 2019;9(8):e027743. https://doi.org/10.1136/bmjopen-2018-027743.
Powley L, McIlroy G, Simons G, Raza K. Are online symptoms checkers useful for patients with inflammatory arthritis? BMC Musculoskelet Disord. 2016;17(1):362. https://doi.org/10.1186/s12891-016-1189-2.
Gilbert S, Mehl A, Baluch A, Cawley C, Challiner J, Fraser H, Millen E, Montazeri M, Multmeier J, Pick F, Richter C, Türk E, Upadhyay S, Virani V, Vona N, Wicks P, Novorol C. How accurate are digital symptom assessment apps for suggesting conditions and urgency advice? A clinical vignettes comparison to GPs. BMJ Open. 2020;10(12):e040269. https://doi.org/10.1136/bmjopen-2020-040269.
Gilbert S, Upadhyay S, Novorol C, Wicks P. The quality of condition suggestions and urgency advice provided by the Ada symptom assessment app assessed with independently generated vignettes optimized for Australia. medRxiv. 2020:2020.06.16.20132845.
Miller S, Gilbert S, Virani V, Wicks P. Patients’ utilization and perception of an artificial intelligence–based symptom assessment and advice technology in a British primary care waiting room: exploratory pilot study. JMIR Hum Factors. 2020;7(3):e19713. https://doi.org/10.2196/19713.
Proft F, Spiller L, Redeker I, Protopopov M, Rodriguez VR, Muche B, Rademacher J, Weber A-K, Lüders S, Torgutalp M, Sieper J, Poddubnyy D. Comparison of an online self-referral tool with a physician-based referral strategy for early recognition of patients with a high probability of axial SpA. Semin Arthritis Rheum. 2020;50(5):1015–21. https://doi.org/10.1016/j.semarthrit.2020.07.018.
Ehrenstein B, Pongratz G, Fleck M, Hartung W. The ability of rheumatologists blinded to prior workup to diagnose rheumatoid arthritis only by clinical assessment: a cross-sectional study. Rheumatology (Oxford). 2018;57:1592–601.
The present work was performed to fulfill the requirements for obtaining the degree “Dr. med.” for J. Mohn and is part of the PhD thesis of the first author JK (AGEIS, Université Grenoble Alpes, Grenoble, France). We thank Franziska Fuchs for her help recruiting the patients.
This study was supported by Novartis Pharma GmbH, Nürnberg, Germany (grant number: 33419272). Open Access funding enabled and organized by Projekt DEAL.
Ethics approval and consent to participate
The study was approved by the ethics committee of the Medical Faculty of the University of Erlangen-Nürnberg, Germany (106_19 Bc), and reported to the German Clinical Trials Register (DRKS) (DRKS00017642).
Consent for publication
JK has received research support from Novartis Pharma GmbH. Qinum and RheumaDatenRhePort developed and hold rights for Rheport. WV, CD, SK, PB, and MW are members of RheumaDatenRhePort. AF, WV, CD, and PB were involved in the development of Rheport. JK is a member of the scientific board of RheumaDatenRhePort.
Cite this article
Knitza, J., Mohn, J., Bergmann, C. et al. Accuracy, patient-perceived usability, and acceptance of two symptom checkers (Ada and Rheport) in rheumatology: interim results from a randomized controlled crossover trial. Arthritis Res Ther 23, 112 (2021). https://doi.org/10.1186/s13075-021-02498-8
- Symptom checker