Prediction of knee osteoarthritis progression using radiological descriptors obtained from bone texture analysis and Siamese neural networks: data from OAI and MOST cohorts

Background Trabecular bone texture (TBT) analysis has been identified as an imaging biomarker that provides information on trabecular bone changes due to knee osteoarthritis (KOA). In parallel with the improvement in medical imaging technologies, machine learning methods have received growing interest in the scientific osteoarthritis community to potentially provide clinicians with prognostic data from conventional knee X-ray datasets, in particular from the Osteoarthritis Initiative (OAI) and the Multicenter Osteoarthritis Study (MOST) cohorts. Patients and methods This study included 1888 patients from OAI and 683 patients from MOST cohorts. Radiographs were automatically segmented to determine 16 regions of interest. Patients with an early stage of OA risk, with Kellgren and Lawrence (KL) grade of 1 < KL < 4, were selected. The definition of OA progression was an increase in the OARSI medial joint space narrowing (mJSN) grades over 48 months in OAI and 60 months in MOST. The performance of the TBT-CNN model was evaluated and compared to well-known prediction models using logistic regression. Results The TBT-CNN model was predictive of the JSN progression with an area under the curve (AUC) up to 0.75 in OAI and 0.81 in MOST. The predictive ability of the TBT-CNN model was invariant with respect to the acquisition modality or image quality. The prediction models performed significantly better with estimated KL (KLprob) grades than those provided by radiologists. TBT-based models significantly outperformed KLprob-based models in MOST and provided similar performances in OAI. In addition, the combined model, when trained in one cohort, was able to predict OA progression in the other cohort. Conclusion The proposed combined model provides a good performance in the prediction of mJSN over 4 to 6 years in patients with relevant KOA. Furthermore, the current study presents an important contribution in showing that TBT-based OA prediction models can work with different databases.

phenotypes [27] and the wide variability in the trajectory of disease progression [12], it is of the utmost importance to identify KOA patients who have a greater potential of progressing more rapidly.
Therefore, it is relevant to develop imaging biomarkers that can help the emergence of new therapeutic treatments and particularly new disease-modifying drugs. Due to the role of the subchondral bone and its remodeling status in KOA progression, texture analysis and tibial subchondral bone mineral density assessments are recognized and established methods to characterize structural alterations associated with KOA [18]. Recently, using the OAI database, the predictive ability of baseline trabecular bone texture to distinguish patients with or without radiographic progression was slightly improved compared to that of conventional clinical risk factors such as age, gender, body mass index (BMI), and joint space width (JSW) [10,15]. Previously published studies have shown only moderate performance for predicting KOA progression when using pain, race, and previous knee injury [8,17] as predictor factors. However, since data for pain, race, and previous knee injury were available in both OAI and MOST cohorts, we evaluated the performance of our proposed models with these three additional clinical predictors.
In parallel with the improvement in medical imaging technologies, several machine learning techniques have been proposed for the diagnosis and prediction of KOA [14,26]. Automatic KOA diagnosis is becoming increasingly popular [4,23,26] as it has a high potential to complement the OA diagnostic chain and make radiographic KOA grading more objective.
The aims of this study were twofold: (i) to evaluate the predictive ability of a combined approach using both trabecular bone texture (TBT) descriptors, calculated by a variogram-based method [9,10], and radiological gravity scores, calculated by deep learning-based Siamese CNN tools [26], to predict KOA progression; (ii) to study the use of the same KOA progression prediction model validated on independent OA cohort datasets (OAI and MOST), by training the model on one dataset and testing it on the other, and vice versa. The TRIPOD checklist (Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) was used as a framework of quality assurance of the present manuscript [22].

Patients
In this study, the data used in the preparation were obtained from the OAI and the MOST databases. Details about the acquisition and grading protocols in the OAI and the MOST studies are available online at https:// nda. nih. gov/ oai and http:// most. ucsf. edu, respectively. The primary selected dataset included only the knee images of patients with available KL grades [13] and the Osteoarthritis Research Society International (OARSI) grades as well as the clinical covariates: age, gender, BMI, Western Ontario and McMaster universities osteoarthritis index (WOMAC) pain, race, and history of knee injury. From the selected dataset, the knees with preexisting OA with 2 ≤ KL < 4 [5,10] at baseline were considered in the present study, in accordance with the European Medicines Agency [3] which recommended to include patients with KL radiographic entry criteria of grades 2 or 3 for studies of structure-modifying drugs. The selected dataset was divided into two sub-datasets according to the type of acquisition modality: the computed radiographs (CR), i.e., digital images acquired by a device using X-ray-sensitive plates which are then read by a processor, and the digitized X-ray films (RG).
In order to evaluate the effect of the quality of images on the performance of the predictive models, each of the two datasets (CR and RG) was further divided into two groups according to the quality of the corresponding radiographs. In the first non-quality-controlled (nonQC) group, all radiographs were included except those showing materials (such as metallic materials, prostheses, and screws) in the subchondral zone, whereas in the second quality-controlled (QC) group, exclusion criteria also included radiographs with exposure problems (Fig. 1) in addition to those imposed in the nonQC group. The aim of this exclusion was to avoid the disturbances of these artifacts in the calculation of TBT parameters. This grouping strategy led to four sub-datasets, namely QC-CR, nonQC-CR, QC-RG, and nonQC-RG. As a result of the inclusion/exclusion criteria previously described, 2740 knees (425 cases, OAI) and 845 knees (297 cases, MOST) were judged as eligible for this study. Figure 2 shows the number of subjects and knees for each sub-dataset. The characteristics of selected OA cases and controls are summarized in Tables 1 and 2 for OAI and MOST cohorts, respectively.

Definition of OA progression
Patients with or without OA progression were selected using the following definitions: OA progressors (cases) included patients with non-severe KOA (KL grade 2 ≤ KL ≤ 3) at baseline and with an increased mJSN grade (ΔmJSN >0) over the predefined control period (48 months and 60 months for OAI and MOST cohorts, respectively. ΔmJSN denotes the difference between OARSI mJSN grades at baseline and check points. OA non-progressors (controls) included patients with nonsevere KOA at baseline and a constant mJSN grade (ΔmJSN = 0) over the predefined control period.

Regions of interest (ROI)
A patchwork construction technique using a semi-automatic method to extract the ROIs has been previously described [10]. In the current study, in order to extract the trabecular bone ROIs, a fully automatic approach, thanks to the BoneFinder [19] software, was used to delimit the femoral and tibial bone edges. The patchwork consists of 16 ROIs mapping the whole tibial trabecular area (Fig. 3). Our algorithm firstly uses BoneFinder to identify the rough position of the bone in the image and then outline 148 points of the tibial and femoral contours. For the left knee, points 48 and 64 mark the lateral and medial extremities of the tibia, respectively. For the right knee, the medial extremities of the tibia are identified by the points 122 and 138, respectively. Secondly, the algorithm approximates the tibial subchondral baseline as the line going through these anatomical points. Thirdly, this line is used to determine the orientation and size of the 16-ROI patchwork under the cortical plates. The square ROI dimensions were proportional to the knee width defined as the distance between the outer tibial margins. In our sub-dataset, radiographs presented different pixel spacing ranging from 0.1 to 0.2 mm, and the average ROI side length was 73 ± 18 pixels (10.1 ± 0.9 mm), ranging from 7 to 13 mm.

Texture analysis
Fractal analysis consists in assigning a fractal dimension (FD) and other fractal characteristics to a dataset [11]. Several methods have been developed to measure the FD of a signal including the well-known technique of fractal signature analysis (FSA) [20], the Whittle estimator (WhE) [7], and the quadratic variation method (VAR) [9,10]. These three different fractal analysis methods provided consistent results in their capacity to predict OA progression [10]. In the current study, the VAR method, used by Janvier et al. [9], was retained for our experiments.
As reported earlier [9], the cut-off scale was observed around 500 mm on the empirical variograms and two fractal parameters were extracted: μFD and mFD corresponding to the texture complexity computed for the two micro (μ-scale) and the milli (m-scale) scales

KL grading using Siamese neural networks
A Siamese neural network (SNN)-based method proposed by Tiulpin et al. [26] was used to estimate the probability distribution of the KL grades of baseline radiographs included in our study, in the objective to propose a fully automatic KOA progression prediction model. An SNN is a class of neural network architectures that contain two or more subnetworks sharing the same configuration. SNNs are known to be robust to class imbalance, which is usually the case in medical applications [2,21]. A full description of the used SNN-based method can be found in [25,26].

Statistical analysis
Logistic regression was used to predict KOA progression. Several statistical models were developed involving not only clinical covariates and radiological scores but also TBT-based parameters: where lJSN denotes lateral joint space narrowing, cov denotes the traditional clinical covariates (age, gender, and BMI), and covPlus denotes the cov parameters accompanied with additional clinical data (race, WOMAC pain, and history of injury).
The TBT-CNN model (Model_8) includes baseline TBT, KLprob, lJSN, and cov. The KLprob was computed as the linear combination of the five probabilities of the KL grades predicted by the CNN-based model. Flowchart illustrating the selection of study subjects from the OAI and MOST datasets (n is the number of patients and k is the corresponding number of knee radiographs at baseline). From the OAI and MOST initial database, selected sub-datasets were considered, including OA patients with 2 ⩽ KL<4 at baseline, with and without quality control conditions   In Model_8, the mJSN was not included, due to the high correlation between the baseline mJSN and KLprob grades.
To avoid overfitting problems, all the models were evaluated using a 10-fold cross-validation repeated 300 times. Each model was evaluated using the AUC of the receiver operating characteristic (ROC) as a global performance criterion. The model classification accuracy (ACC), the probability that a random example is correctly classified, was also computed to investigate the relevance of different models. An ACC is defined as the ratio of the number of correct predictions relative to the total number of predictions.
All statistical analyses were performed using the R Statistical tool (version 3.6.3) including the packages MASS (for stepwise AIC optimization), Caret (for the crossvalidation training), and the pROC (for pROC curves and comparisons). Comparisons between the models were based on the ROC curves using the Delong method [6].
In order to reduce the number of parameters before training the prediction models, a backward selection of the TBT parameters (64 variables) was automatically performed using the Akaike Information Criterion (AIC) [1] as an iterative criterion. At each iteration, the AIC removes one parameter and preserves the most efficient parameter(s) to limit overfitting effects.

Performance comparison
The cov and the JSN scores at baseline are presented in Table 1 for the OAI dataset and in Table 2 for the MOST dataset. The ROC curves of the 8 models were calculated using data from nonQC-OAI and MOST sub-datasets (Fig. 4). The models' AUC values using all considered sub-datasets are summarized in Table 3.
In OAI and MOST datasets, Model_1 was not predictive of OA progression (AUC < 0.6). The combination of cov with TBT or KLprob (Model_2 or Model_5 respectively) improved the prediction to a level comparable with that obtained by the combination of cov with JSN (Model_3).
In the MOST dataset, Model_2 was predictive of JSN progression (AUC≥0.74) and significantly better than Model_4 which combines cov and baseline KL (AUC≤0.65); Model_2 outperformed Model_5 (p = 0.021); Model_2 significantly improved the prediction compared to Model_3, especially in the RG subset (p = 0.017); and Model_3 significantly outperformed Model_5 (p < 0.03) in all scenarios regardless of the acquisition modality and image quality.
Model_5 showed a significantly better AUC than Model_4, in all cases (p < 0.02 in OAI and p < 0.03 in MOST datasets). Model_7 achieved a similar performance with AUCs up to 0.75 (p > 0.2) in the OAI dataset and up to 0.80 (p > 0.05) in the MOST dataset. The AUCs of Model_7 were significantly better than those of Model_3, especially in the OAI RG subset (p < 0.004) and in the MOST dataset (p < 0.02). Model_6, which combines cov, JSN, and TBT, previously proposed by Janvier et al. [10], achieved a similar performance.  In all different scenarios, the proposed TBT-CNN model (Model_8) significantly improved the AUC com-   With the additional clinical covariates (race, WOMAC pain, and history of injury) used in Model_9, the results showed no improvement on the prediction performance compared to the proposed model (Model_8) in both OAI and MOST datasets.

Performance comparison with respect to acquisition modality
In terms of the acquisition modality, no significant differences in AUCs of the 8 models were found with regard to the three different scenarios (CR, RG, and CR&RG) (p > 0.1), in both OAI and MOST datasets.

Performance comparison with respect to image quality
Results showed that the image quality (QC and nonQC) had no statistically significant effect on the performance of the 8 models (p > 0.2) in the OAI dataset and (p > 0.4) in the MOST dataset. Thus, quality control is not a discriminating determinant of KOA progression prediction.

The prediction performance of models trained on one dataset and tested in another dataset
Model_8 was tested in two scenarios. In the first scenario, the model was trained on the OAI dataset. The trained model was then used for the prediction of OA progression in the MOST dataset. In the second scenario, the model was trained on the MOST datasets. The trained model was then used for the prediction of OA progression in the OAI dataset. Results showed the ability of this model trained on one cohort to predict progression in the other cohort with AUC > 0.7 in the CR and CR&RG cases, whatever the  quality of the radiographs (Table 4). However, the model trained in the RG subset did not achieve the same performance (AUC < 0.7).

Discussion
An important contribution of this study consists in showing that OA prediction models can work with different databases. To the best of our knowledge, the present study is the first to evaluate the capability of combined models, including TBT and CNN-based parameters, to predict KOA progression, in both OAI and MOST datasets. The TBT-CNN model consistently provided the best performance in comparison with the other models [15,16,26] not only when training and testing on the same cohort (with AUC up to 0.81) but also when training on one cohort (OAI or MOST) and testing on the other one (MOST or OAI). When testing on another cohort, the TBT-CNN model was always predictive particularly in the CR and CR&RG subsets (AUC ≥ 0.7), which was not the case for the other models.
Our study also included an evaluation of the effect of different acquisition modalities and image qualities on the performance of our combined prediction models.
The TBT-CNN model significantly outperformed the other models, regardless of the quality of the images considered, especially with complete selected OAI and MOST datasets (Fig. 4). The same results were obtained when using the QC-and nonQC-CR sub-datasets of the OAI cohort and the nonQC-RG sub-dataset of the MOST cohort.
The AUC of the TBT-CNN model varied from 0.73 to 0.75 in OAI and from 0.78 to 0.81 in MOST, whereas the AUC of the cov-JSN model achieved a maximum AUC of 0.71 in OAI and 0.75 in MOST (Table 3).
In both cohorts, the results showed that the performance of the TBT-CNN model was invariant with respect to acquisition modality and image quality. Moreover, results showed that the model prediction performance was better when using CNN-based estimations of KL than those measured manually by radiologists in the OAI and MOST datasets. In addition, the performance of the proposed prediction model remained unchanged when adding more clinical data including race, WOMAC pain, and history of injury. Whatever the cohort, the modality of the radiographs, and the quality of the radiographs, the CNN-based estimation of KL grades provided better results than those obtained from a discrete ordinal grading method. An automatic estimation of JSN grades using a CNN-based method [25] might also be of interest to improve the prediction of OA progression.
However, the performance of the prediction model using CNN-based estimations was statistically less significant than when using TBT parameters in the MOST dataset. The performance of the two approaches was similar in the OAI dataset.
Previous studies have demonstrated that the texture analysis of subchondral bone from conventional knee radiographs could be a good indicator of the prediction of knee OA progression [10,15,16,28,29].
In a recent study by Kraus et al. [15], the use of TBT calculated by the FSA method in combination with other clinical covariates and radiological parameters was investigated to propose a predictive model of OA progression using a large sample of 579 RG&CR radiographs selected from the OAI cohort. They investigated not only the radiographic but also the knee pain progression status over 12 and 24 months. However, the performance of the proposed model was modest (AUC = 0.633 − 0.649).
Involving a much larger dataset of 1124 CR radiographs, Janvier et al. [10] proposed a prediction model that included JSN grades in addition to TBT and cov parameters. In their study, the TBT analysis covered the medial and lateral subchondral bone. This model showed the ability to predict OA progression over 48 months, providing an AUC score of 0.77 using the WhE estimator for the TBT parameters.

Strengths and limitations
Due to a lack of information in the MOST cohort regarding the JSW, our study took into consideration only the discrete ordinal JSN grades. It would be interesting to consider the use of the continuous JSW values or joint space area (JSA), for which an additional step is required to calculate these values from the selected radiographs.
In the current study, age, sex, and BMI were chosen as clinical predictors. Other predictors of KOA progression such as self-reported previous knee injury and knee pain may also be included in future studies. However, the main focus of this study was to show the ability of image-processing-based models to predict KOA progression, rather than investigating other clinical covariates for KOA progression prediction.
It should be noticed that the duration of the two tested cohorts is not the same (48 months for OAI and 60 months for MOST). Unfortunately, the OAI cohort did not include imaging data at 60-month follow-up, and the MOST cohort did not include imaging data at 48 months. Consequently, the use of time-to-event data analyses was not relevant since the occurrence of KOA progression is more or less a continuous phenomenon. It has been shown, however, that our proposed models provide a good performance in the prediction of KOA progression when trained on one cohort and tested on the other.
The present study has several important strengths. It involves the use of two large datasets. In addition, the proposed model takes advantage of an extensive set of TBT parameters [9,10] and CNN-based KL grades for the prediction of OA progression. We also evaluated the effect of different image quality and modality scenarios on the performance of the prediction of OA progression. A major contribution of our study is the evaluation using a model trained on one cohort and validated on the other. In this case, the progression prediction models were not only trained on the OAI dataset and tested on the MOST dataset, as proposed by Tiulpin et al. [24], but also trained on the MOST dataset and tested on the OAI dataset, which has never been explored to date. Furthermore, the combination of TBT and CNN-based estimation of KL grades significantly improves the prediction of OA progression. This combination provides mutual information between the evolution of shape surrounding the knee joint space [24][25][26] and texture variations in the proximal tibial subchondral bone.

Conclusions
In conclusion, our study has demonstrated the feasibility of using the TBT-CNN model to predict mJSN progression in both OAI and MOST cohorts. This model exhibited a good diagnostic performance regardless of both the acquisition modality and the image quality when the model was trained and tested on the same cohort. Moreover, when trained on one cohort, the TBT-CNN model was able to predict mJSN progression on another cohort in the CR and CR&RG subsets, irrespective of the image quality.
However, further experiments are needed to develop more comprehensive risk assessment models for KOA progression prediction. In particular, other TBT methods such as the Variance Orientation Transform (VOT) [30], FSA, and WhE methods, as well as the automatic calculation of certain radiographic parameters such as JSN, JSW, or JSA scores, could be investigated.