Ultrasound in the evaluation of enthesitis: status and perspectives

Introduction: An increasing number of studies have applied ultrasound to the evaluation of entheses in spondyloarthritis patients. However, no clear agreement exists on the definition of enthesitis, on the number and choice of entheses to examine or on ultrasound technique, all of which may affect the results of the examination. The objectives of this study were, first, to determine the level of homogeneity in the ultrasound definitions for the principal lesions of enthesitis in the published literature and, second, to evaluate the metric properties of ultrasound for detecting enthesitis according to the OMERACT filter. Methods: A search was performed in PUBMED and EMBASE. Both grey-scale and Doppler definitions of enthesitis, including the features used to describe enthesitis, were collected and the metrological qualities of the studies were assessed. Results: After selection, 48 articles were analyzed. The definition of ultrasound enthesitis and its elementary features varied among authors. Grey-scale enthesitis was characterized by increased thickness (94% of studies), hypoechogenicity (83%), enthesophytes (69%), erosions (67%), calcifications (52%), associated bursitis (46%) and cortical irregularities (29%). Only 46% of studies reported the use of Doppler. Large discrepancies were observed in the frequency, type of probe and Doppler mode used. Face and content validity were the most frequently evaluated criteria (43%), followed by reliability (29%) and responsiveness (19%). Conclusions: Ultrasound has evidence to support face and content validity and reliability for the evaluation of enthesitis, though most of the studies lack well-reported methodology. Consensus on elementary lesions and standardization of the examination are needed to determine the ultrasound definition of enthesitis in grey-scale and in Doppler for future applications.


Introduction
Enthesitis, that is, inflammation of the insertions of tendons, ligaments and capsules into the bone, is the characteristic sign of ankylosing spondylitis and related pathologies, which are commonly grouped together as spondyloarthritis (SPA). The functioning enthesis dissipates stress over a wide area, including the insertion, the immediately adjacent tendon and the adjacent bone. The soft tissue components of an enthesis have traditionally been evaluated by clinical examination based on the presence of tenderness and/or swelling, while X-rays have been used to assess associated bony changes. The accuracy of these methods, however, is uncertain, which is why new imaging techniques such as ultrasound and magnetic resonance imaging (MRI) have been sought. The role of MRI for assessing the spectrum of pathology in SPA has recently been reported [1,2]. This technique has been most commonly used to assess axial disease. The MRI pattern of SPA enthesitis has been described as a diffuse bone edema adjacent to the enthesis, associated with surrounding soft tissue edema [3]. However, MRI lacks sensitivity and specificity for peripheral enthesitis [4]. This may be because changes in the fibrous part of the enthesis, where fibroblasts are tightly cross-linked with little scope for accumulation of water, cannot easily be detected with MRI [4,5]. Additionally, MRI cannot easily assess multiple sites or be used to assess the contralateral joints.
Most of the available data on the potential application of ultrasound in rheumatology concern its role in rheumatoid arthritis, with limited data or studies in other rheumatic diseases, among which SPA is the most frequently studied. For routine use in daily practice and clinical trials, the assessment of ultrasound performance in terms of metric qualities is recommended [54]. Though several studies have highlighted the value of ultrasound in assessing inflammation of the enthesis in SPA, there is no clear agreement on which structures to examine. Even though a clear distinction between the meanings of the words enthesitis and enthesopathy exists in the rheumatologic literature, no clear definition of an enthesitis lesion has been reported in the ultrasound literature. Thus, technical and anatomical issues, combined with a lack of standardization, may have hampered the development and validation of the ultrasound technique applied to clinical practice, or to multicenter studies, in SPA. Consensus definitions for ultrasound-related pathologies, including enthesopathy, were published by the OMERACT (Outcome Measure in Rheumatology in Clinical Trials) ultrasound group in 2005 [52]. However, no data are available about the implementation of this definition in clinical and research practice.
The objective of this study was, first, to determine the level of homogeneity in the ultrasound definitions for the principal lesions of enthesitis in the published literature and, second, to evaluate the metric properties of ultrasound for the detection of enthesitis according to the OMERACT filter through a systematic literature review. We focused our review on the anatomical definition of the enthesis, that is, the attachment of ligaments, tendons or capsules to bone, which includes neither the body of the tendon nor surrounding tissue such as bursae.

Search strategy and study selection
The search for original articles concerning humans, published in the English language between January 1985 and May 2010, and referring to peripheral enthesitis and ultrasonography was carried out in PUBMED and EMBASE databases. Reviews or abstracts from scientific congresses were not included.
In order to obtain the largest number of references, the search was performed in two steps in PUBMED with different key words:
-Search 1 was carried out using the following key words « ankylosing spondylitis OR spondylarthropathies OR reactive arthritis OR psoriatic arthritis OR enthesis OR enthesopathy OR rheumatic diseases OR definition » AND « ultrasonography OR ultrasound OR sonography OR Doppler ».
-Search 2 was performed including the key words « entheses OR enthesis OR enthesitis OR enthesopathy ».
For both searches, key words referred to MeSH terms or, if not available, to key words present in the title/abstract.
In EMBASE the search was performed with the key words « ankylosing spondylitis OR spondylarthropathy OR reactive arthritis OR Psoriatic arthritis OR Enthesis OR Enthesitis OR Enthesopathy OR Definition » AND « Ultrasonography OR Ultrasound OR Sonography OR Doppler ».
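The boolean structure of the key-word searches above can be sketched in a few lines of Python. This is only an illustration of how the OR groups combine with AND: the helper names `or_group` and `and_query` are hypothetical, quoting multi-word terms is an assumption, and no database-specific field tags or MeSH mapping are modelled.

```python
# Illustrative sketch: assemble the boolean search strings described in the text.
# Helper names are hypothetical; quoting of multi-word terms is an assumption.

def or_group(terms):
    """Join terms with OR inside parentheses, quoting multi-word terms."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def and_query(*groups):
    """Combine several OR groups with AND."""
    return " AND ".join(or_group(g) for g in groups)

# Key words copied from PUBMED Search 1 in the text.
DISEASE_TERMS = [
    "ankylosing spondylitis", "spondylarthropathies", "reactive arthritis",
    "psoriatic arthritis", "enthesis", "enthesopathy",
    "rheumatic diseases", "definition",
]
IMAGING_TERMS = ["ultrasonography", "ultrasound", "sonography", "Doppler"]
# Key words copied from PUBMED Search 2 in the text.
ENTHESIS_TERMS = ["entheses", "enthesis", "enthesitis", "enthesopathy"]

search_1 = and_query(DISEASE_TERMS, IMAGING_TERMS)
search_2 = or_group(ENTHESIS_TERMS)

if __name__ == "__main__":
    print(search_1)
    print(search_2)
```

Keeping the term lists as data makes it easy to see that the EMBASE query differs from PUBMED Search 1 only in its disease terms.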
Only references with available abstracts were assessed. Titles, abstracts and full reports of articles identified were systematically screened by one author (FG) with regard to inclusion and exclusion criteria. The final search was verified by a second author (FJ). Articles concerning cadavers were not included in the final selection if they concerned healthy subjects.
Articles which did not meet inclusion criteria were excluded at any step of the study selection.

Data extraction
All data were extracted from the selected articles using a standardized spreadsheet previously developed and validated for systematic reviews [55,56]. All selected articles were rated in order to determine ultrasound definitions of enthesitis or its characteristics and to evaluate the quality of the studies according to the OMERACT filter [54]. A standardized tool for assessing the quality of the analyzed studies was developed and assessed in a binary mode (yes/no) based on a set of six predefined criteria: 1) Was the recruitment of patients well-defined in the methods section? 2) Was the definition of ultrasound enthesitis clearly given, as well as the definition of each elementary component? 3) Was there a description of the ultrasound scanning technique? 4) Was there a description of attempted blinding of observers? 5) Was there a description of enthesitis scoring, and which source was this scoring based on? 6) Was the choice of comparator adequately explained and were the results completely given? Quality was reported on a scale of 0 to 6, with higher results indicating higher quality.
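As an illustration of the six-item binary quality tool, the following minimal Python sketch sums the yes/no answers into a score from 0 to 6. The class name, field names and the example study are hypothetical and are not taken from the review's spreadsheet.

```python
# Minimal sketch of the six-criterion binary quality score (0 to 6).
# Field names paraphrase the six criteria; they are illustrative assumptions.
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class QualityChecklist:
    recruitment_defined: bool           # criterion 1
    enthesitis_definition_given: bool   # criterion 2
    scanning_technique_described: bool  # criterion 3
    blinding_described: bool            # criterion 4
    scoring_described: bool             # criterion 5
    comparator_explained: bool          # criterion 6

    def score(self) -> int:
        """Count the criteria answered 'yes'; higher means higher quality."""
        return sum(bool(getattr(self, f.name)) for f in fields(self))

# Hypothetical study, invented for illustration only.
example = QualityChecklist(
    recruitment_defined=True,
    enthesitis_definition_given=True,
    scanning_technique_described=True,
    blinding_described=False,
    scoring_described=True,
    comparator_explained=False,
)

if __name__ == "__main__":
    print(example.score())  # prints 4
```

Each criterion carries equal weight here, which matches the 0 to 6 scale described in the text.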
Particular attention was also given to the definition, quantification and site of detection of Doppler signals (that is, vascularization detected at the enthesis, in the body of the tendon, at the cortical bony insertion or in the bursa).

Evaluation methods
Face and content validity, construct validity, criterion validity and discriminant validity (that is, reliability and responsiveness) were independently evaluated in every paper, including whether or not the methods for assessing them and the resulting measurements were reported. Face and content validity, which are essentially subjective, were analyzed according to the conclusions of the authors. Criterion validity was considered achieved when ultrasound results were concurrently or predictively compared with a true "gold standard".
Construct validity was achieved when ultrasound evaluation of enthesitis was demonstrated to be consistent with theoretic concepts (that is, that ultrasound measure of enthesitis is related to other measures of enthesitis).
The evaluation of reliability was divided into two parts: the acquisition phase and the image reading phase. For both, we assessed the intra- and inter-observer evaluation. Responsiveness was evaluated by the ability of the tool to demonstrate change, usually in response to an intervention.

Statistical analysis
Descriptive statistics were used to report data. Frequencies and percentages were used for categorical variables.

Results
Figure 1 illustrates the flow chart of the selection of the articles. Of the 3,852 references obtained from databases, 237 abstracts were selected after reading titles, 94 articles were selected after reading abstracts and, finally, 48 articles were analyzed to determine the ultrasonographic enthesitis definition and characteristics. These articles included 22 case-control studies, 5 case-report studies, 17 case-series studies, 2 cohorts, 1 expert consensus and 1 randomized controlled trial (Table 1). Most of them (n = 37) focused on inflammatory pathologies: spondylarthropathy or ankylosing spondylitis (n = 24), spondylarthropathy or other inflammatory rheumatism (n = 3), and psoriatic arthritis (n = 10). Only six studies focused on degenerative involvement of the enthesis. Two studies did not report the patients' diagnoses.
Entheses of the lower limbs were the most commonly studied, especially the Achilles tendon (80% of articles), followed by the entheses of the upper limbs. No consensus concerning either the location or the number of entheses to be examined was observed.

Ultrasound parameters and setting
The ultrasound examination was described in 35 (73%) studies, and recommendations on the position of the examined enthesis, especially for the lower limbs, were available in most of the studies. Authors predominantly used 90° flexion of the feet during examination of the Achilles tendon and plantar fascia, and 30° to 60° flexion of the knee during examination of the patellar ligament and the quadriceps tendon. In more recent studies, a neutral position of the feet was used to examine the Achilles tendon entheses.

Definition and description of enthesitis in grey-scale and Doppler modes
In grey-scale, a linear probe with a frequency of 7.5 MHz or 7.5 to 10 MHz was used in 15/48 studies, while a frequency >10 MHz was used in 23 studies. Information concerning probe characteristics was lacking in four studies. Table 2 shows the definitions or descriptions of ultrasound enthesitis and the ultrasound elementary components used for defining enthesitis (for further details see also Table S1 in Additional file 1). Table S2 in Additional file 2 shows the ultrasound parameters and equipment used in the different studies. In grey-scale, enthesitis was characterized by the presence of increased thickness in 45 (94%) studies, hypoechogenicity of the enthesis in 40 (83%) studies, enthesophytes in 33 (69%) studies, erosion in 32 (67%) studies, calcification in 25 (52%) studies, associated bursitis in 22 (46%) studies and cortical irregularities in 14 (29%) studies. Only 16 (33%) studies described the ultrasound technique of thickness measurement; thickness was most often measured at the point of maximal thickness on the bony insertion (for further details see also Table S3 in Additional file 3).
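The percentages quoted for the grey-scale elementary lesions follow directly from the study counts and the total of 48 analyzed articles. The short Python sketch below recomputes them; the variable and function names are illustrative, and rounding to the nearest whole percent is an assumption about how the figures were derived.

```python
# Recompute the reported percentages from the study counts given in the text.
# Rounding to the nearest integer is an assumption about the original method.
N_STUDIES = 48

FEATURE_COUNTS = {
    "increased thickness": 45,
    "hypoechogenicity": 40,
    "enthesophytes": 33,
    "erosions": 32,
    "calcifications": 25,
    "associated bursitis": 22,
    "cortical irregularities": 14,
}

def percentage(count: int, total: int = N_STUDIES) -> int:
    """Share of studies reporting a feature, as a whole-number percentage."""
    return round(100 * count / total)

PERCENTAGES = {name: percentage(n) for name, n in FEATURE_COUNTS.items()}

if __name__ == "__main__":
    for name, pct in PERCENTAGES.items():
        print(f"{name}: {FEATURE_COUNTS[name]}/{N_STUDIES} = {pct}%")
```

Run as written, this reproduces the reported values of 94%, 83%, 69%, 67%, 52%, 46% and 29%.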
Only 22 out of 48 (46%) studies described the use of power Doppler to assess enthesitis (Table 3); all of them were published after 2003. Most of the studies took into account the presence of a Doppler signal in different locations: tendon, enthesis and bursa. The exact site of measurement of the Doppler signal was described in 12 studies. There were discrepancies in the technical recommendations for the use of Doppler, with the pulse repetition frequency (PRF) varying widely among studies, from 400 Hz to 1,000 Hz.

Scoring system of enthesitis (grey-scale and Doppler)
Table 4 shows the different ultrasound scoring systems used for evaluating enthesitis. Ultrasound scoring of enthesitis was performed in 20 studies. All of the proposed scoring systems were primarily based on grey-scale changes: the thickness of the tendon insertion and the presence of erosions, bursitis and enthesophytes. The proposed grading was semi-quantitative in most of them. Only nine studies reported scoring systems for power Doppler activity of the enthesis; these were generally semi-quantitative [7,8,13,15,20,22,23,37,45], but in some cases quantitative, with a proposed cut-off for differentiating between SPA patients and controls. Five scoring systems were developed at the enthesis level (and mostly concerned evaluation of the Achilles enthesis), and 15 were developed at the patient level (that is, the scoring system gave information regarding different enthesis sites and allowed the evaluation of global patient inflammatory activity or enthesis structural damage). Two of them, the GUESS (Glasgow Ultrasound Enthesitis Scoring System) score, proposed by Balint et al. in 2002 [16], and the SEI (Spanish Enthesitis Index) score, by Alcade et al. [14], take into account grey-scale elementary components alone. Both are scoring systems developed at the enthesis level and at the patient level, and the GUESS was the most frequently used scoring method (7/20).
Published scoring systems were used both for diagnostic purposes [22,23,53], and for sensitivity to change [15,19,31]. Performance of those scores varied according to the purpose.

Evaluation of studies according to the OMERACT filter
Table 5 summarizes the characteristics of the 48 selected articles according to the OMERACT filter.

Truth
The face, content, criterion and construct validity of ultrasound findings of the enthesis were tested in only 21 articles (44%). Comparators were clinical examination in 13 studies, MRI in 5 studies, X-ray in 5 studies and histology in 1 study. In three studies, two comparators were used: clinical examination and either X-ray or MRI.
Ultrasound examination was performed blindly from other data in 29 articles (62%).

Discrimination
Reliability
Reliability of the technique was evaluated in 14 (29%) studies; detailed results are reported only in the additional online file (Table S4 in Additional file 4). Among these studies, eight correctly reported the methodology used. Reliability was most frequently tested on the reading of static images, and only two studies evaluated acquisition. Only four studies included information on both inter-examiner and intra-examiner reliability. In general, reading reliability was good but acquisition reliability had some deficiencies.

Responsiveness
Responsiveness was evaluated in nine studies. Of these, only four included power Doppler evaluation of the enthesis [15,17,21,49] and three used a scoring system [15,19,31]. Ultrasound evaluation of enthesitis was found to be sensitive to change in six studies. The three studies that did not demonstrate responsiveness evaluated the grey-scale aspect alone, whereas sensitivity to change was greater in the studies that also included power Doppler. Only three articles supported responsiveness with statistical analyses, while six articles described changes but did not quantify them.

Feasibility
None of the analyzed papers reported information about feasibility of examining entheses using ultrasound.

Table 4 (excerpt). Ultrasound scoring systems used for evaluating enthesitis.
Calcifications were scored on a semiquantitative scale of 0 to 3. Doppler and erosions were scored as 0 or 3 points. Scores for tendon structure, tendon thickness and bursa were either 0 or 1. Calcifications were examined at the area of the enthesis insertion, and scored as 0 if absent, or 1 if a small calcification or ossification with an irregularity of the enthesis cortical bone profile was seen. Calcifications were given a score of 2 if there was clear presence of enthesophytes or if medium-sized calcifications or ossifications were observed. Lastly, they were classified as 3 if large calcifications or ossifications were present. To simplify things, ossifications and enthesophytes at the enthesis were also included as calcifications.
A; Y; Y. Soft tissue inflammation (seven items): tendon hypoechogenicity, entheseal hypoechogenicity, bursal effusion, PDS signal at tendon level, PDS signal at entheseal level, PDS signal at bursal level. Tissue damage (five items): intratendineous calcifications, entheseal calcifications, enthesophytes, bone erosions, bone irregularities* (*not used to calculate total score). Scores: (1) a total score for soft tissue inflammation, which resulted from the sum of the scores assigned to the 7 US findings indicative of soft tissue inflammation, ranging from 0 to 7 with presence/absence data and from 0 to 14 with semiquantitative scores; (2) a total score for tissue damage, which resulted from the sum of the scores assigned to the 4 US findings indicative of tissue damage, ranging from 0 to 4 with presence/absence data and from 0 to 8 with semiquantitative scores.
2009, Iagnocco [8]; A (neutral position); Y; Y. All lesions scored on both a dichotomous scale (present/absent) and a 4-point semiquantitative scale (0 = absent, 1 = mild, 2 = moderate, 3 = severe). Enthesopathy: tendon hypoechogenicity at the level of bony attachment, tendon thickening at the level of bony attachment, intratendinous calcifications, enthesophytes, bony erosions, bony cortex irregularities, presence of Doppler signal at the level of bony attachment, presence of intratendinous Doppler signal. Bursitis: enlargement of deep calcaneal bursa, enlargement of superficial calcaneal bursa. Tendon lesion: both partial and full-thickness tendon lesions.

Discussion
The present review has demonstrated that ultrasound is considered a valuable tool for assessing enthesitis. Since 1985, when the first description was made by Lehtinen and colleagues, an increasing interest in using this technique in the evaluation of SPA enthesitis has been observed, especially within the last 10 years. This is probably due to the tremendous technological progress in ultrasound equipment. However, standardization of enthesitis assessment by ultrasound would facilitate the dissemination of this technique in daily practice, and also allow adequately trained sonographers to participate in multicenter research studies. A wide variability was observed among studies in the definition of ultrasound enthesitis, associated with a broad heterogeneity of definitions of its elementary components; together with the absence of a consensus on technical parameters and methods of examination, this probably led to the observed heterogeneity in metric properties of the studies according to the OMERACT filter. No consensus concerning either the location or the number of entheses to be examined was observed.
Those discrepancies can be explained by the inclusion of studies from 1985 until the present: ultrasound equipment has improved considerably since that time, and differences in the quality of equipment may have hampered the detection of those lesions. However, the quality of and attention to the description of enthesitis features have improved in the studies published after 2005, which may be explained by the publication from our group of the preliminary OMERACT definition of enthesopathy [52]. Indeed, previous studies have shown that grey-scale elementary lesions may be observed in both mechanical and inflammatory enthesopathy [11,30]. Yet, in order to help diagnosis, a more specific feature is the detection of inflammatory signs, especially vascularization.
Since the first observation, made in 2003 [22], of the utility of power Doppler for visualizing vascularization of the enthesis as a sign of inflammation, an increasing number of studies have included Doppler evaluation. Some authors have clearly demonstrated the presence of vascularization at the enthesis/bone junction in SPA patients [13,20,23,37]. Even if Doppler use seems to be important, a wide heterogeneity in its use was recorded. Most of the studies referred to the presence of Doppler signal in different locations: tendon, enthesis, bursa. The lack of consensus with regard to the site of examination of abnormal vascularization may help to explain discrepancies among studies. Some authors may call "inflammatory enthesitis" what would be called "tendonitis" by others. Moreover, this review has shown a large difference in the Doppler parameters used among studies. Doppler sensitivity to inflammatory flow (low-velocity flow) depends partly on the settings and partly on the type of equipment.
The differences found in the articles may, therefore, be explained by the lack of consensus on the optimal Doppler settings for enthesitis. Since no information concerning inter-equipment reliability for enthesitis evaluation is available, the different types of ultrasound equipment used may also explain part of the discrepancies observed. Indeed, Doppler sensitivity could have been affected by the type of equipment used; better sensitivity may have been reported with new generation equipment with the highest quality of Doppler parameters.
Only 73% of the studies clearly described the acquisition technique. For example, the method for measuring enthesis thickness, which appears to be one of the most important features recorded by authors for characterizing enthesitis of the Achilles tendon, was only described in 31% of the studies despite the fact that the necessity of measuring the thickness for defining the presence of enthesitis was reported by 94% of the authors. Measurement methods and sites of measurement varied considerably, and none of the proposed methods has been extensively tested and validated yet.
The quantification of enthesitis by ultrasound was predominantly performed using semi-quantitative scoring methods. However, some differences were observed in the evaluation of involvement, as all of the proposed scoring systems combined the evaluation of inflammatory activity (mostly echogenicity and increased thickness) with that of structural damage (mostly enthesophytes and erosions). As these are all grey-scale changes, this could explain the discrepancy observed in the sensitivity to change. In recent years, there has been more focus on enthesitis vascularization, probably the most interesting and specific feature for differentiating inflammatory enthesitis from mechanical enthesitis [22]. Consequently, enthesitis scoring systems taking the Doppler signal into account have been proposed. These scoring systems, which take greater account of inflammatory activity, may show better sensitivity to change. Hatemi et al. proposed adding a semi-quantitative score for vascularization to the GUESS score [7].
The proposal of a scoring system validated at the patient level, taking into account both inflammatory activity and structural damage, is one of the challenges for future studies of ultrasound enthesitis. This implies determining which entheses are the most relevant to include in the scoring system. Moreover, different scoring systems would probably have to be proposed and validated for diagnostic purposes and for monitoring treatment.
Table abbreviations: Y, yes; N, no; U, unclear; NA, not available; A, Achilles tendon; ASIS, anterior superior iliac spines; CET, common extensor tendon insertion on lateral epicondyle; CFT, common flexor tendon insertion on medial epicondyle; PF, plantar fascia; PSIS, posterior superior iliac spines; PT, patellar tendon; PTDI, patellar tendon distal insertion; PTPI, patellar tendon proximal insertion; Q, quadriceps. * MASES: 1st and 7th costosternal joints, PSIS, ASIS, iliac crests, A, 5th lumbar spinous process. ** GUESS score: A, PF (90°), PTPI, PTDI, Q (30°).
Are the analyzed studies correctly designed for applying one or all parameters of validity of the OMERACT filter?
Concerning face validity, most of the authors agreed on the ability of ultrasound to detect enthesitis and related abnormalities. Thus, ultrasound measures of enthesis involvement (both inflammation and structural damage) must be considered to have face and content validity according to the filter. Concerning the construct and criterion aspects, validity results are mixed, probably because of the lack of a good comparator (or reference standard) for evaluating ultrasound enthesitis. In fact, we cannot consider clinical evaluation or other imaging techniques, such as X-rays or MRI, as a true gold standard because they do not measure the same phenomenon. X-rays can only detect structural damage and give no information on soft tissue and, therefore, on inflammatory activity, as ultrasound does. Clinical evaluation underestimates enthesitis involvement because of the difficulty of clearly assessing the enthesis by physical examination, and conventional MRI, because of technical limitations, is unable to visualize isolated enthesitis [57]. MRI findings, particularly the measures suggestive of inflammatory activity, need further comparison with ultrasound to evaluate the differences between the imaging techniques and to determine the common areas of involvement, in order to help clarify construct validity. The only real reference that can correctly evaluate ultrasound capabilities is histology, which cannot currently be used for ethical reasons.
Concerning the discrimination aspect of the filter, published studies have demonstrated that ultrasound can be a reliable and sensitive tool, even if some aspects of reliability need to be improved. This applies in particular to the detection of grey-scale abnormalities, which was less reliable than the detection of a Doppler signal in the two studies evaluating both the reading and acquisition phases.
Responsiveness was not always evaluated, and frequently only a mere description of changes was reported. Among the nine studies in which sensitivity to change was reported, responsiveness was not demonstrated in three that used grey-scale evaluation alone, while all the studies including Doppler evaluation showed responsiveness. Doppler evaluation appeared to be an important feature to take into account in order to evaluate responsiveness to treatment, and it should be included in enthesis examination for this purpose. Further evaluation of the responsiveness of enthesitis evaluation should be performed on scoring systems with evidence of statistical difference.

Conclusion
In conclusion, ultrasound evaluation of enthesitis may be useful for the diagnosis or monitoring of SPA patients, but has still to be validated. It appears to be a valid (especially in terms of face and content validity) and reliable tool for enthesitis evaluation. A consensus on the definition of enthesitis is required in order to improve the quality of studies and to improve the value of ultrasound in SPA management.

This article is part of the series Advances in the imaging of rheumatic diseases, edited by Mikkel Ostergaard. Other articles in this series can be found at http://arthritisresearch.com/series/imaging