Automated evaluation of autoantibodies on human epithelial-2 cells as an approach to standardize cell-based immunofluorescence tests

Introduction Analysis of autoantibodies (AAB) by indirect immunofluorescence (IIF) is a basic tool for the serological diagnosis of systemic rheumatic disorders. Automation of autoantibody IIF reading including pattern recognition may improve intra- and inter-laboratory variability and meet the demand for cost-effective assessment of large numbers of samples. Comparing automated and visual interpretation, the usefulness for routine laboratory diagnostics was investigated. Methods Autoantibody detection by IIF on human epithelial-2 (HEp-2) cells was conducted in a total of 1222 consecutive sera of patients with suspected systemic rheumatic diseases from a university routine laboratory (n = 924) and a private referral laboratory (n = 298). IIF results from routine diagnostics were compared with a novel automated interpretation system. Results Both diagnostic procedures showed a very good agreement in detecting AAB (kappa = 0.828) and differentiating respective immunofluorescence patterns. Only 98 (8.0%) of 1222 sera demonstrated discrepant results in the differentiation of positive from negative samples. The contingency coefficients of chi-square statistics were 0.646 for the university laboratory cohort with an agreement of 93.0% and 0.695 for the private laboratory cohort with an agreement of 90.6%, P < 0.0001, respectively. Comparing immunofluorescence patterns, 111 (15.3%) sera yielded differing results. Conclusions Automated assessment of AAB by IIF on HEp-2 cells using an automated interpretation system is a reliable and robust method for positive/negative differentiation. Employing novel mathematical algorithms, automated interpretation provides reproducible detection of specific immunofluorescence patterns on HEp-2 cells. Automated interpretation can reduce drawbacks of IIF for AAB detection in routine diagnostics providing more reliable data for clinicians.


Introduction
Disease-specific autoantibodies (ABBs) are a serological phenomenon of systemic rheumatic conditions and autoimmune liver disorders. Despite the development of enzyme-linked immunosorbent immunoassay (ELISA) and multiplexing technologies for the detection of disease-specific AABs, the screening for anti-nuclear antibodies (ANAs) by indirect immunofluorescence (IIF) assays remains a standard method in the current diagnostic approach [1][2][3][4][5][6]. Several substrates have been proposed for ANA IIF assays; however, the screening for non-organ-specific AABs on human epithelial (HEp-2) cells is the most established method used [7][8][9][10][11]. In general, assessment of ANAs is followed by detection of specific AABs to, for example, extractable nuclear antigens (ENAs) and cytoplasmic antigens by immunoassays employing purified native or recombinant antigens. This two-stage approach comprises the following benefits: (a) highly sensitive screening of the most frequent and clinically relevant non-organ-specific AABs, (b) optimal combination with other assay techniques for the downstream differentiation of AAB reactivities based on the IIF pattern detected and the diagnosis suspected, (c) assessment of clinically relevant AABs without the need for further testing (for example, anti-centromere AABs), and (d) evaluation of AABs detectable only by IIF in case of unknown autoantigenic targets or non-available commercial assays [12][13][14]. Due to the key position of ANA screening in the serological diagnosis of systemic rheumatic diseases, consistent reproducibility and high quality of HEp-2 cell-based IIF assays are required [8,15,16]. However, the visual and therefore subjective evaluation of cell-based IIF assays complicates the standardized and reproducible evaluation of HEp-2 cell assays. Interpretation of immunofluorescence patterns is influenced by the knowledge and individual qualification of the investigator. Thus, a high intra-and interlaboratory variability is common and represents a major diagnostic problem, especially in non-specialized laboratories [17,18]. Automated reading of immunofluorescence patterns by automated interpretation systems with intelligent pattern recognition can overcome this issue [18,19]. In addition, automation of IIF pattern reading can provide a reliable basis for cost-effective serological diagnostics for laboratories with large sample numbers. In particular, the opportunity of modern electronic data management alleviates the heavy workload in such laboratories.
In this study, we compared the first automated interpretation system available for cell-based IIF with the currently established visual evaluation method in routine diagnostics of both a university and a private rheumatology referral laboratory. Visual findings of positive/negative discrimination and AAB pattern detection were compared with data automatically obtained by this system. Perspectives of automated interpretation of cellbased IIF tests will be discussed.

Materials and methods
Consecutive serum samples of 924 patients with a suspected diagnosis of systemic rheumatic diseases were referred to the routine laboratory at the Department of Rheumatology and Clinical Immunology of the Charité Universitätsmedizin Berlin. ANAs were determined using a HEp-2 cell-based assay. Samples with a titer of 1 in 320 or higher were scored as positive and subsequently tested for AABs against ENA. Samples with a titer of 1 in 80 or 1 in 160 were scored as weakly positive. Moreover, to assess the performance of the automated interpretation in a different setting, 288 consecutive serum samples were tested from a private referral laboratory. This laboratory receives mainly samples from general practitioners and small-and mediumsized hospitals to provide serological findings for the clarification of suspected rheumatic symptoms. Final diagnoses are usually not reported to the laboratory. The study was approved by the local ethics committee (EA1/001/06). Written informed consent was obtained from each patient.
Detection of anti-nuclear antibodies by HEp-2 cell assay ANAs in patient samples were assessed by commercial ANA assays in accordance with the instructions of the manufacturer (GA Generic Assays GmbH, Dahlewitz, Germany). Briefly, samples diluted in phosphate-buffered saline were incubated on HEp-2 cells fixed on glass slides in a moisture chamber for 30 minutes at room temperature (RT). The screening dilution was 1 in 160, except for individuals younger than 14 years old, in whom a screening dilution of 1 in 80 was applied. After washing, bound AABs were detected by incubation with fluorescein isothiocyanate-conjugated sheep anti-human immunoglobulin for 30 minutes at RT. Subsequently, slides were washed, embedded with 4',6-diamidino-2phenylindol (DAPI)-containing medium, and assessed either visually with a fluorescence microscope (Axiovert 40; Carl Zeiss, Jena, Germany, and Eurostar; Euroimmun AG, Lübeck, Germany) or automatically with the interpretation system (AKLIDES®; Medipan GmbH, Dahlewitz, Germany). Observers conducting the visual assessment were DR, M-GW, TB, RK, and BL.

Automated interpretation of HEp-2 cell assay data
The concept of the automated interpretation system AKLIDES® for evaluation of ANAs including pattern recognition is based on IIF using HEp-2 cells ( Figure 1) [18,19]. Briefly, IIF patterns of serum samples were assessed automatically on HEp-2 cells (GA Generic Assays GmbH) by using a motorized inverse microscope (Olympus IX81; Olympus Corporation, Tokyo, Japan) with a motorized scanning stage (Märzhäuser Wetzlar GmbH & Co. KG, Wetzlar, Germany), 400-nm and 490nm light-emitting diodes (CoolLED Ltd., Andover, UK), and a grey-scale camera (Kappa, Gleichen, Germany). The interpretation system is controlled by the specially designed software (AKLIDES®), which consists of modules for device and autofocus control, image analysis, and pattern recognition algorithms. The novel autofocus based on Haralick's image characterization of objects through grey-scale transition used DAPI as fluorescent dye for object recognition and focusing. To eliminate artifacts, an additional qualitative image analysis was performed by dividing the image into subobjects of equal size.
Object segmentation was conducted by histogrambased threshold algorithm followed by watershed transformation [20]. Segmented objects were characterized by regional, topological, and texture/surface descriptors.
More than 1,400 object-describing criteria were implemented. Mitotic cells were identified by DAPI staining. Classification was achieved through the combination of structure and texture characteristics by definition of rules for each pattern.
Immunofluorescence image data were evaluated according to the following hierarchy: (a) positivity, (b) localization of staining (nuclear, cytoplasmic, chromatin of mitotic cells), and (c) determination of nuclear patterns: homogeneous (homogeneous or speckled pattern with specific staining of the metaphase chromatin), speckled (fine, medium, or coarse speckled staining of interphase nuclei), nucleolar (homogeneous or speckled staining of nucleoli with weak nuclear staining or without nuclear staining), centromere (more than 30 nuclear dots in the interphase nucleus and metaphase chromatin), and multiple nuclear dots (multiple dots, fewer than 30 nuclear dots in the interphase nucleus).
A reactivity index (RI) was calculated by combining absolute image intensity, contrast, and number of greyscale levels of the total image for the assessment of image data. Since RI is influenced by exposure time, which depends on the highest image signal after exclusion of artifacts, even patterns with weak absolute signals like centrioles or nuclear dots can be detected. The determination of threshold values for the differentiation of positive signals was conducted on the basis of RI values of 200 normal blood donors.
With this software, the following six main patterns can be detected readily on HEp-2 cells: cytoplasmic, homogeneous, speckled, nucleolar, centromere, and multiple nuclear dots. Further stratification of nuclear Figure 1 Flowchart of automated human epithelial (HEp-2) cell assay interpretation by the automated reading system [18]. The fundamental analysis chain of the image processing by the automated system is divided into acquisition, quality control, segmenting, object description, and object classification. Segmented objects were described by boundary, regional, topological, and texture/surface descriptors. Digital features were combined to rules, analogous to rules defined by experts.
Egerer et al. Arthritis Research & Therapy 2010, 12:R40 http://arthritis-research.com/content/12/2/R40 or cytoplasmic patterns was performed by retrospective visual examination if required for discussion of differing results. Given an average workload of about 50 determinations a day, the system provides sufficient data storage capacity for 1 year.

Statistical analysis
Chi-square test was used to check the relationship between the two classification systems. To test for the strength of agreement, inter-rater agreement statistics was conducted [21]. McNemar test was performed to check the difference for paired proportions. P values of less than 0.05 were considered significant. Calculations were performed by using MedCalc statistical software (MedCalc, Mariakerke, Belgium).

Results
Comparison of positive and negative findings of patient sera referred to a routine university laboratory Consecutive sera of 924 patients with suspected systemic rheumatic disease were evaluated for the presence of ANAs in a routine university laboratory. ANAs were detected by HEp-2 cell assay and interpreted either visually by an experienced investigator or by automated reading and pattern recognition with the system. Samples were blinded for evaluation. Automated evaluation reported 546 sera (59.1%) as positive, 140 sera (15.1%) as weakly positive, and 238 sera (25.8%) as negative in regard to the presence of ANAs. Out of the 546 positively scored sera, 543 sera (99.5%) were confirmed by visual examination as positive or weakly positive ( Table 1). Two of the three discrepant sera demonstrated a cytoplasmic pattern, which was assessed as a negative ANA by visual examination (Figure 2). Cytoplasmic patterns of these samples were defined by retrospective examination. The third discrepant sample showed an artifact, leading to a positive finding by the automated system.
Out of 140 sera scored as weakly positive by the system, 113 sera (80.7%) were also interpreted as weakly positive by visual examination and one serum (1.0%) was interpreted as positive by visual examination. The 26 sera assessed as negative by visual examination (18.6%) demonstrated mainly weakly positive speckled staining of the nucleus in the automated system and this was scored as irrelevant by visual reading.
Out of 238 sera scored negative by the system, 199 sera (83.6%) were also assessed as negative by visual examination. In fact, none of these negative samples was evaluated as positive by visual examination. Only 39 sera (16.4%) were assessed as weakly positive, showing a titer of 1 in 80 with unspecific patterns by visual assessment. Thus, there was an agreement of 93.0% (859/924) regarding the discrimination of positive and negative samples by both approaches in this university laboratory. Chi-square statistics revealed a contingency coefficient of 0.646 (P < 0.0001). When weakly positive and definitely positive samples were combined into one group, the difference of 1.08% according to the McNemar test between both methods for positive/negative differentiation was not significant (95% confidence interval [CI] -0.77% to 2.84%; P = 0.25).
Comparison of pattern assessment of patient sera referred to a routine university laboratory There was a high agreement of 90.1% (492/546) comparing the visually and automatically defined fluorescence patterns of the samples reported positive by the automated system. The differing samples mainly demonstrated mixed patterns, which were assessed by visual expert examination and automated reading algorithms of the automated system with different emphasis of one or the other underlying pattern. Investigators emphasized the staining of nucleoli when assessing the combination of speckled and nucleolar patterns visually. In contrast, the mathematical software algorithms included the denser distribution of the speckled pattern with more value into decision making. A similar situation was found with the combination of nuclear and cytoplasmic patterns. When this mixed pattern was assessed, visual interpretation of experts tended to emphasize the nuclear staining (speckled, nucleolar). In contrast, the system algorithms emphasize the cytoplasmic staining in case of high-fluorescence signals.
Discrepant assessment of patterns was found with sera containing antibodies to nuclear membrane targets. These patterns were evaluated by the system algorithms as speckled. In contrast, the visual assessment clearly detected the increased speckled staining at the border of the nucleus (Figure 3). Sera containing antibodies to the Golgi complex were assessed as weakly speckled nuclear pattern by the software algorithms ( Figure 4). The pattern comparison of the 140 samples scored as weakly positive by the automated system demonstrated an agreement of 74.3% (104 sera). Discrepant samples again showed weak speckled nuclear staining. In summary, comparison of fluorescence pattern recognition of the 686 positive and weakly positive findings by the system with visual examination demonstrated an agreement in 596 sera (86.9%).
Comparison of positive and negative findings of patient sera referred to a private laboratory Furthermore, 298 consecutive sera referred to a private laboratory for the detection of ANAs were compared with ANA assessment by the system after routine visual evaluation by an expert (Table 2). Samples were blinded for evaluation. Automated interpretation with the system scored 57 sera (19.1%) of these 298 sera as positive, 16 (5.4%) as weakly positive, and 225 (75.5%) as negative. Of the 57 samples assessed as positive by the system, 55 sera (96.5%) were assessed as positive or weakly positive by visual evaluation.
Evaluation by automated interpretation scored 16 sera as weakly positive. Visual assessment determined 12 (75.0%) of these 16 sera to be positive with the same fluorescence pattern (100.0%). The four sera (25.0%) scored as negative demonstrated weak speckled fluorescence patterns in the system.  Out of 225 sera assessed as negative by the automated system, 201 sera (89.3%) were confirmed as negative by visual examination. The 24 discrepant sera (10.7%) that were scored as weakly positive with speckled or nucleolar patterns by the investigator and did not reach the threshold level in the automated system demonstrated no antibodies to ENA by other techniques. Thus, agreement in this patient cohort regarding the differentiation of positive and negative samples was 90.6% (270/298). Chi-square statistics revealed a contingency coefficient of 0.695 (P < 0.0001).
There was a significant difference of 6.04% (95% CI 2.30% to 8.50%; P = 0.0019) for both methods in this patient cohort by combining positive and weakly positive samples. When weakly positive results were excluded and positive and negative samples only were taken into account, the difference of 0.81% (95% CI -0.50% to 0.81%) was not significant.
In total, only 98 out of 1,222 sera (8.0%) demonstrated discrepant results regarding positive and negative differentiation by visual and automated interpretation ( Figure  5). When positive and weakly positive samples were combined into one group, the strength of agreement was very good (kappa = 0.828, 95% CI 0.795 to 0.860). For the assessment of one sample, the automated system required 60 seconds on average in a walk-away mode.
Comparison of pattern assessment of patient sera referred to a private laboratory Fifty-one of the 55 sera (92.7%) of sera scored positive by the automated system showed agreement in fluorescence pattern detection by visual and automated interpretation. Discrepant results were obtained when the AKLIDES® software algorithms assessed cytoplasmic fluorescence signals as nuclear staining due to the superposition of the nucleus by the cytoplasmic staining.

Discussion
The detection of AABs like ANAs by IIF was one of the first techniques available in routine laboratories for the serological diagnosis of systemic rheumatic diseases [22,23]. ANAs were even included in the classification criteria of systemic lupus erythematosus [24]. However, due to insufficient automation, poor standardization, and need of extensive expert experience in pattern recognition, automated ELISA and recently multiplexing Figure 4 Immunofluorescence pattern with staining of the Golgi complex, which was identified by AKLIDES® system as cytoplasmic speckled pattern. Green color: fluorescein isothiocyanate staining of autoantibody; blue color: 4',6-diamidino-2phenylindol staining of chromatin.  assays have frequently been used for ANA assessment [25,26]. Not only for ANAs, there is an ongoing debate whether these new techniques may replace immunofluorescence given that their limited sensitivity might be problematic for a screening method [27][28][29][30][31]. Until recently, reliable diagnostic tools for the automated interpretation of cell-based IIF tests like ANA detection on HEp-2 cells have not been available for routine laboratories. However, the use of digital images of HEp-2 cell-based assays for diagnostic aims [32,33] and the superiority of automated in contrast to subjective pattern classification have already been demonstrated [34]. Thus, the objective of this study was the comparison of the current visual subjective interpretation of HEp-2 cell-based assays for ANA detection with results obtained by the first automated interpretation system. Since the detection of ANAs is employed as serological screening for patients with suspected rheumatic disorders on the one hand and is part of classification criteria of systemic rheumatic diseases on the other, two patient groups tested with differing laboratory background regarding experience in ANA detection and prevalence of disease were included in the study. Consecutive sera of both a university laboratory specialized in rheumatic disease diagnostics and a private referral laboratory covering hospitals and out-patient departments were included in the study.
The basic precondition for the use of automated interpretation systems in routine diagnostics is the correct and reproducible differentiation of positive and negative samples. The comparison of visually and automatically obtained findings is hindered due to the lack of readily available standards with defined cutoffs for the definition of positive signals on HEp-2 cells in IIF assays. The Centers for Disease Control and Prevention (Atlanta, GA, USA) provide serum standards for specific patterns which are recommended to be employed for quality management. Laboratories providing ANA detection by HEp-2 cell assays frequently report different titers since cutoffs depend on technical equipment, expert knowledge, and patient population of the corresponding laboratory.
By means of the automated system, a very good agreement of 92.0% (kappa = 0.828) was obtained for the differentiation of positive and negative samples comparing automated interpretation with visual assessment by experienced examiners in different patient cohorts.
There was no significant difference for either interpretation method for the university patient cohort in differentiating positive from negative samples in our study. After exclusion of the weakly positive samples, the difference for both interpretation methods was also not significant for the patient cohort evaluated in the private referral laboratory. In such a cohort, a low prevalence of systemic rheumatic disease is usually expected. Samples with low ANA titers of 1 in 160 or less are not suggested to be subjected to further anti-ENA testing unless systemic rheumatic disease is strongly suspected [35]. In this context, automated interpretation of ANAs of this study is not significantly different from visual reading by experts regarding at least samples with ANA titers of more than 1 in 160.
The relatively high variability of routinely employed pattern recognition of ANA fluorescence images on HEp-2 cells is a challenge for the implementation of automated pattern recognition. Thus, different criteria exist, for example, for the description of coarse and fine speckled patterns [36]. Otherwise, a nucleolar pattern is usually defined by the positive staining of nucleoli but has to be specified by further staining of the chromatin region. The nucleolar staining can appear as homogeneous, clumpy, fine speckled, and speckled with mitotic dots and can be associated with AABs against PM-Scl complex, TH/To, fibrillarin, RNA polymerase I, and RNA helicase II. Anti-polymerase III or Ku AABs often demonstrate a fine speckled staining of the interphase chromatin additionally. Initiatives for the standardization of fluorescence patterns on HEp-2 cells for ANA detection have aimed at bridging the gap between routine diagnostics and science. Thus, five main patterns are recommended for the differentiation of nuclear staining patterns [17]. Elementary evaluation models for single patterns regarding the classification of pleomorphic patterns have already been developed [33].
The drawbacks of recently published approaches for automated pattern recognition appear to include an over-evaluation of final steps in image assessment like object extraction and classification [37][38][39][40]. In particular, self-learning classificators [39] have to be reviewed critically since local erroneous self-learning cannot lead to improvement of interlaboratory variability. Frequently, highly qualitative images are preselected, paving the way for human bias of subsequent findings.
In our study, agreement of pattern recognition between automated and visual assessment was 85.0%. This congruence reached 90.0% when only positive samples were taken into account. Weakly positive samples detected by visual examination demonstrated titers below 1 in 160. The latter finding confirms data of a recently published study [35].
The high agreement of our study between automated and visual interpretation of AABs results supports recent data showing that the success of automated interpretation systems depends essentially on the first processing steps like qualitative image acquisition and quality control of object identification [18,38].
The system used in the present study with novel pattern recognition algorithms for the automated assessment of HEp-2 cell assays may be employed for efficient AAB screening, especially in laboratories with high numbers of determination due to cost-effective management of data and human resources. The system can be readily implemented into routine diagnostics with reasonable demand of operator training. Findings provided by the system should be approved by an expert with experience in routine ANA reading due to the difficulty in assessing sera with differing AABs resulting in mixed patterns. Titer prediction enabled by the standardization of the fluorescence signal can further improve costefficiency [19,41].

Conclusions
The standardized evaluation of HEp-2 cell assays by automated interpretation systems can pave the way for reproducible and comparable results in and between laboratories. Archiving of digitized image data improves data management and provides the basis for efficient exchange of data. Automated interpretation systems for cell-based IIF assays can minimize the drawbacks regarding other automated techniques and strengthen the role of immunofluorescence for serological screening of autoimmune diseases.