A New Program for Highly Reproducible Automatic Evaluation of the Substantia Nigra from Transcranial Sonographic Images

Aims. Recent studies using transcranial sonography (TCS) report increased echogenicity of the substantia nigra (SN) in patients with Parkinson's disease (PD). However, the main limitation of TCS is its dependence on the sonographer's experience. We therefore developed experimental software, B-Mode Assist, for quantitative evaluation of the echogenic SN area. The aim of this study was to test the reliability of data obtained with this software in patients with parkinsonism and in healthy volunteers. Methods. The SN was imaged through the right temporal bone window in the mesencephalic plane using TCS. DICOM images of the SN were saved, converted into JPEG format, encoded, and processed. Two observers each performed 3 automatic evaluations of the SN area (measurement of the SN area at each gray scale intensity inside the region of interest), and the standard deviation of all 6 measurements was calculated using the developed software. The average of the 3 measurements of each observer was used to compute Cohen's kappa coefficient for inter-observer correlation. Cohen's kappa coefficients for intra-observer correlation were calculated for observer 1 and observer 2 from the first 2 measurements of each observer.

Becker et al. [7] demonstrated for the first time that a hyperechogenic and enlarged SN can be detected by TCS in PD patients. Recent studies reported increased SN echogenicity in 91-100% of PD patients, with boundary enlargements between 0.20 and 0.25 cm² depending on the specific ultrasound system used [8][9][10][11]. In contrast, a hyperechogenic enlarged SN is detectable in only 8-14% of healthy volunteers [8][9][10][11]. Some apparently healthy volunteers with a positive TCS finding will later develop symptoms of PD (ref. 8). For this reason, TCS can complement other imaging methods for early diagnosis.
The main limitation of TCS in the evaluation of SN hyperechogenicity is the dependence of image quality on both the sonographer's experience and the quality of the patient's bone window [12][13][14]. Several automatic computer programs have been developed to circumvent this limitation.
We developed an experimental application, B-Mode Assist, with a graphical user interface (GUI) in MATLAB, an integrated development environment with a plug-in Image Processing Toolbox, for morphometric analysis of the SN (ref. 15,16). MATLAB is compatible with Digital Imaging and Communications in Medicine (DICOM) files, the worldwide standard for imaging, storing, and transferring data from various imaging modalities. DICOM images are raw, unfiltered, and free from noise interference. Furthermore, each DICOM file contains information about acquisition settings and other relevant technical information. The application measures the area inside a region of interest (ROI) in sonographic images of the midbrain (containing the SN) to aid in the diagnosis of PD or to follow disease progression. It was developed as a GUI-based standalone application for Microsoft Windows XP or newer, such as Windows Vista or Windows 7.
The aim of this study was to assess the utility of this experimental application using statistical tests of reproducibility. For this purpose, we determined (1) the minimum and maximum echogenicity curves of the SN in each image, (2) the 95th percentile values from healthy subjects with normal SN (area ≤ 0.25 cm²), (3) the standard deviations of all individual means of measurement series performed by two independent software users, and (4) the inter-observer and intra-observer correlations between image measurements represented by Cohen's kappa coefficients. Finally, we generated a receiver operating characteristic (ROC) curve to describe the sensitivity and specificity of these results for distinguishing healthy subjects from patients with parkinsonian syndromes.

Patients and Imaging
One hundred selected patients were examined in the neurosonology laboratory over a 3-month period. Demographic data of the patients are presented in Table 1. Over half (52) of the patients had confirmed PD and 19 had other parkinsonian syndromes. The SN was imaged from the right and left temporal bone windows in the mesencephalic plane. DICOM images of the ipsilateral SN imaged from the right temporal bone window were saved and encoded. In each patient, five SN images were obtained, of which the two images with the best quality were used for further processing. All selected images were subsequently converted into Joint Photographic Experts Group (JPEG) format for processing. JPEG is a format commonly used in image processing, although its standard compression is lossy rather than lossless.
A phased P4-2 array (Philips HDI 5000, Bothell, WA, USA) was used for TCS. The examination was performed through a temporal bone window with a penetration depth of 15 cm, dynamic range of 50 dB, frequency range of 2-4 MHz, thermal index (TI) of 1.9, and mechanical index (MI) of 1.3. A butterfly-shaped structure of the mesencephalic brainstem and the SN were depicted as clearly as possible from the transverse plane (Fig. 1). Personal data and examination times were deleted and all acquired images were encoded with a unique key.
All images were visually evaluated by an experienced sonographer (DŠ). Images with an echogenic SN area ≤ 0.25 cm² were rated as normal (Group A) and images with an echogenic SN area > 0.25 cm² were judged as pathological (Group B).
During the initial phase of evaluation, the input image was in 8-bit or 24-bit color (RGB). For all subsequent processing steps, images were converted to 8-bit gray scale (intensity value I = 0 to 255). If the input image was 24-bit color (RGB), the rgb2gray function was used for grayscale conversion:

$$ I = 0.2989\,R + 0.5870\,G + 0.1140\,B \qquad (1) $$

Hence, each pixel is represented by one intensity value I.
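The grayscale conversion step can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: it assumes the weighted sum documented for MATLAB's rgb2gray (0.2989, 0.5870, 0.1140), and the function name `rgb_to_gray` and the list-of-rows image layout are hypothetical choices made only for this example.

```python
# Luma weights documented for MATLAB's rgb2gray; assumed here to match
# the conversion used in the study.
WEIGHTS = (0.2989, 0.5870, 0.1140)


def rgb_to_gray(pixels):
    """Map rows of (R, G, B) tuples (each 0-255) to rows of 8-bit intensities."""
    gray = []
    for row in pixels:
        gray_row = []
        for r, g, b in row:
            value = WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b
            # Round to the nearest integer and clamp to the 8-bit range.
            gray_row.append(min(255, max(0, round(value))))
        gray.append(gray_row)
    return gray


if __name__ == "__main__":
    sample = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 255, 0)]]
    print(rgb_to_gray(sample))
```

Each output pixel carries a single intensity value I, which is the only quantity used by the later thresholding steps.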
The designed algorithm allows ROI-based processing on grayscale images with intensities 0 to 255, binary thresholding, and computation of areas inside an elliptical ROI to detect potential defects in the SN.

Pre-processing
The application works with images converted into common bitmap formats such as BMP, JPEG, PNG, and JPEG 2000, but only JPEG images were used in this study. Input images were loaded into the application and cropped to an ROI-1 window of 50×50 mm aligned with the native axes of the image. The ROI-1 window measured 180 × 180 pixels within the original images of 768 × 576 pixels. The first step was the conversion of the raw cropped image into an 8-bit intensity image, as described above. Fig. 1 shows an example of a window with an ROI-1 containing the SN.
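The window dimensions quoted above imply a pixel scale, which the following Python sketch works out. Only the 50 mm side, the 180 pixel side and the 768 × 576 image size come from the text; the helper names and the `top`/`left` crop offsets are hypothetical, since the text does not state where the window is placed.

```python
ROI1_SIDE_MM = 50.0  # side of the square ROI-1 window (from the text)
ROI1_SIDE_PX = 180   # the same side in pixels (from the text)

MM_PER_PIXEL = ROI1_SIDE_MM / ROI1_SIDE_PX  # about 0.278 mm per pixel
PIXEL_AREA_MM2 = MM_PER_PIXEL ** 2          # about 0.0772 mm^2 per pixel


def pixels_to_mm2(pixel_count):
    """Convert a pixel count inside ROI-1 to an area in mm^2."""
    return pixel_count * PIXEL_AREA_MM2


def mm2_to_pixels(area_mm2):
    """Convert an area in mm^2 to the equivalent number of pixels."""
    return area_mm2 / PIXEL_AREA_MM2


def crop_roi1(image, top, left, side=ROI1_SIDE_PX):
    """Crop a square window from an image stored as a list of rows.

    `top` and `left` are hypothetical offsets chosen by the caller.
    """
    if top < 0 or left < 0 or top + side > len(image) or left + side > len(image[0]):
        raise ValueError("ROI-1 window does not fit inside the image")
    return [row[left:left + side] for row in image[top:top + side]]


if __name__ == "__main__":
    image = [[0] * 768 for _ in range(576)]  # blank 768 x 576 image
    window = crop_roi1(image, top=100, left=200)
    print(len(window), len(window[0]))       # window size in pixels
    print(round(mm2_to_pixels(50.0)))        # pixels in a 50 mm^2 region
```

Under this scale, a 50 mm² region corresponds to about 648 pixels, which gives a sense of how many pixels contribute to each area measurement.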

Main Processing
All gray scale images were then subjected to binary thresholding. The algorithm computed the area for each intensity I (from 0 to 255) inside the elliptical ROI-2, which had a total area A = 50 mm². The longer axis of ROI-2 was inclined at 60 degrees to encompass the right SN. ROI-2 was automatically positioned at preset coordinates (2; 2) and was manually moved to the correct position within the SN when the automatic placement fell outside the SN. An example of an elliptical ROI-2 within the ROI-1 window is shown in Fig. 2. Binary masks were used to define ROI-2.
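A binary mask for an inclined elliptical ROI can be sketched as follows. This is an illustration under stated assumptions rather than the B-Mode Assist implementation: the text gives only the area (50 mm²) and the 60 degree inclination, so the 2:1 axis ratio, the centre position and the angle convention (measured in image coordinates, with y increasing downwards) are hypothetical.

```python
import math


def elliptical_mask(size_px, center, area_mm2, axis_ratio, angle_deg, mm_per_px):
    """Return a size_px x size_px boolean mask of a rotated ellipse.

    The semi-axes follow from area = pi * a * b with a = axis_ratio * b.
    axis_ratio and center are illustrative assumptions, not study values.
    """
    semi_minor = math.sqrt(area_mm2 / (math.pi * axis_ratio)) / mm_per_px
    semi_major = axis_ratio * semi_minor
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    mask = []
    for y in range(size_px):
        row = []
        for x in range(size_px):
            dx, dy = x - cx, y - cy
            u = dx * cos_t + dy * sin_t    # coordinate along the major axis
            v = -dx * sin_t + dy * cos_t   # coordinate along the minor axis
            row.append((u / semi_major) ** 2 + (v / semi_minor) ** 2 <= 1.0)
        mask.append(row)
    return mask


if __name__ == "__main__":
    mask = elliptical_mask(180, (90, 90), 50.0, 2.0, 60.0, 50.0 / 180)
    inside = sum(sum(row) for row in mask)
    print(inside, "pixels inside ROI-2")
```

The pixel count inside the mask should be close to the 648 pixels that a 50 mm² region occupies at 180 pixels per 50 mm; the small discrepancy comes from discretising the ellipse boundary.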

Computation of SN area
The method calculates a curve describing the dependency of the SN area on the threshold at each gray scale intensity (0-255) inside the 50 mm² ROI-2. The area under the curve, representing the sum of all 256 measured areas, was then assessed (Fig. 3). The area inside ROI-2 for each intensity I = 0 to 255 formed the basic data set for statistical analysis. Binary thresholding followed the simple algorithm:

$$ g(x, y) = \begin{cases} 1, & I(x, y) \ge T \\ 0, & I(x, y) < T \end{cases} \qquad (2) $$

where T is the user-defined threshold, I(x, y) is the intensity of the pixel at position (x, y), and g(x, y) is the resulting binary image. An example of this graph is shown in Fig. 4.
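The area-versus-threshold curve and its sum can be sketched in Python as below. The sketch assumes that a pixel counts towards the area at threshold T when its intensity is greater than or equal to T; whether the original comparison is inclusive is not stated in the text, and the function names are hypothetical.

```python
def area_curve(gray, mask, pixel_area_mm2):
    """Return a list of 256 areas (mm^2), one per threshold T = 0..255.

    areas[T] is the area of masked pixels whose intensity is >= T
    (the inclusive comparison is an assumption of this sketch).
    """
    histogram = [0] * 256
    for gray_row, mask_row in zip(gray, mask):
        for value, inside in zip(gray_row, mask_row):
            if inside:
                histogram[value] += 1
    areas = [0.0] * 256
    running = 0
    # Accumulate from the brightest intensity downwards.
    for threshold in range(255, -1, -1):
        running += histogram[threshold]
        areas[threshold] = running * pixel_area_mm2
    return areas


def curve_sum(areas):
    """Sum of all 256 areas, used as the area under the curve."""
    return sum(areas)


if __name__ == "__main__":
    gray = [[0, 100], [200, 255]]
    mask = [[True, True], [True, True]]
    areas = area_curve(gray, mask, pixel_area_mm2=1.0)
    print(areas[0], areas[101], areas[255], curve_sum(areas))
```

Building the histogram once and accumulating it avoids re-thresholding the image 256 times, while producing the same curve.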

Measurements
All sonographic images were evaluated by two independent observers. Both observers were blinded to patient diagnosis. Both observers repeated the measurements 3 times over a 1-week interval for evaluation of intra-observer reproducibility. Results from both observers were compared to evaluate inter-observer reproducibility. One observer also compared the two best-quality SN images of each patient.
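The six measurements per image (three by each of two observers) can be summarised as in the following sketch. It assumes the sample standard deviation (n - 1 denominator) and defines the coefficient of variability as the standard deviation divided by the mean; the study does not state which definition its statistical software applied, and the input values are invented for illustration.

```python
import statistics


def summarize(measurements):
    """Return mean, sample SD and coefficient of variability (percent)."""
    mean = statistics.mean(measurements)
    sd = statistics.stdev(measurements)  # sample SD; an assumption
    return {"mean": mean, "sd": sd, "cv_percent": 100.0 * sd / mean}


if __name__ == "__main__":
    # Hypothetical values: three measurements by each of two observers.
    observer_1 = [10.0, 12.0, 14.0]
    observer_2 = [10.0, 12.0, 14.0]
    print(summarize(observer_1 + observer_2))
```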

Ethics Committee approval
The Ethics Committee of Ostrava University Hospital (Ostrava, Czech Republic) approved the study protocol. Each volunteer signed an informed consent form.

Fig. 3. An example of automatic detection of region of interest 1 (ROI-1), including the mesencephalon and substantia nigra (window 50×50 mm), with manual placement of the elliptical ROI-2 of 50 mm² within ROI-1. The automatic measurement of substantia nigra echogenicity in a healthy volunteer.

Statistics
The mean values of the 3 measurements performed by each observer on each image (patient) over one week were calculated, along with the individual variances and standard deviations. Subsequently, a coefficient of variability was calculated from the results of all 6 measurements (three by each of the two observers).
The average values of these 6 measurements from patients with an SN area ≤ 0.25 cm² were used to compute 95th percentiles for subjects with "normal" SN parameters. The 95th percentile was then used as a threshold value to obtain a ROC curve.
The average values obtained by the first observer were compared with those obtained by the second observer to compute Cohen's kappa coefficient for inter-observer correlation. For the intra-observer coefficient, Cohen's kappa coefficients were calculated from the mean values and variances obtained by each observer from the three separate measurements (per image/patient) conducted over one week. Cohen's kappa coefficient was also calculated for the comparison of the two best-quality SN images by one observer.
The differences between the 95th percentile values calculated from Group A and the average values of all 6 measurements (from both observers) were computed. From the sums of these differences, we determined four classes for ROC:
• if the sum of differences was > 220, the value was definitely physiological
• if the sum of differences was between 220 and 0, the value was probably physiological
• if the sum of differences was between 0 and -220, the value was probably pathological
• if the sum of differences was < -220, the value was definitely pathological
The results were compared with the dichotomized results (presence/absence of SN pathology) of the visual assessment performed by an experienced sonographer. We then computed the number of patients in each category on the ROC curve.
All data were analyzed using SPSS version 15 statistical software (SPSS Inc., Chicago, IL, USA).

RESULTS
Out of 100 images in JPEG format used for analysis, 8 images were eliminated from further processing due to a high level of noise caused by an insufficient bone window. Fig. 5 shows the minimum area value, the maximum area value, and the 95th percentile of Group A images calculated from the average values of all 6 measurements. Analysis of variance showed that the deviation from the mean values of the experimental measurements was relatively low. The mean of the standard deviation (STD) was 3.87, which corresponds to a deviation of 3.87 mm² inside ROI-2 (Fig. 6). The coefficient of variability v(x) was 38.5%.
The intra-observer kappa coefficient for observer 1 was 0.947 and that for observer 2 was 0.943. The inter-observer kappa coefficient was 0.880. For the comparison of the two best-quality SN images of each patient by one observer, the kappa coefficient was 0.845 and the coefficient of variability v(x) was 32.0%.
Thus, the observers demonstrated a high level of mutual agreement and each obtained highly consistent morphometric measurements over three successive repetitions. Correlations between the visual evaluation and the evaluations using the developed application are shown in Table 2. The two methods agreed on the presence of SN pathology in 97.8% of evaluated images. The automatic measurement reached 100% sensitivity, 96.2% specificity, 95.1% positive predictive value, and 100% negative predictive value when the visual evaluation was used as the gold standard. Results from the ROC curve of the computed differences between all mean areas and the 95th percentile for all non-pathological images compared favorably with the visual evaluation (Fig. 7).
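The four ROC classes and the reported agreement figures can be illustrated with a short Python sketch. Two points are assumptions rather than facts from the study: the class assigned to a sum of exactly 0 or exactly ±220, which the text leaves open, and the 2 × 2 counts (39 true positives, 2 false positives, 0 false negatives, 51 true negatives), which were inferred here from the reported percentages for 92 images and are not taken from Table 2.

```python
def sum_of_differences(percentile_curve, mean_curve):
    """Sum over intensities of (Group A 95th percentile - mean measured area)."""
    return sum(p - m for p, m in zip(percentile_curve, mean_curve))


def classify(total, limit=220.0):
    """Assign one of the four ROC classes; boundary handling is an assumption."""
    if total > limit:
        return "definitely physiological"
    if total >= 0:
        return "probably physiological"
    if total >= -limit:
        return "probably pathological"
    return "definitely pathological"


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2 x 2 table metrics with visual evaluation as the reference."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "agreement": (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    print(classify(sum_of_differences([10.0, 10.0], [8.0, 13.0])))
    # Counts inferred from the reported percentages, not from Table 2.
    metrics = diagnostic_metrics(tp=39, fp=2, fn=0, tn=51)
    for name, value in metrics.items():
        print(name, round(100.0 * value, 1))
```

With these inferred counts the sketch reproduces the reported 100% sensitivity, 96.2% specificity, 95.1% positive predictive value, 100% negative predictive value, and 97.8% agreement.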

DISCUSSION
The B-Mode Assist program presented here was developed as a GUI-based standalone application for Microsoft Windows XP (or a newer operating system) to determine the size and echogenicity of the substantia nigra in sonographic images. End users without MATLAB require only the MATLAB Compiler Runtime (MCR) to run this application. MATLAB was chosen as the development environment because it is an affordable tool that can solve a wide variety of problems in medical image processing. The program was developed as an M-file and compiled into an executable EXE application.
Several applications are available for the semi-automated segmentation of medical images and for other forms of processing [17]. However, to the best of our knowledge, this is the first study testing a specific computer application for the evaluation of TCS brain tissue morphometry. Moreover, this is the first application using threshold intensity (echogenicity, I = 0 to 255) to determine SN size in cross-section. The final result is presented as a graph of SN area at each intensity.
Repeated measurements by two independent observers (program users) demonstrated the high reproducibility of this application. Indeed, the kappa coefficients were "almost perfect" (kappa coefficient > 0.81), according to the commonly used scale for interpreting kappa coefficients [18], for both inter-observer and intra-observer agreement. The low variance indicated high reproducibility of measurements obtained by a single observer across time and between multiple observers. Furthermore, the sensitivity and specificity revealed by ROC analysis were satisfactory.
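For readers unfamiliar with the statistic, Cohen's kappa for two raters can be sketched as follows. The sketch works on categorical ratings (for example, normal versus pathological per image); how the continuous area measurements were categorised before kappa was computed is not described in the text, so the input format is an assumption. The interpretation bands are the widely used Landis and Koch scale, assumed here to be the scale that reference 18 describes.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the marginal frequencies of each category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


def interpret(kappa):
    """Landis and Koch bands (assumed to match the scale cited in the text)."""
    if kappa < 0.0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    rater_1 = [1, 1, 0, 0]  # hypothetical dichotomized ratings
    rater_2 = [1, 1, 0, 1]
    kappa = cohens_kappa(rater_1, rater_2)
    print(kappa, interpret(kappa))
    print(interpret(0.880))  # inter-observer value reported in this study
```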
Compared to studies assessing the consistency of visual evaluation of SN features [14,19], we found higher inter- and intra-observer agreement using the newly developed application. In previous studies using visual evaluation, inter-observer SN echogenicity correlations ranged from r=0.55-0.82, while area measurement correlations ranged from r=0.31-0.74. Furthermore, intra-observer echogenicity and area measurement correlations (r=0.85-0.96 and r=0.51-0.69) were statistically significant (P<0.001) only for experienced sonographers [14]. In the study of van de Loo et al. (2010), inter-observer correlations for SN area measurements ranged from r=0.84-0.89, inter-observer correlations of SN echogenicity from r=0.33-0.51, and intra-observer correlations for planimetric measurements and echogenicity evaluations from r=0.93-0.97 and r=0.74-0.80, respectively. The results of both studies showed that semi-quantitative evaluations of SN echogenicity and planimetric area using TCS were highly dependent on the experience of the sonographer [14,19]. Only an experienced sonographer was able to generate reproducible results with statistically significant correlations. Objective computer evaluation of SN features could overcome these shortcomings of visual TCS evaluation.
Using contemporary ultrasound systems, two-dimensional real-time images of midbrain structures are now possible with high spatial resolution. TCS allows characteristic abnormalities in SN echogenicity to be depicted by visual inspection of a series of images [20]. However, the sonographer cannot be fully blinded to the clinical features of parkinsonian patients. Thus, "blinded" evaluation of the SN represents the main advantage of its automatic detection.
While automatic evaluation procedures for ultrasound images may greatly benefit the diagnosis of neurological disorders for which there is an applicable bone window, technical problems have limited wider use of these applications. To our knowledge, this is the first program for automatic evaluation of brain structures using B-mode. Only some computer programs for intima-media thickness measurement are applicable in general practice, e.g. Software for Rapid Measurement of Carotid Intima-Media Thickness (CIMT) from B-mode ultrasound data (Arizona Technology Enterprises). Analysis of CIMT can aid in the evaluation of cardiovascular risk and has the potential to predict future myocardial infarction and stroke. This CIMT measurement software was compared to other commercial software packages, and the correlation coefficient was > 0.85 (P<0.001) (ref. 21), similar to our result for inter-observer agreement (kappa = 0.88). A similar application for CIMT measurement based on DICOM ultrasound images has been developed by Potter et al. [22].
Several limitations of the present study should be mentioned. While we found a high inter-observer correlation, it should be emphasized that visual analysis was still performed by an experienced sonographer. A second limitation of the application is that raw image quality is still dependent on the initial acquisition settings of the ultrasound machine (in addition to proper computer post-processing). Furthermore, the application does not overcome the technical difficulty of acquiring a high-quality raw image of the SN, where the role of an experienced sonographer is irreplaceable. Possible error in the placement of the ROI to encompass the SN should also be mentioned, as the anatomical boundaries separating the SN from its surroundings might be poorly defined on ultrasound images. Nevertheless, the ROI-2 area was approximately twice as large as the SN area in all cases.
Future work will focus on using an artificial neural network to automate recognition of the SN within the sonographic image.This will facilitate better processing and reduce observer workload.The next goal is to achieve fully automatic processing.Furthermore, the application must be tested on other brain images and other image formats to fully validate this application for clinical practice.

CONCLUSION
We developed a GUI-based standalone application for morphometric evaluation of SN features.The results of the present study demonstrate the high repeatability and reproducibility of SN measurements with "almost perfect" inter-observer and intra-observer agreement.

ACKNOWLEDGEMENTS
The study was partially supported by a grant of the Moravian-Silesian region of the Czech Republic and a grant of the Silesian University in Opava, Czech Republic.

Fig. 5. Dependency of automatically measured area inside ROI-2 on gray scale intensity: minimal and maximal computed area for all images and 95th percentile for normal (< 0.25 cm²) SN images.

Table 1. Demographic data of patients.

Table 2. Comparison of evaluation of substantia nigra pathology using visual and automatic evaluation.

JB, TS: statistical analysis; RH: final approval.
Conflict of interest statement: None declared.