NLP development and evaluation requires tools to compare annotations, analyze differences, and compute accuracy metrics.
None of the few existing tools combines all of these capabilities: consuming the output of any NLP system, comparing it with reference annotations in any format (both configurable), setting comparison rules, counting matches and computing accuracy metrics, visually displaying matching and erroneous annotations, adjusting the displayed information on the fly for analysis, and exporting evaluation results.
ETUDE (Evaluation Tool for Unstructured Data and Extractions) is developed collaboratively by the MUSC TBIC (Paul Heider, engine development) and Clinacuity, Inc.
It offers the following functionalities:
- Import of annotations (system output and reference standard) in multiple configurable formats
- Drag-and-drop creation of evaluation configuration files (i.e., specification of annotation categories and attributes to match and compare)
- Customizable annotation comparison settings (the base configuration supports exact, partial, and fully-contained matches)
- Counting and comparison of annotations, with descriptive statistics, a confusion matrix of matches and mismatches, and accuracy metrics (recall, precision, F-measure)
- Side-by-side display of reference and system annotations in the original document context, with easy exploration of matches and errors
- Export of evaluation results in various configurable formats
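To make the comparison rules and metrics above concrete, here is a minimal Python sketch, not ETUDE's actual implementation: the `Span` class, function names, and matching logic are illustrative assumptions. It matches reference spans against system spans under an exact, partial (any overlap), or fully-contained rule, then derives recall, precision, and F-measure from the true-positive, false-positive, and false-negative counts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    """A single annotation: character offsets plus a category label.
    Hypothetical structure, not ETUDE's internal representation."""
    start: int
    end: int
    category: str

def matches(ref_span, sys_span, mode="exact"):
    """Decide whether a system span matches a reference span under one rule."""
    if ref_span.category != sys_span.category:
        return False
    if mode == "exact":
        # Offsets must be identical on both sides.
        return (ref_span.start, ref_span.end) == (sys_span.start, sys_span.end)
    if mode == "partial":
        # Any character overlap counts as a match.
        return ref_span.start < sys_span.end and sys_span.start < ref_span.end
    if mode == "fully-contained":
        # The system span must lie entirely within the reference span.
        return sys_span.start >= ref_span.start and sys_span.end <= ref_span.end
    raise ValueError(f"unknown match mode: {mode}")

def score(reference, system, mode="exact"):
    """Greedily pair reference and system spans, then compute metrics."""
    unmatched = list(system)
    tp = 0
    for r in reference:
        hit = next((s for s in unmatched if matches(r, s, mode)), None)
        if hit is not None:
            tp += 1
            unmatched.remove(hit)  # each system span matches at most once
    fn = len(reference) - tp      # reference spans with no system match
    fp = len(unmatched)           # system spans with no reference match
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Relaxing the rule from `exact` to `partial` typically raises the scores, since near-miss boundaries are forgiven; comparing the two settings is one way to separate boundary errors from outright missed or spurious annotations.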