Image labeling is essential for analyzing morphometric features in medical imaging data. Labels can be obtained by either human interaction or automated segmentation algorithms, both of which suffer from errors. The Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm for both discrete-valued and continuous-valued labels has been proposed to find the consensus fusion while simultaneously estimating rater performance. In this paper, we first show that the previously reported continuous STAPLE in which bias and variance are used to represent rater performance yields a maximum likelihood solution in which bias is indeterminate. We then analyze the major cause of the deficiency and evaluate two classes of auxiliary bias estimation processes, one that estimates the bias as part of the algorithm initialization and the other that uses a maximum a posteriori criterion with a priori probabilities on the rater bias. We compare the efficacy of six methods, three variants from each class, in simulations and through empirical human rater experiments. We comment on their properties, identify deficient methods, and propose effective methods as solution.
ASJC Scopus subject areas