Our user interface for collecting annotations shows the user
an image and asks them, for a particular pair of pixels
(indicated with crosshairs and labeled Points 1 and 2), which
of the two points has a darker surface color. The user then
selects one of three options: Point 1, Point 2, or About the
same. We also ask users to rate their confidence in their
assessment as Guessing, Probably, or Definitely, following
[Branson et al. 2010].
We aggregate judgements from 5 users for each pair of points
and use the CUBAM machine learning model [Welinder et al. 2010]
to model two forms of bias.
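
To make the shape of the collected data concrete, the sketch below combines five responses for one pair of points. It is not CUBAM: it substitutes a plain confidence-weighted vote, which is far simpler than the model cited above. The numeric weights for Guessing, Probably, and Definitely and the function name `aggregate` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical weights; the source does not say how confidence is scored.
CONFIDENCE_WEIGHT = {"Guessing": 1.0, "Probably": 2.0, "Definitely": 3.0}


def aggregate(responses):
    """Combine (choice, confidence) responses for one pair of points.

    Returns the winning choice and its share of the total confidence weight.
    This is a simple weighted vote, NOT the CUBAM model.
    """
    if not responses:
        raise ValueError("at least one response is required")
    scores = defaultdict(float)
    for choice, confidence in responses:
        scores[choice] += CONFIDENCE_WEIGHT[confidence]
    total = sum(scores.values())
    winner = max(scores, key=scores.get)
    return winner, scores[winner] / total


# Five users judging the same pair of points.
responses = [
    ("Point 1", "Definitely"),
    ("Point 1", "Probably"),
    ("About the same", "Guessing"),
    ("Point 2", "Guessing"),
    ("Point 1", "Probably"),
]
winner, share = aggregate(responses)
```

The returned share could serve as a per-pair weight when scoring algorithms, although how the actual weights are derived is a detail of the publication and is not reproduced here.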
The input image is decomposed into a "reflectance" layer and a "shading" layer. Note that the reflectance layer is shown twice: in color (left) and in grayscale (center). Decompositions are ordered by error and then by runtime (best on top). The parameters for each algorithm are the same for all photos; they are set to the values that produce the lowest mean error (WHDR) across all photos. See our publication for more details.
Algorithm: grosse2009_grayscale_retinex
[Images: Reflectance, Grayscale Reflectance, Shading]
Citation:
Roger Grosse, Micah K. Johnson, Edward H. Adelson, William T. Freeman. "Ground truth dataset and baseline evaluations for intrinsic image algorithms". Proceedings of the International Conference on Computer Vision (ICCV). http://www.cs.toronto.edu/~rgrosse/intrinsic/.
Parameters:
L1: True
threshold: 0.5
Result:
Weighted human disagreement rate (WHDR): 25.9% (δ: 0.1)
Citation:
Qi Zhao, Ping Tan, Qiang Dai, Li Shen, Enhua Wu, Stephen Lin. "A Closed-form Solution to Retinex with Non-local Texture Constraints". IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). http://www.ece.nus.edu.sg/stfpage/eletp/Papers/pami12_intrinsic.pdf.
Parameters:
chrom thresh: 0.001
gamma: False
texture patch distance: 0.0003
texture patch variance: 0.03
Result:
Weighted human disagreement rate (WHDR): 26.1% (δ: 0.1)
Citation:
Elena Garces, Adolfo Munoz, Jorge Lopez-Moreno, Diego Gutierrez. "Intrinsic Images by Clustering". Computer Graphics Forum (Eurographics Symposium on Rendering). http://www-sop.inria.fr/reves/Basilic/2012/GMLG12/.
Parameters:
km k: 8
remap gamma 2 2: False
Result:
Weighted human disagreement rate (WHDR): 26.8% (δ: 0.1)
Citation:
Roger Grosse, Micah K. Johnson, Edward H. Adelson, William T. Freeman. "Ground truth dataset and baseline evaluations for intrinsic image algorithms". Proceedings of the International Conference on Computer Vision (ICCV). http://www.cs.toronto.edu/~rgrosse/intrinsic/.
Parameters:
L1: True
threshold color: 0.7
threshold gray: 0.5
Result:
Weighted human disagreement rate (WHDR): 27.8% (δ: 0.1)
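
As a rough illustration of how a WHDR score such as those listed above could be computed, the sketch below compares an algorithm's reflectance values at each pair of points against the human judgement, using the threshold δ to decide when two reflectances count as "about the same". The tuple layout, the labels "1", "2", and "E", and the function names are assumptions for this example; see the publication for the authoritative definition.

```python
def classify(r1, r2, delta=0.1, eps=1e-10):
    """Predict which point is darker from two reflectance values.

    Returns "1" if point 1 is darker, "2" if point 2 is darker,
    and "E" if the ratio is within the threshold delta.
    """
    r1 = max(r1, eps)  # guard against division by zero
    r2 = max(r2, eps)
    if r2 / r1 > 1.0 + delta:
        return "1"
    if r1 / r2 > 1.0 + delta:
        return "2"
    return "E"


def whdr(judgements, delta=0.1):
    """Weighted human disagreement rate over a list of judgements.

    Each judgement is (human_label, weight, r1, r2), where r1 and r2 are
    the algorithm's reflectance values at Points 1 and 2.
    Returns a fraction in [0, 1], or None if the total weight is zero.
    """
    total = 0.0
    disagree = 0.0
    for human_label, weight, r1, r2 in judgements:
        total += weight
        if classify(r1, r2, delta) != human_label:
            disagree += weight
    return disagree / total if total > 0 else None


# Three illustrative (made-up) judgements for one photo.
judgements = [
    ("1", 1.0, 0.20, 0.50),  # algorithm agrees: point 1 is darker
    ("E", 0.5, 0.40, 0.41),  # algorithm agrees: about the same
    ("2", 0.5, 0.30, 0.60),  # algorithm disagrees: it predicts point 1
]
score = whdr(judgements, delta=0.1)
```

Here the only disagreement carries weight 0.5 out of a total of 2.0, so the score is 0.25, which would be reported as 25.0%.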