Decision-boundaries and super-labellers
Super-labellers are the untapped resource of algorithmic improvement.
For a given AI problem there are two fundamental sources of possible improvement: the algorithm and the data.*
In data, more is better.
Quantity has a quality all of its own. This unreasonable effectiveness of data was true in 2011 when Peter Norvig pointed it out; it's truer today.
The phrase itself is deceptive, however. Most use-cases do not lend themselves to increasing the input data by orders of magnitude. Data must be labelled. And labelling is expensive. Humans have to do the work and humans must be paid a living wage.
In practice, ‘Just add more data’ is rarely the answer. We must decide where to spend on labelling. While data, in aggregate, is a commodity, it is not a uniform commodity. Some data is more equal than others.
Do we emphasise the most important use-cases, the most frequently used use-cases, or those use-cases prone to catastrophic error? We can have all three if we are particular about which data points in feature space we choose to label.
Within each labelling task, there exists a subset of examples that is most valuable. The most valuable data is data close to the decision-boundary. Unique examples that help to define the line between one class and another.**
Invest in decision-boundary data.
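One common way to make ‘close to the decision-boundary’ concrete is margin sampling from active learning: ask the current model for class probabilities and keep the examples where the top two classes are nearly tied. Below is a minimal sketch in Python, assuming a scikit-learn-style classifier exposing predict_proba and an unlabelled pool of feature vectors; the names are illustrative, not a prescription.

```python
import numpy as np

def smallest_margin_indices(model, unlabelled_pool, budget):
    """Return the indices of the `budget` examples whose top two predicted
    class probabilities are closest together, i.e. the examples the current
    model finds most ambiguous."""
    probs = model.predict_proba(unlabelled_pool)   # shape: (n_examples, n_classes)
    top_two = np.sort(probs, axis=1)[:, -2:]       # two largest probabilities per row
    margins = top_two[:, 1] - top_two[:, 0]        # small margin = close to the boundary
    return np.argsort(margins)[:budget]

# Usage: send only these examples to the labelling queue.
# to_label = unlabelled_pool[smallest_margin_indices(model, unlabelled_pool, budget=500)]
```

Model uncertainty is only one signal, though; the human signals that follow matter at least as much.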
To invest proportionately more in decision-boundary data, we need to find examples that we would expect to be ambiguous. Examples where labellers disagree. Examples that we know models have gotten wrong in the past. Examples where user feedback disagrees with our own decision-making or indicates there’s nuance we haven’t captured.
This allows us to identify the population of data from which to sample for labelling. We have immediately saved time and effort. And cost.
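Those signals can be folded into a single labelling-priority score. Here is a hedged sketch, where the per-example fields (labels from multiple annotators, a flag for past model error) and the equal weighting are illustrative assumptions rather than a prescription.

```python
from collections import Counter
from math import log2

def disagreement(labels):
    """Shannon entropy of the annotator label distribution:
    0 when everyone agrees, higher when the votes are split."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def priority(candidate, error_weight=1.0):
    """Combine annotator disagreement with known past model errors."""
    return disagreement(candidate["labels"]) + error_weight * candidate["model_was_wrong"]

candidates = [
    {"id": "a", "labels": ["spam", "spam", "spam"], "model_was_wrong": False},
    {"id": "b", "labels": ["spam", "ham", "spam"], "model_was_wrong": True},
]
labelling_queue = sorted(candidates, key=priority, reverse=True)   # "b" is labelled first
```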
But there's more.
If these areas are difficult, ambiguous and contested, then we may simply be feeding the algorithms noise. This is especially true when we consider that most large-scale labelling is carried out in developing countries by non-expert labellers without the full cultural context of the data and use-case.
So we need labellers who are themselves more accurate, more discerning. Who have a sixth sense for this specific labelling problem.
These super-labellers exist. As in most domains, there are individuals who are substantially better than their peers. Labellers who intuitively know which side of the boundary an example should sit on, regardless of whether the labelling instructions deem it so.***
We want super-labellers.
How do we get them? We seek them out. We nurture them. We train them.
A finely tuned training program that interrogates labelling decisions in detail to find out exactly what constitutes the boundary. This goes well beyond traditional Quality Reviews. Training programs where trainers help super-labellers to build their intuition. To listen to it. To feed it back to rules-writing and Policy.
At the organisational level, encouraging labellers to hyper-specialise and become niche subject matter experts. We are looking for these labellers to go far narrower than a typical SME focus area: to inhabit the feature space and understand it at a deep, fundamental level.
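One way to seek them out is to mine the labelling history we already have. A minimal sketch, assuming we log each decision alongside an adjudicated ‘gold’ answer and a flag for whether the item was contested; accuracy on those hard items, rather than overall accuracy, is the signal of interest. All field names are hypothetical.

```python
from collections import defaultdict

def boundary_accuracy(decisions, min_hard_items=20):
    """decisions: an iterable of dicts with keys `labeller`, `label`, `gold`
    and `is_hard` (was the item contested enough to be adjudicated?).
    Returns each labeller's accuracy on the hard items only."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for d in decisions:
        if not d["is_hard"]:
            continue                                  # score only contested items
        seen[d["labeller"]] += 1
        correct[d["labeller"]] += int(d["label"] == d["gold"])
    return {
        labeller: correct[labeller] / n
        for labeller, n in seen.items()
        if n >= min_hard_items                        # ignore labellers with too little evidence
    }

# candidates = sorted(boundary_accuracy(decisions).items(), key=lambda kv: kv[1], reverse=True)
```

Consistently high accuracy on contested items is a starting point for the nurturing and training described above, not a verdict in itself.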
A super-labeller interrogates the decision boundary. Probing and refining, hungrily seeking out examples close to it and feeding that back to our policy teams and to other labellers.
And ultimately to the algorithms.
Notes
* Here, we’re focusing purely on the AI domain. We could also improve via problem framing or UI/UX.
** “Line” used here very loosely.
*** Labelling tasks are outlined in a standard operating procedure (SOP), where the designer needs to identify edge-cases, and the rules that govern those edge-cases, well in advance. These rules are not always accurate at the boundary.