Some data’s more equal than others

As machine learning has matured, ML algorithms have become commoditised and therefore it is the data that is valuable. Or so the argument goes.

This has a nice ring to it. It sounds true.

It has the convenient benefit of aligning with the strengths of many pre- or early-internet era incumbents, and neutralising their weaknesses in ML. That alone should give you pause.

Data is a suitcase word. It means many things.

The data that matters for that machine learning application you have in mind only partially overlaps with the data you've been passively collecting.

Where the data does overlap, it is messy, incomplete and inaccurate. Even the flashiest algos will struggle under these conditions.
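One rough way to see how much of this applies to you is to audit what you have already collected against what the model would need. The sketch below is only an illustration: the `audit` function, the field names and the sample records are all hypothetical, and a real audit would also have to check accuracy, which a fill rate alone cannot show.

```python
from typing import Any


def audit(records: list[dict[str, Any]], required: set[str]) -> dict[str, Any]:
    """Compare the fields a model needs with the fields actually collected."""
    # Every field name that appears in at least one record.
    collected = set().union(*(r.keys() for r in records))
    overlap = required & collected
    # Fill rate: share of records where the field is present and non-empty.
    fill_rate = {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / len(records)
        for field in sorted(overlap)
    }
    return {
        "missing_fields": sorted(required - collected),
        "fill_rate": fill_rate,
    }


# Hypothetical, passively collected records (illustrative only).
records = [
    {"customer_id": 1, "postcode": "N1 9GU", "last_order": "2020-06-01"},
    {"customer_id": 2, "postcode": "", "last_order": None},
    {"customer_id": 3, "postcode": "E2 8AA"},
    {"customer_id": 4, "postcode": None, "last_order": "2020-05-17"},
]

# Hypothetical fields the ML application would need.
required = {"customer_id", "last_order", "churned"}

report = audit(records, required)
print(report)
```

In this made-up example the label the model needs (`churned`) was never collected at all, and one of the fields that does overlap is only half filled in.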

The data is probably difficult to access. Like all product development, machine learning products are powered by iteration and experimentation. Without easy access, your use of data will be pedestrian.

To be an AI powerhouse is to be a data powerhouse. And that means having an insatiable appetite for hoovering up all kinds of data, at any and every opportunity. It means finding new ways to digitise information flows that didn’t previously exist. And it means building walls around that data collection.

This appetite, married to algorithms that create feedback loops to the data itself (inferring from the data, adding to the data, correcting from the data, using the data as labels), is what builds a data flywheel.

A data flywheel is made up of lots of data. Lots of data is necessary but not sufficient.

So when you say "it's not about algorithms, it's about data", can you please explain what your data looks like?
