A Music Metadata Expert Explains Proactive Data Quality



It’s time for the music industry to shift from endless data clean-up to a strategy of quality at the source, and transform data from a liability into a reliable asset.

The following comes from Natalie Jacobs of Equalizer Consulting, part of a broader partnership with DMN. 

Several years ago, someone on my team estimated that it would take 107 years to clean up a primary data field that had been applied to several million assets. This was based on available personnel and the limitations that might prevent automating the task. He put this number in an email, which made its way up to the C-suite and ultimately plagued us for months because, as ridiculous as it sounded, it became a benchmark for data clean-up efforts.

While this example seems extreme, it’s true that data cleaning efforts are lengthy, resource-intensive, and don’t always lend themselves to automation. The incredibly creative nature of our industry, the scarcity of standards, and the inconsistent application of the few data standards we do have (e.g., work identifiers) mean that technical solutions applied to data cleaning are often more about reducing volume than fully resolving the extensive data mess that we’re all aware exists.

By this I mean that it is possible to put some technical parameters around a solution, but music metadata isn’t binary – there are many grey areas that pose a problem for clean-up. Think of it as a sliding scale:

    • Very confident: data is correct, complete, and adheres to defined rules
    • Incorrect: data is of poor quality
    • Everything in between: usually falls to some level of manual review by humans

The need for human input to review and correct data creates a process that is time-consuming, expensive, and doesn’t scale. Automation may help reduce some of the volume, but it’s hard to eliminate the human element entirely.
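To make that triage concrete, here is a minimal sketch in Python; the field names, scoring, and thresholds are assumptions for illustration, not a description of any particular catalogue system. It routes each record to auto-accept, reject, or a manual review queue depending on how well it satisfies defined rules.

```python
# Minimal sketch of confidence-based triage for metadata records.
# Field names, scoring, and thresholds are illustrative assumptions.

REQUIRED_FIELDS = ("title", "artist", "isrc", "duration_ms")

def confidence(record: dict) -> float:
    """Score a record from 0 to 1 based on simple completeness rules."""
    present = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return present / len(REQUIRED_FIELDS)

def triage(record: dict) -> str:
    score = confidence(record)
    if score == 1.0:
        return "auto-accept"     # correct, complete, adheres to the rules
    if score <= 0.25:
        return "reject"          # clearly poor quality
    return "manual-review"       # the grey area in between

records = [
    {"title": "Song A", "artist": "Artist X", "isrc": "USABC2500001", "duration_ms": 215000},
    {"title": "Song B", "artist": "", "isrc": "", "duration_ms": None},
    {"title": "Song C", "artist": "Artist Y", "isrc": "", "duration_ms": 180000},
]

for record in records:
    print(record["title"], "->", triage(record))
```

Everything that lands in the manual-review bucket is where the time and money go, which is exactly the problem a proactive approach tries to shrink.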

The speed at which we are creating metadata far exceeds the speed at which we can correct existing errors; perhaps that means we should rethink our strategy.

What if, instead of focusing on clean-up, we were to take a more proactive approach to producing complete, correct, and consistent data in the first place?

The need to clean data will persist, but it’s far better to prevent data from needing to be cleaned than to try to clean it after the fact.

To get the entire industry, including the independent sector, on board with consistent application of standards would be a huge undertaking. That being said, if we break down a proactive strategy into bite-sized pieces and control what we can control, then plenty of progress can still be made.

Let’s outline some of the fundamental benefits of a more proactive approach, one that doesn’t rely on never-ending cycles of data cleaning.

The exponential cost of rework

Cleaning data is a form of reworking it. Once bad metadata has entered the system, it proliferates downstream, impacting other data sets and becoming increasingly difficult to locate and correct. That means more time and labor expense, as well as opportunity cost. By building a robust, front-loaded data strategy, you eliminate this rework and remediation.

Garbage In, Garbage Out

This isn’t a trope without reason. Instead of accepting that “garbage” will enter your systems, you can proactively apply the principle of quality at the source. When data is complete, consistent, and correct at the beginning, you don’t have to restore data integrity through extensive cleaning cycles. You can feel confident in a well-functioning and reliable ecosystem.

Reduced risk for decision-making

Making decisions on flawed data is risky, potentially leading to audits, royalty corrections, and missed opportunities. When data is high-quality and trustworthy from the beginning, you can access insights more quickly and with greater confidence.

Retroactive cleaning is a perpetual cycle

One can’t get ahead of a dirty data set when additional dirty data is perpetually being added. Building automated validation rules and data entry protocols makes the data effectively self-cleaning: errors and anomalies are identified and corrected before they enter and pollute the ecosystem.
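As a hedged illustration of what validation at the point of entry could look like (the rule set, field names, and catalogue structure are assumptions, not a prescribed implementation), the sketch below only writes a record into the catalogue if every rule passes, and otherwise returns the errors to the submitter.

```python
# Sketch of validation at the point of entry; rules and field names
# are illustrative assumptions rather than an industry specification.

def validate_entry(record: dict, existing_isrcs: set) -> list:
    """Return a list of problems; an empty list means the record may enter."""
    errors = []
    for field in ("title", "artist", "isrc", "release_date"):
        value = record.get(field)
        if value is None or not str(value).strip():
            errors.append(f"missing required field: {field}")
    if record.get("isrc") and record["isrc"] in existing_isrcs:
        errors.append(f"duplicate ISRC: {record['isrc']}")
    return errors

def ingest(record: dict, catalogue: list, existing_isrcs: set) -> bool:
    """Write the record only if it passes every rule; otherwise reject it at the door."""
    errors = validate_entry(record, existing_isrcs)
    if errors:
        print("Rejected at entry:", "; ".join(errors))
        return False
    catalogue.append(record)
    existing_isrcs.add(record["isrc"])
    return True
```

In practice, the same rules would run wherever data enters – ingestion feeds, bulk imports, or manual entry screens – so that nothing bypasses them.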

Prevention improves capabilities at scale

The more data you collect, the more complex and time-consuming it becomes to clean. Even where automation can be applied, there is likely a proportionate increase in manual review. Applying clear data governance and validation guidelines upfront allows growth without a corresponding explosion in cleaning effort.

I’ve seen many instances within a single company where data is held inconsistently across systems, teams in different departments aren’t using the same data standards, and there simply isn’t enough communication or documentation around a single strategy for employees to follow.

When I create a strategy around data operations, I look at both upstream and downstream data flows, identifying creation points, outputs, delivery timing, inconsistencies, what can and can’t be automated, and much more. It’s also essential to establish what one has control over (such as issuing an ISRC, or how “version” is captured for a song title) versus where one relies on a third party (such as issuing an IPI). Implementing a proactive data quality framework requires expertise in both data operations and strategy, but it is invaluable work.
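As a small sketch of that “control what you can control” distinction (field names and the version-extraction rule are assumptions for illustration): the 12-character ISRC format and the way a “version” qualifier is captured can be enforced locally, while an IPI issued by a third party can only be recorded and flagged as pending.

```python
import re

# ISRC format: 2-letter prefix, 3-character registrant code, 2-digit
# year, 5-digit designation code (12 characters; hyphens optional).
ISRC_RE = re.compile(r"^[A-Z]{2}[A-Z0-9]{3}\d{7}$")

# Illustrative rule: move a trailing parenthetical such as "(Live)" or
# "(Acoustic Version)" out of the title into a dedicated version field,
# so "version" is captured consistently rather than buried in the title.
VERSION_RE = re.compile(r"^(?P<title>.+?)\s*\((?P<version>[^)]+)\)\s*$")

def normalise_track(record: dict) -> dict:
    isrc = record.get("isrc", "").replace("-", "").upper()
    if not ISRC_RE.match(isrc):
        raise ValueError(f"ISRC '{record.get('isrc')}' does not match the 12-character format")

    title, version = record.get("title", ""), record.get("version", "")
    match = VERSION_RE.match(title)
    if match and not version:
        title, version = match.group("title"), match.group("version")

    # IPI numbers are issued by a third party, so locally we can only
    # record what was supplied and flag the rest for follow-up.
    ipi_status = "provided" if record.get("ipi") else "pending-third-party"

    return {**record, "isrc": isrc, "title": title, "version": version, "ipi_status": ipi_status}
```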

Ultimately, shifting from a reactive “clean-up crew” mindset to a proactive “quality assurance” strategy transforms data from a liability into a powerful asset.


