Two of the major challenges in collecting and distributing royalties in the copyright and music industry are data quality and cost efficiency. Building a centralized repository of recording data that many organizations share, at a consistently high level of quality, has proven very challenging: managing, monitoring, and controlling each data source according to the level of trust in its provider is more daunting than expected.
The key objective of building a global repertoire database is to increase the quality, completeness, and accuracy of the data shared among stakeholders.
In recent years I was part of a great team that successfully built a global repertoire database for a music industry client. As the architect of the solution, I learned a great deal about the music industry and licensing along the way. In retrospect, and as someone who is constantly learning about new technologies, I would like to share what I would have done differently if I were to build a system like this again:
1. Increased Matching and Deduplication Accuracy using Machine Learning:
Matching and de-duplication of metadata was the core of this solution. Instead of rule-based matching, I would have implemented a service that matches recordings using machine learning. Why? Many of the rules were developed from a human understanding of the data, which changes constantly, so the rules need continual updating. A mathematically modeled machine learning approach can deliver better results, better performance, and a higher rate of accuracy. We later built exactly such a model and demonstrated accuracy above 99%.
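To make the idea concrete, here is a minimal sketch of metadata matching and de-duplication. It is not the production model described above; it uses simple string similarity instead of a trained classifier, and the field names, weights, and threshold are illustrative assumptions. In a real ML approach, the hand-tuned weights below would be replaced by parameters learned from labeled match/non-match pairs.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized edit-based similarity between two metadata strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_score(rec_a: dict, rec_b: dict, weights=None) -> float:
    """Weighted similarity across recording metadata fields.

    Fields and weights are hypothetical; a learned model would
    estimate these from labeled training pairs instead.
    """
    weights = weights or {"title": 0.5, "artist": 0.3, "isrc": 0.2}
    return sum(
        w * similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, w in weights.items()
    )

def deduplicate(records, threshold=0.85):
    """Greedy de-duplication: keep a record only if it does not
    match any previously kept record above the threshold."""
    kept = []
    for rec in records:
        if not any(match_score(rec, k) >= threshold for k in kept):
            kept.append(rec)
    return kept
```

The point of the sketch is the shape of the service: pairwise scoring plus a decision threshold, where only the scoring function changes when moving from rules to a learned model.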
2. Increased Performance by moving away from Relational Databases to Graph Databases:
What is the best way to represent and store the data? I think we have a mental block about moving away from a traditional data store to a NoSQL or graph database, owing to the ACID guarantees, tooling, and maturity of the traditional relational database. The new breed of data stores offers a more performant way to represent the data, giving us predictive and recommendation capabilities faster and with less computational power. Repertoire data is essentially a network of relationships between recordings, rights holders, products, tracks, albums, and so on, so a graph is a natural way to represent those relationships. There are, however, other considerations to weigh before making that decision, such as the existing application ecosystem, interoperability, skills, and tooling.
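To illustrate why the graph model fits, here is a tiny property-graph sketch in plain Python. A production system would use a real graph database (Neo4j, for example) rather than an in-memory adjacency list, and the node and relationship names here are illustrative assumptions, but the sketch shows the key property: following a relationship is a single hop, with no join tables in between.

```python
from collections import defaultdict

class RepertoireGraph:
    """Minimal directed property graph: nodes are (type, id) tuples,
    edges carry a relationship label."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(relationship, node), ...]

    def add_edge(self, src, rel, dst):
        self.edges[src].append((rel, dst))

    def neighbors(self, node, rel):
        """All nodes reachable from `node` via relationship `rel`."""
        return [dst for r, dst in self.edges[node] if r == rel]

g = RepertoireGraph()
g.add_edge(("album", "A1"), "CONTAINS", ("track", "T1"))
g.add_edge(("track", "T1"), "RECORDING_OF", ("recording", "R1"))
g.add_edge(("recording", "R1"), "OWNED_BY", ("rightsholder", "H1"))
g.add_edge(("recording", "R1"), "OWNED_BY", ("rightsholder", "H2"))

# Who holds rights on the recording behind track T1?
# Two hops through the graph, no joins.
recording = g.neighbors(("track", "T1"), "RECORDING_OF")[0]
holders = g.neighbors(recording, "OWNED_BY")
```

In a relational schema the same question would typically require joining track, recording, and rights-holder tables; in the graph model the relationships are first-class, which is what makes traversal-heavy queries cheap.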
3. Reduced Infrastructure Cost and Increased Flexibility using Cloud Technology:
Last but not least, when multiple stakeholders, large and small, share the cost, a solution that attributes CapEx and OpEx to each stakeholder by usage would have been ideal. A solution built on a Platform as a Service (PaaS) such as Google Cloud, Azure, or Amazon Web Services would have provided better control and allowed the cost to be distributed accurately based on usage. The range of products and services available can also deliver a consistent, elastic, and reasonably performant solution. The advent of serverless platforms could further help the organization focus on business functions rather than infrastructure, gaining agility, faster time to market, and reduced capital expense.
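As a sketch of the usage-based cost attribution described above, the hypothetical function below splits a shared platform bill across stakeholders in proportion to metered usage. It assumes each stakeholder's consumption (API calls, storage, compute hours) has already been combined upstream into a single usage figure; real cloud billing would instead rely on the provider's own metering and cost-allocation tagging.

```python
def allocate_cost(total_cost: float, usage: dict) -> dict:
    """Split `total_cost` across stakeholders in proportion to their
    metered usage. `usage` maps stakeholder name -> usage units
    (assumed pre-aggregated into one comparable unit)."""
    total_usage = sum(usage.values())
    if total_usage == 0:
        # No usage recorded: split evenly rather than divide by zero.
        return {org: round(total_cost / len(usage), 2) for org in usage}
    return {org: round(total_cost * u / total_usage, 2)
            for org, u in usage.items()}
```

The design point is that each organization, big or small, pays in proportion to what it consumed, which a metered PaaS makes measurable in the first place.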
While these concepts could have greatly improved the accuracy, performance, and cost effectiveness of the solution our team built, they can also be applied to other music royalty and distribution systems, helping copyright and collective management organizations in the music industry become more effective and move ahead in today's fast-paced technology landscape.