A Universal Data Model for Building Personalized Applications
By Sixtine Vervial, Head of Business Intelligence, JustWatch
The challenge lies in building the simplest yet most comprehensive data model that will serve the three main types of data mining activities: static reporting, ad-hoc analysis, and data products. Having a replicable data model also helps establish a standard across the company or the industry, fostering future data exchanges between parties. At a high level, most online businesses can be described as people (users) interacting with goods or services (products).
Typically, product details will be stored in an ERP or similar backend system, together with static user data. Interactions are recorded via event trackers such as Google Analytics, Mixpanel, or Snowplow. The ultimate goal is to centralize and sanitize these sources into one comprehensive dataset for future analysis.
Keeping the interaction table as atomic as possible is critical. It needs little more than two foreign keys, one to the product dimension and one to the user dimension, plus perhaps a category and subcategory for classifying interactions. As for products and users, go wild! The more attributes, the better, as long as they are mostly immutable.
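The three-table layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (users, products, interactions), every column name, and the sample rows are hypothetical and not taken from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
-- Dimensions: as many (mostly immutable) attributes as are useful.
CREATE TABLE users (
    user_id             INTEGER PRIMARY KEY,
    signup_date         TEXT,
    country             TEXT,
    acquisition_channel TEXT
);
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    title        TEXT,
    genre        TEXT,
    release_year INTEGER
);
-- Interactions: kept atomic, two foreign keys plus a classification.
CREATE TABLE interactions (
    interaction_id INTEGER PRIMARY KEY,
    user_id        INTEGER NOT NULL REFERENCES users(user_id),
    product_id     INTEGER NOT NULL REFERENCES products(product_id),
    occurred_at    TEXT NOT NULL,
    category       TEXT,
    subcategory    TEXT
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO users VALUES (1, '2019-01-05', 'DE', 'organic')")
conn.execute("INSERT INTO products VALUES (10, 'Example Title', 'drama', 2018)")
conn.execute(
    "INSERT INTO interactions (user_id, product_id, occurred_at, category, subcategory) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, 10, "2019-01-06T20:15:00", "engagement", "watchlist_add"),
)

# Any analysis starts by joining the atomic interactions back to both dimensions.
rows = conn.execute("""
    SELECT u.country, p.genre, i.category, i.subcategory
    FROM interactions i
    JOIN users u    ON u.user_id = i.user_id
    JOIN products p ON p.product_id = i.product_id
""").fetchall()
print(rows)
```

Because the interaction rows carry nothing but keys, a timestamp, and a classification, new user or product attributes can be added to the dimensions later without touching the event history.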
Starting with acquisition, conversion and retention KPIs, this model offers a solid base for target metrics reporting. The use of SQL and an open-source visualization tool (Redash, Superset, Metabase) will unlock the first trends and enable precise monitoring of each feature.
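As an illustration of such a KPI, the sketch below computes a conversion rate per acquisition channel with plain SQL, run here through Python's sqlite3 module. The schema, the 'purchase' category used to mark a conversion, and the sample data are all assumptions made for the example. The same query could be pasted into any SQL-based dashboard tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, trimmed-down versions of the user and interaction tables.
CREATE TABLE users (user_id INTEGER PRIMARY KEY, acquisition_channel TEXT);
CREATE TABLE interactions (
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    product_id INTEGER NOT NULL,
    category   TEXT
);
INSERT INTO users VALUES (1, 'organic'), (2, 'organic'), (3, 'paid'), (4, 'paid');
INSERT INTO interactions VALUES
    (1, 10, 'view'), (1, 10, 'purchase'),
    (2, 11, 'view'),
    (3, 10, 'view'), (3, 12, 'purchase'),
    (4, 12, 'purchase');
""")

# Conversion is assumed to mean "at least one interaction of category 'purchase'".
CONVERSION_BY_CHANNEL = """
SELECT u.acquisition_channel,
       COUNT(DISTINCT u.user_id) AS users,
       COUNT(DISTINCT CASE WHEN i.category = 'purchase' THEN u.user_id END) AS converted
FROM users u
LEFT JOIN interactions i ON i.user_id = u.user_id
GROUP BY u.acquisition_channel
ORDER BY u.acquisition_channel
"""

def conversion_by_channel(connection):
    """Return {channel: share of users with at least one purchase}."""
    return {
        channel: converted / users
        for channel, users, converted in connection.execute(CONVERSION_BY_CHANNEL)
    }

rates = conversion_by_channel(conn)
print(rates)
```

The LEFT JOIN keeps users who never interacted in the denominator, which matters for acquisition and conversion metrics alike.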
Mining the data further with Python, data scientists will be able to create services for profiling users and delivering a personalized message to each of them. For instance, by applying clustering to user profiles and engagement events, they can derive user personas to understand different types of audiences, adapt marketing strategies, or even offer specific features. After acquisition, collaborative filtering will help create appropriate product recommendations for both up-selling and retention. Finally, leveraging a reporting interface like Redash or even Google Data Studio, one can prioritize the customers to re-engage and predict the next best-suited product for each.
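To make the collaborative filtering step concrete, here is a minimal sketch of one common variant, item-based filtering with cosine similarity over implicit feedback, written in pure Python. It is not a description of any production system: the (user, product) pairs, the function names, and the scoring rule are assumptions chosen for brevity, and a real service would more likely rely on a dedicated library.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (user_id, product_id) pairs taken from the interaction table.
interactions = [
    ("ann", "A"), ("ann", "B"),
    ("bob", "A"), ("bob", "B"), ("bob", "C"),
    ("cat", "B"), ("cat", "C"),
    ("dan", "D"),
]

def build_item_users(pairs):
    """Map each product to the set of users who interacted with it."""
    item_users = defaultdict(set)
    for user, product in pairs:
        item_users[product].add(user)
    return item_users

def cosine(users_a, users_b):
    """Cosine similarity between two products seen as binary user vectors."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))

def recommend(user, pairs, top_n=3):
    """Rank unseen products by summed similarity to the user's known products."""
    item_users = build_item_users(pairs)
    seen = {product for u, product in pairs if u == user}
    scores = defaultdict(float)
    for candidate, candidate_users in item_users.items():
        if candidate in seen:
            continue
        for owned in seen:
            scores[candidate] += cosine(candidate_users, item_users[owned])
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, score in ranked[:top_n] if score > 0]

print(recommend("ann", interactions))  # ['C']
```

Product D is never suggested to ann because nobody who interacted with D shares any product with her. That cold-start limitation is one reason rich product and user attributes remain valuable alongside the interaction data.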
Although one could elaborate further on variations in technical setup and on best practices for storage and data processing, data modeling is a conceptual task that is needed long before going into machine-level details. Your first step? Identify and locate the relevant product, user, and interaction attributes, then precisely define the performance metrics and goals that will guide the development of models and data-driven applications.