To win in AI: Get your data architecture right

The Modern Reporting Setup for Marketers

By Simon Wiggins, Director of Data Engineering, CvE.

Data and reporting are always hot topics with marketers. With every new, exciting piece of marketing technology, the rise of AI being the latest, one of the most important questions should always be: "How do we measure the success of that?"

In the last ten years, improvements in web technology have drastically increased the accessibility and usability of data, creating a competitive advantage for the companies and individuals that can harness their data effectively. This advantage becomes even more pronounced with the increasing fragmentation of the "dataverse" caused by walled gardens, cookie deprecation, privacy concerns and regulation. Defining a robust methodology for data collection, storage and reporting is the key to answering that measurement question and to keeping up with, or overtaking, competitors.

An often overlooked part of setting yourself up for success is the people and skillset required to build and maintain such a system. In the past, most of the technology within a company has sat with IT teams. However, as technology has become more readily available and the use cases more specific to data and marketing, it becomes more important to have the right people with the right abilities in place: a kind of "shadow IT" team that is technical enough to set up and configure the tools, but with a deep understanding of the data those tools generate, along with the context to translate that data for the business and use marketing data to support the business's goals.

Data cleanliness is integral to making this system efficient and effective: ensure you have cleanly labelled datasets with well-defined hierarchies and naming conventions. The biggest problem most clients face is an overly complex layer of corrections and adjustments just to make the data make sense. The more tweaking the data requires to get it into a useful state, the harder it is to automate reporting robustly.
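To make that concrete, here's a minimal Python sketch of enforcing a naming convention at the point of entry rather than patching names downstream. The convention itself (market_channel_objective_quarter) and the regex are illustrative assumptions, not a standard:

```python
"""Minimal sketch: flag campaign names that break a naming convention.

The convention (market_channel_objective_quarter, e.g.
'UK_Social_Awareness_Q1') is a hypothetical example."""
import re

# market _ channel _ objective _ quarter
NAME_PATTERN = re.compile(r"^[A-Z]{2}_[A-Za-z]+_[A-Za-z]+_Q[1-4]$")


def non_conforming(names: list[str]) -> list[str]:
    """Return the names that break the convention, so they can be
    fixed in the platform rather than corrected in the reporting layer."""
    return [name for name in names if not NAME_PATTERN.match(name)]


if __name__ == "__main__":
    campaigns = ["UK_Social_Awareness_Q1", "uk social awarness q1"]
    for bad in non_conforming(campaigns):
        print(f"Non-conforming campaign name: {bad!r}")
```

A check like this, run before data ever lands in reporting, is far cheaper than the layer of corrections described above.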

With the rise of AI, the importance of data cleanliness becomes even more obvious. For AI to use a dataset, it must be clean and well labelled before a model can be trained on it. If the naming convention can't be understood, or it requires context or translation from a human, then no model will be able to glean trends or insights from the data accurately.

Data Collection

For data collection, most platforms these days come with some sort of API (Application Programming Interface) that allows the end user to connect to the platform and pull out reporting stats. Although many of these APIs share similarities, each is custom to its platform and can require constant updating and tweaking to keep the data flowing. For this reason, applications like Funnel.io and Adverity exist specifically to connect to marketing platforms' APIs and collect that data automatically.
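If you do go the custom route, a collector boils down to an authenticated request, a pull of the dimensions and metrics you need, and landing the rows somewhere durable. Here is a minimal Python sketch; the endpoint, parameters and response shape are hypothetical stand-ins, since every real platform API has its own auth flow, pagination and schema:

```python
"""Minimal sketch of a custom API collector.

The endpoint and response shape are hypothetical; real platform APIs
(Meta, Google Ads, TikTok, etc.) each differ."""
import csv
import requests

API_URL = "https://api.example-adplatform.com/v1/reports"  # hypothetical
API_TOKEN = "YOUR_API_TOKEN"


def fetch_daily_report(start_date: str, end_date: str) -> list[dict]:
    """Pull daily spend/impressions/clicks between two dates."""
    response = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        params={
            "start_date": start_date,
            "end_date": end_date,
            "dimensions": "date,campaign",
            "metrics": "spend,impressions,clicks",
        },
        timeout=30,
    )
    response.raise_for_status()  # surface auth/quota errors early
    return response.json()["rows"]


def write_csv(rows: list[dict], path: str) -> None:
    """Land the raw rows as a CSV, ready to load into the warehouse."""
    if not rows:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_csv(fetch_daily_report("2024-01-01", "2024-01-31"),
              "adplatform_daily.csv")
```

Multiply this by every platform, plus every schema change, and the case for a managed connector like Funnel.io or Adverity becomes clear.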

Ensure you collect the level of granularity and metrics required to answer your questions, but remember that each extra dimension multiplies the size of the dataset: daily spend per campaign is a handful of rows per day, while the same data broken out by placement, creative and audience can run to thousands. The simplest way to pick the right level is to take a few reports the business already uses and design the granularity to cover all of them.

Storage

The next step is to land that data somewhere it can be mastered and overlaid with any business logic. Every use case is different, but some general questions can help inform the decision: Does the business already have infrastructure in place that you can lean on? Is there a cloud provider already chosen by the business for other projects? Do you have an on-premise database system you can push data to?

An advantage of using a cloud provider here is that you don't need to worry about managing or configuring the database yourself, so you can concentrate on getting the data in and building the translation layer on top of it. Each of the big three cloud companies (Google, Microsoft and Amazon) also has excellent reliability, as they use the same tools to support their own infrastructure. Another consideration is a tool like Snowflake, which can sit on top of any of the cloud providers and handles many of the intricacies of access, backups, disaster recovery and scalability. It also has a sharing feature, meaning you can share datasets across Snowflake instances without granting full access; this can be useful when sharing data internally with other teams or externally with clients and partners.
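As an illustration of how little plumbing this involves once a provider is chosen, here is a minimal sketch of loading the CSV from the collection step into Google BigQuery using the google-cloud-bigquery client library. The project, dataset and table names are hypothetical:

```python
"""Minimal sketch: loading the landed CSV into BigQuery.

Assumes the google-cloud-bigquery library, default GCP credentials
and an existing dataset; names are illustrative."""
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.marketing_raw.adplatform_daily"  # hypothetical

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,                # skip the header row
    autodetect=True,                    # infer the schema from the file
    write_disposition="WRITE_APPEND",   # append each day's pull
)

with open("adplatform_daily.csv", "rb") as f:
    load_job = client.load_table_from_file(f, table_id, job_config=job_config)

load_job.result()  # block until the load completes
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")
```

The equivalent on Snowflake, Redshift or Azure Synapse is similarly short; the real work sits in the collection and the translation layer, not the load itself.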

Reporting Views

Once you have your raw data in a database of some sort, this is where you can start to utilise it. In most databases, data is stored in "tables". Think of a table like an Excel spreadsheet, with a row for each line of data and columns for the dimensions and metrics. If you're sent a massive spreadsheet of granular data, it can be next to useless if all you need is a metric like total spend per day, so the first thing you'd likely do is summarise that data in a pivot table. Databases have an equivalent called a "view": a definition, laid over the original table, that summarises the data however you like.

A view can be as simple or as complex as you make it, which makes it ideally suited to combining multiple raw tables into a single simple view showing the business data you need at the correct granularity. The view updates automatically as new data is pushed into the underlying tables, meaning you can use it to see the stats you need on an ongoing basis. So if you have raw data from Meta, Snapchat, Twitter, TikTok and Google Ads, a few lines of code (usually SQL) can pull the spend and important metrics from each table into a single view, ready to be visualised and reported back to any stakeholder, as the sketch below shows.
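Here is a minimal sketch of that pattern in BigQuery, run through the same Python client as before. The table and column names are illustrative; in practice each platform's raw table needs its own column mapping:

```python
"""Minimal sketch: a cross-platform spend view in BigQuery.

Table/column names are hypothetical; real raw tables rarely share
an identical schema, so each SELECT would map columns explicitly."""
from google.cloud import bigquery

client = bigquery.Client()

CREATE_VIEW_SQL = """
CREATE OR REPLACE VIEW `my-project.marketing.daily_spend` AS
SELECT date, 'Meta'       AS platform, campaign, spend, impressions, clicks
FROM `my-project.marketing_raw.meta_daily`
UNION ALL
SELECT date, 'TikTok'     AS platform, campaign, spend, impressions, clicks
FROM `my-project.marketing_raw.tiktok_daily`
UNION ALL
SELECT date, 'Google Ads' AS platform, campaign, spend, impressions, clicks
FROM `my-project.marketing_raw.googleads_daily`
"""

client.query(CREATE_VIEW_SQL).result()  # create or replace the view

# The view reflects new rows automatically as the raw tables are appended:
summary = client.query(
    "SELECT platform, SUM(spend) AS total_spend "
    "FROM `my-project.marketing.daily_spend` "
    "GROUP BY platform ORDER BY total_spend DESC"
)
for row in summary.result():
    print(row.platform, row.total_spend)
```

This single view is then the one place a dashboard, or a stakeholder, needs to look.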

Speaking of visualisations, there are a tonne of good options on the market, including Salesforce Tableau, Microsoft Power BI and Google Looker Studio (formerly Google Data Studio). Looker Studio is free, so it's a great place to start: dashboards tend to go through several iterations as business users discover that having stats readily available every day changes which ones they actually monitor. Alternatively, Power BI, being a Microsoft product, might fit perfectly if you're already embedded in the Microsoft ecosystem.

Resourcing

If you've been reading this and some of the terms are alien, or you're unsure what SQL is, then chances are you need a data engineer to help you out. With data being such a commodity these days, and not just in the marketing world, it's imperative to have a strong team of data engineers to build, deploy and maintain this reporting system. The concept of an ETL or ELT (Extract, Transform, Load / Extract, Load, Transform) system is not new in many industries; marketing, however, has been slow to realise the power and scalability of investing in one. Finding a competent data engineer is possible, but ensuring they also have an in-depth understanding of marketing and the metrics across martech platforms can make it a bit of a unicorn role to fill, so this could be the hardest part of the whole system to solve. Luckily, CvE has just that team if you need a hand getting started on your journey!

So what does good look like for a modern reporting setup?

As I mentioned before, it will vary drastically from business to business. But the important parts are: a robust, consistent data ingestion method that lands the data in a central database; the right tooling; and a team translating business requests into summarised reporting views. One of the most overlooked parts of this setup, though, is data cleanliness: extra care should be taken to ensure any data entered into the marketing tools and technology is entered correctly the first time, without spelling mistakes or placements in the wrong campaigns. All of these small inconsistencies at the start of the process cause a butterfly effect through the system, meaning the data at the end can be inconsistent and ultimately lead to the wrong conclusions being drawn from it.

Conclusion

Hopefully this article gives you a good starting place, or signposts the next step in your journey. It is by no means a simple one, but there are some key basics to ensure you're on the path to success. The key takeaway I'd like to leave you with is that each part of this stack is a component that can be swapped out later: you could start with Funnel as an aggregator and later switch to Adverity; start with Google Cloud Platform (GCP) and BigQuery but later migrate your data into Snowflake; or build a free MVP in Looker Studio and later take advantage of Power BI's or Tableau's more mature platforms.

Questions?

Please email me at swiggins@controlvexposed.com and I will be more than happy to listen and help your business win in AI by getting your data architecture right.