Build your marketing data warehouse with Google BigQuery


10 months ago
Deborah Bergiers

Like many marketers, I’m sure that your marketing data comes from several platforms and you have to juggle with a mix of siloed data to be able to analyze and understand your customer journey. Quite annoying, right?

 

Siloed data coming from different sources rarely communicate with each other and make the interpretation of KPIs and important metrics complicated. In the end,  it leads to situations where you don’t know how to make decisions…

 

That’s why, more than ever, you need to take control of your data and gather it in one place. Generally, the main solution is to use spreadsheets, but spreadsheets can’t handle huge datasets. This is precisely where a data warehouse like BigQuery becomes interesting.

 

BigQuery & how it will meet the needs of your business

What is BigQuery?

BigQuery is a data warehouse from the Google Cloud platform that helps you to store, visualize, analyze and ingest your data. The goal is to directly upload all your data in real-time from various sources: CRM, Google Analytics, Google Sheets, Google Ads, Google Optimize, Facebook Ads,… The objective is to centralize all your raw data in the cloud.

 

In terms of pricing, the model is quite simple. You’ll pay both storage and processing costs. Storage costs only depend on the volume of data stored. For processing, you will pay for querying data, data storage, and streaming inserts. Loading data is free of charge. 

 

BigQuery provides free tiers for storage (10 Gb) and processing (1Tb). You only pay above those thresholds. For more information about pricing, you can check the overview of BigQuery pricing.

 

Google BigQuery advantages

By being scalable and fast, Google BigQuery enables you to only focus on analyzing your data and allows large-scale analytics. As it’s connected to intelligence tools such as Data studio, it helps you to easily visualize your data in a clear and centralized report to get clear insights about your customer journey and get predictions about your clients for example.

 

BigQuery ML allows you to create and execute machine learning models directly on the data warehouse. You can create regression models, K-means clusters, decision trees, or time series just by a “CREATE model” query.  It saves loads of time and makes it handy to have data and models managed in the same place.

 

BigQuery alternatives

Of course, BigQuery is not the only data warehouse. Amazon ( Amazon Redshift) or Microsoft (Microsoft Azure Synapse Analytics) also offer similar services.

 

If you use BigQuery or another solution, it will be a decision based on your project: type and structure of data, purpose (machine learning ?), or endpoint (dashboarding, data platform,..). But also current technology stack: are you already using Google Cloud Platform or Google data such as GA 4. “Multi-cloud” (aka using multiple cloud solutions) is often possible but brings development and sometimes complications.

 

Why use Google BigQuery?

The main question is: why should Marketers start to store their data in this cloud and how to use it?

  • Unlimited & historical data: As previously said, unlike spreadsheets, with BigQuery you can gather all your marketing data from all your platforms in one place to have a single source of truth for your performance. It will reduce and simplify your data operations. Moreover, you can access all your historical data.

 

  • Fast and accessible insights:  BigQuery uses a dialect of SQL ( Structured Query Language) requests which helps you to get your data up and running in minutes. Thanks to that, you get results really fast and tee up real-time insights

 

  • Granularity and visualization: by uploading your Google Analytics data you will be able to analyze your events on a granular level. For example, you can define the most common page paths of your users and see the difference between the purchasers and the visitors. You can really dig into your data which is not possible within the native interface. Thanks to the connection with Data studio, it will also make your data more visual.

 

When NOT to use Google Bigquery?

If your project does not match with Big Data 3 V’s:  if your data are not big in Volume (Size), doesn’t require Velocity (speed of execution) or are not Varied enough, maybe you’ll find other solutions within Google Cloud Platform that fit best with your needs (for example Google Sheets or Google SQL).

 

Don’t forget Google is a US company: they have no GDPR adequacy decision from Europe and are not under the protection of Privacy Shield (invalidated by EU in July 2020). That means uploading personal data from EU citizens into Big Query without data protection assessment should be considered as a risk (by the way, when playing with big data, a privacy assessment is always a good idea, US company or not: am I respecting the privacy of my customers, do I process sensitive data like health/opinions/sexual orientation/…, how do I respect the right to be forgotten, etc..)

 

How to start a marketing data warehouse

1) Define your objective: What do you want to use Bigquery for?

No, “I want everything” is not a good answer (remember, you have a data protection assessment to fill, the purpose of processing is the first chapter).

 

You need to think about what you want to reach and the value you want to unlock. Is it a prediction of your customer’s behavior (churn, acquisition, ..), optimizing website visitors’ behavior, CRO, information about a specific category, enriching your CRM with Analytics data (or the other way around), optimizing relevancy, doing marketing automation,… Describe exactly what you need to do, be very specific, and evaluate the value (in terms of return or time saved) for each purpose.

 

2) Prepare your plan

The second step is to map all your data required to achieve this goal (automation tools, CRM, Ads platforms,…). Class them between “need to have” and “nice to have”.

 

Focus on needs to have, and check if your data sources have a public API or if automated data export is enabled. If it’s not the case you will have to find another tool. In marketing data,  Supermetrics extracts data from your marketing platforms and uploads them into Big Query.

 

3) Import and process data: in which order?

There are two main processing models :

ETL vs. ELT

  • ETL (Extract – Transform – Load): processing occurs while the importation of data. If you want to enrich, merge, filter, parsing naming convention,….  you can do it when you import it. You’ll save storage costs: you only store what you need. It also has benefits on GPDR compliance: the data is often aggregated at this level, the data is not personal anymore… 

 

  • ELT (Extract – Load – Transform): In this case, you can transfer data as they arrive, and process it while you need it. It is often the case in the IoT industry: data arrives continuously and you don’t have time to process and store it in the same process. You store first and process after.

 

BigQuery fits very well with the ELT paradigm. As BigQuery is very scalable, you can handle huge transformations on a big amount of data. That’s why Google Analytics 4 works best with BigQuery: data flows into BigQuery and you can make your analysis after. See below for a tutorial on how to import your GA4 data into BigQuery.

 

In the case of ELT, the transformation is done via SQL on BigQuery. You’ll find handy the documentation of bigQuery query syntax. Long story short: it is very similar to “traditional” SQL, with additional concepts such as ‘nesting’ : a table can be inserted (“nested”) into another table.  That means you’ll have to “unnest” nested data in a record before using it.

 

If you’re familiar with SQL, you’ll also need to review the schema and especially the partitioning part. The way you organize your data is very important: remember, you pay per amount of data processed. If you tell BigQuery how to organize (“partition”) your data (per day, per site, ..) BigQuery will process only the data needed, and then charge you less.

 

4) Visualise

Once your query is ready and optimized, you’ll have to visualize the results somewhere. There is no shame in using Google Spreadsheets to run the “last mile” of the analysis.  Spreadsheets has built-in connectors (Data -> Data connectors -> Connect to BigQuery)

 

The most obvious choice is Data Studio, you connect to your view or table, and create a dashboard. Search for BigQuery in the connectors. Choose your project – table, and enjoy the full GA4 schema.

 

BigQuery - add data to report

BigQuery - add data to report

 

Note other visualization tools like PowerBI or Tableau can also connect to BigQuery.

 

Google BigQuery use case: A process mining approach to consumer journey

At Clicktrust, BigQuery allows us to consider the consumer journey as a process. We design steps the consumer should take, from product discovery until the purchase, which defines the completion of the process.

 

We collect and store “events”, analyze the performance of the process, a little bit like a supply chain company would do  : 

  • Process discovery: what is the order and importance of each step. How users consume our websites (which pages, what do they search,..), where do they come from, what “path” converts better, …

 

  • Conformance checking: is my mental model aligned with reality? Do “awareness” and “conversion” landing pages really realize their respective jobs?

 

  • Performance: what are bottlenecks, where people stop their journey, and what elements favor conversions, on a user-level…
Conversion flow

Conversion flow – Performance (process mining framework)

 

To realize this, we can add different layers:

  •  Google Analytics 4 data – the angular stone.  With GA4, you’ll benefit from user & log-level data. Optionally, you can push the “customer/CRM ID”, it will be visible on BigQuery as “User ID”. Therefore it will be available to connect with offline or CRM data, and you’ll have a complete view of who came on the website, triggered which events, and bought which product.

 

  •  E-Commerce:  If you are an e-commerce store that uses Shopify, dump your “orders” and “transactions” and “customer” data to have a view on churn or LTV through reports. You can segment your customers and cluster them to address them with the right promotion at the right moment. You can also analyze other sources of data and get clear insights about your key metrics and how they correlate with other dimensions (LTV per age/source/product/campaigns/location).

 

  • Mailing: who did you send your offers to? Did they click? That’s also the first block of marketing automation: your data flows to the same place: set rules and trigger 

 

  • The more you add internal sources, the more your view of customers is complete. In the end, you got a “golden record” of your customers: every interaction is stored at the same place and available in the same process.  Adding individual data about paid media is touchy and brings unnecessary GDPR risks. But you can still anonymously make SEA – SEO optimization: are SEA & SEO keywords used the same way by your visitors? BigQuery offers you the power to merge SEA & SEO keywords on a daily basis, by device and page.

 

To conclude, BigQuery (or another data warehouse) will help you to take advantage over your competitors by enabling detailed and in-depth analysis. With ITP, IOS 14, the death of cookies,… It’s more than essential today to find alternatives to continue to understand the needs of your customers to make sure they continue to meet the needs of your business. 

 

To learn more about the death of cookies, we recommend you to read the article on the death of cookies and how to activate CRM data.

 

Finally, if you’re interested in knowing how to put all of this into practice, stay tuned, a second article is coming soon.

Top