Auto Trader Engineering Blog
-
Dry running our data warehouse using BigQuery and dbt
In this post, we talk about how we use dbt & BigQuery dry run jobs to validate our 1000+ models in under 30 seconds.
As mentioned in a previous post, at Auto Trader, we use dbt (data build tool) to transform data in BigQuery, our cloud data warehouse. Ensuring the data in our platform is accurate and available on time is important for both our internal and external data consumers. When we build a dbt project it compiles our models—templated
select
statements which define the transformations to our underlying source data—into SQL that is executed against the data warehouse.While dbt is an excellent tool for creating these complex transformation pipelines, it does not check that the
select
statements are valid SQL. The current solution for this is a CI environment that executes your project and runs data tests to check the transformations are working as expected. The main drawbacks of this are speed and cost as the database engine you are using needs to execute SQL in a production-like environment. -
Experimentation at Auto Trader
Experimentation—or ‘AB testing’—is a vital part of adapting to the ever-shifting behaviour of users on the internet. In this brief (albeit hopefully informative!) blog post, I will be covering how we’ve historically approached this topic at Auto Trader. I’ll then go on to describe the changes we’re putting in place to make it an even more effective tool in our kit as we move into the landscape of digital retailing. Finally, I’ll go through how we envisage the process to look in the future.
-
How Auto Trader ensures end-to-end data trust at scale
At Auto Trader, we make use of several tools to store and maintain our data. The majority of our analytical data is stored in BigQuery, a serverless cloud-hosted data warehouse. From here, we use dbt to transform and model that data.
Inevitably, things can go wrong with the tasks involved in building the datasets that our stakeholders rely on. We’re always looking for ways to automate and better operationalize our data workflows. We want to be confident in the quality of the data that we’re sharing, which is why it’s important to know when a dataset doesn’t look quite as we expect.
An important feature of our data pipelines is observability, as the complexity of data workflows can otherwise make it hard for us to investigate quality issues. To help with this, we’ve adopted Monte Carlo as a monitoring tool, which provides a rich UI to set up notifications and custom domain-specific checks on our data. In this post, we’ll explain how we have automated the creation of Monte Carlo notifications and embedded them into our existing data development process.
-
Auto-generating an Airflow DAG using the dbt manifest
This post explores our continuing effort to improve our developer experience and ability to respond to incidents. Here, we discuss how we made scheduling dbt tasks simpler and more transparent, removing the need for the dbt user to consider scheduling when deploying a new model.
-
AT Design Talks
Back in January, our design team hosted our first AT Design Talks event, which featured a wide range of lightning talks by members of our design and research team. We covered a variety of topics, from the process of building out data visualisations for our customers, to how we use eye-tracking to understand the impact of information architecture on consumer scanning patterns.
-
Single Sign-On and Basic Auth with Spring Security
The requirements for some of our apps have led to interesting explorations of the Spring Security configuration. I hope to show you by way of example how non-standard implementations can still be achieved with elegance once you understand the Spring Security architecture.
-
Protecting users against CSRF in My Account
Recently at Auto Trader, we’ve been busy overhauling the stack behind the ‘My Auto Trader’ area. In our previous stack, we’d implemented our own solution to protect our customers from CSRF (Cross-Site Request Forgery).
As we migrated the back-end to Spring Boot (a Java-based, open-source framework used to create production-grade web applications), we were able to use Spring Security to prevent this lesser-known vulnerability. Here’s how we did this, integrating it into our new GraphQL powered React front-end.
-
Reliable tracking: Validating Snowplow events using Cypress & Snowplow Micro
At Auto Trader, we have migrated from Google Analytics to Snowplow for event tracking. Our current implementation didn’t focus on the quality or the trustworthiness of the events being tracked. This time around, we focused on those as a first-class concern. In this post, I’ll take you through how we implemented automated data quality checks using Snowplow Micro & Cypress.
-
Moving from buckets to vectors: How to use Machine Learning to quantify how similar vehicles are to each other
How similar is a Ford Focus to a VW Golf? Is a Focus more like a VW Polo…?
At Auto Trader the question of how similar two vehicles are to each other occurs frequently, whether it be in the context of recommendations, helping retailers understand who their competitors are, price valuations etc. In this post we will describe one of the ways we have to compare apples and oranges (or in this case coupes and hatchbacks!).
-
The Case For CSS-in-JS
CSS-in-JS is the practice of utilising the power of JavaScript to dynamically generate and better organise your application’s CSS. The concept has gained traction over the years due to the popularity of UI frameworks / libraries such as React, Angular and Vue. This post attempts to convince you that CSS-in-JS is an approach worth investigating in the struggle to keep your codebase’s CSS in check.
-
View from across the data lake: Developing the mileage indicator using our self-service Data Platform
The mileage indicator is a new feature on our product page which shows how the mileage of an advertised vehicle compares to the average mileage of similar vehicles that we’ve seen on Auto Trader. It uses machine learning (ML) to predict what the mileage of a vehicle should be given its age. The ML model is trained on data from the millions of recently seen vehicles that we record in our S3-backed data lake.
I’d like to share the story of how the mileage indicator was built, from prototype to production, and finally to fully automated retraining and continuous deployment of the model. It is a microcosm of how we work at Auto Trader: centring our business around our data and using its insights to drive product design.
-
Building a more accessible web form
The contact forms on our website are a critical way for consumers to make contact with our advertisers. We have quite a few of them, from general “contact us” forms to enquiry forms about a specific car. In this article, we will focus on the used car contact form. This form has a huge amount of interaction. It allows users of the website to send a message to the seller of the vehicle they are looking at, which is a critical part of the process. Unfortunately, from an accessibility point of view, it wasn’t working as well as it could for everyone. We set ourselves a goal to create the best, most accessible form we could.
-
Migrating to Kotlin—what to look out for
Since Google announced the support for Kotlin as a first-class language at I/O in 2017, here at Auto Trader we’ve been gradually moving our Android app codebase over to Kotlin from Java. There are some tools to help developers achieve this (such as the automatic Java to Kotlin conversion tool), but there are a number of pitfalls to be aware of when it comes to the converter and the interoperability between the languages. In this post we share some insight from our experience.
-
A Road Trip through Data Delivery
Thinking about the delivery of digital products, we typically picture customer-facing applications with a shiny user interface. Apps built by teams made up of product owners, designers, developers, testers, etc. But what about data and the engineering teams who provide the Data Platform and trusted metrics that enrich the customer-facing applications? The purpose of this blog is to share my experiences as a Delivery Lead working with data teams and why they require delivery focus.
-
Using Machine Learning to estimate how attractive adverts are to car buyers
Here at Auto Trader we are always striving to improve the experience for both our buyers and sellers. In this blog we describe how we have used machine learning to estimate how attractive different adverts are to car buyers, in order to improve how we select which adverts to display in our search results. We elaborate on how we followed the Discovery -> Production process discussed previously to deliver the project, and the lessons we learnt along the way!
-
Running Experiments At Auto Trader
For a number of years, we A/B tested new www.autotrader.co.uk features on a separate production environment. This allowed us to validate that new features had the desired effect, and didn’t negatively impact any of our key indicators with a reduced slice of our users. But, it required three deployments of our largest web application and was slow and clunky to set up. We have since introduced a new A/B test platform which allows us to be more agile and respond to change more quickly. We use the data collated from these tests to drive our next generation features for the site. Read on to find out how we used to test features and why we made this change.
-
How we secured our payment pages using a Content Security Policy
In the Private & Home Trader Advertising squad we take payments for advertisements through our in-journey payment pages. Previously our payments were handled by a re-direct to a third party payment provider. When we decided to bring the payment pages into our journey, using Stripe elements (Stripe components embedded within our pages), it was vital that we secured our payment pages to eliminate any risk from malicious scripts or injection attacks. Here is how we did it.
-
How we use Architectural Decision Records (ADRs) on Data Engineering
When I joined our data engineers over a year ago they had already adopted Architectural Decision Records (ADRs) to document architectural decisions made whilst building Auto Trader’s Data Platform. ADRs are listed in ThoughtWorks’ Technology Radar as a technique to adopt; our data engineers were the first team I’ve worked on to use them. They have allowed us to capture the context and consequences of the decisions we make; in a way that provides transparency and allows the whole team, and wider organisation, to contribute.
-
Cultural Learnings of Data Academy for Make Benefit Glorious Company of Auto Trader: or how we're promoting data literacy.
At Auto Trader, we are attempting to democratise our data.
We put together a team of data engineers, signed up to AWS, built a Data Platform, replaced all the components at least once because they weren’t up to scratch and, having done all that, finally realised we needed some people to use it.
Luckily, as we were progressing, we engaged with our potential users, who provided plenty of interesting problems for us to work on. Unfortunately, we will never have enough time to generate the code to answer all of the questions those developers, analysts, scientists, business and sales folk have, so we’ll need them to self-serve.
-
How buyers are affected by an AI-driven product: Price Indicator
Artificial intelligence (AI), is the ability of machines to learn from data. Often, this data is derived from human interactions. In this blog, we will explore how Auto Trader has used AI and Machine Learning (ML) to directly influence the buyer’s and seller’s experience.