Auto Trader Engineering Blog
-
Navigating Turbulence with our Landing Page Performance Model
Photo by Pascal Meier on Unsplash.
At Auto Trader, we define landing pages as the first page a consumer sees when they enter our website. As you’d probably expect, the Auto Trader homepage is one of our most common landing pages, as that is usually where a consumer would start when beginning their vehicle-buying journey. In this blog post, I’ll be talking about how we’ve developed our understanding of landing pages through the production of our new Landing Page Performance model, and why it is the next step towards self-serve data at Auto Trader.
-
Impressions of the 2023 Data Council conference and Austin, Texas
Photo by Tico Mendoza
Back in April, Auto Trader gave a few of us the opportunity to attend Data Council 2023 in Austin, Texas, USA. Data Council is an independently curated conference that covers many aspects of working in a modern data-focused role, from infrastructure and data engineering to analytics tools, data science, machine learning and AI.
We were really excited about this opportunity because we all used to be part of our Data Engineering team at Auto Trader. The team has since dispersed throughout our Platform Engineering tribe, so we now have people with specialist data engineering skills embedded within data product teams. We had a great time, not just at the conference but in Austin itself. Now that the dates for Data Council 2024 have been announced, we thought we’d write a Q&A-style blog post to share our experiences.
-
Faith Driven Development
We’ve all done it. There’s a little bug somewhere in an app, an unexpected null or the like, and we throw together a quick commit to fix it. Then we push it straight to master, confidence overflowing, telling ourselves that there’s no way this doesn’t solve the problem. We’ll triumphantly declare something to a colleague to the effect of “I’ve fixed that little bug by the way – was just a one-liner”.
-
Life Cycle of Our Package Uplift Model
Photo by Nik Shuliahin on Unsplash.
You’ve done the hard work in researching, developing and finally deploying your shiny new Machine Learning (ML) model, but the work is not over yet. In fact, it has only just started! This is the second part of our series on our Package Uplift model, which predicts how well our customers’ stock will perform on each of our advertising packages. We will discuss how we monitor and continuously develop our machine learning models after they have been designed and deployed, so that they keep reflecting the current market. See part one of this series for how we created Package Uplift.
-
Demonstrating the Value of our Packages
Photo by Stephen Dawson on Unsplash.
Advertising packages are the core product at Auto Trader. Depending on the package tier our customers purchase, they get to appear in our promoted slots or get an advantage in our search rankings. As a business, we need to know how well our products are performing for our customers, as underperformance could lead to unhappy customers cancelling their contracts with us. Knowing the impact of a package, not just overall but on a per-customer basis, allows us to make sure our offerings are fair and to react quickly to any changes we observe. In order to isolate the effect of package level and to accurately predict a customer’s performance on each product, we had to create our most complex production model to date, Package Uplift. In this blog post, we’ll cover how Package Uplift works and how it builds on our ecosystem of Machine Learning models.
-
Things I learned when building an API gateway with reactive Spring
Coming from a non-reactive and more traditional Java Spring API background, I had limited experience working with Mono and Flux. To build our Spring Cloud Gateway application, I needed to bring myself up to speed on these concepts. I hope this blog post may help others who wish to learn more about how we can use reactive streams with Spring.
-
Real-Time Personalisation of Search Results with Auto Trader's Customer Data Platform
Photo by Markus Quinten de Graaf on Unsplash
Here at Auto Trader, we aim to help customers find their perfect vehicle as quickly and easily as possible. But with over 400k vehicles advertised onsite at any one time, this can prove challenging. Whilst we provide a powerful search engine to help narrow things down, customers still need to specify a set of filters to get the most out of their search, posing the question: what if they’re not sure exactly what they want? What if they’re not sure how to find what they want using our filters? How can we help those customers?
Search filters aren’t the only way a customer can express a preference - their other activity on Auto Trader can do that as well. Each time a customer views a set of search results, they express a preference by choosing which adverts to click on. This presents us with an opportunity: we can look at this activity, model a customer’s preferences from it, and subsequently improve our search results to show more vehicles that match those preferences.
-
Scoring Adverts Quickly but Fairly
At Auto Trader, we have multiple machine learning models to predict and score properties of the advertised vehicles on our platform, from valuations to desirability. With circa 8,000 new adverts listed each day, how do we generate scores for all the new adverts that have limited observations? How do we ensure that these are fair estimates of their long-term value, and aren’t going to erratically vary day to day? In this post, we’ll cover how we have addressed these problems in the case of one of our core models, Advert Attractiveness, which scores adverts based on their quality.
-
Non-disruptive in-place K8s cluster upgrades at Auto Trader
Photo by Markus Spiske on Unsplash
At Auto Trader, we keep our Kubernetes clusters up to date with the latest or T-1 version available in the Google Kubernetes Engine ‘stable’ release channel. We run large clusters (450+ workloads, 2k+ containers) and perform these upgrades within normal office hours without negatively impacting our Software Engineers or compromising our service availability. This post is about how we do that.
-
Dry running our data warehouse using BigQuery and dbt
In this post, we talk about how we use dbt & BigQuery dry run jobs to validate our 1000+ models in under 30 seconds.
As mentioned in a previous post, at Auto Trader, we use dbt (data build tool) to transform data in BigQuery, our cloud data warehouse. Ensuring the data in our platform is accurate and available on time is important for both our internal and external data consumers. When we build a dbt project, it compiles our models (templated select statements which define the transformations to our underlying source data) into SQL that is executed against the data warehouse.
While dbt is an excellent tool for creating these complex transformation pipelines, it does not check that the select statements are valid SQL. The current solution for this is a CI environment that executes your project and runs data tests to check the transformations are working as expected. The main drawbacks of this are speed and cost, as the database engine you are using needs to execute SQL in a production-like environment.
-
Experimentation at Auto Trader
Experimentation, or ‘AB testing’, is a vital part of adapting to the ever-shifting behaviour of users on the internet. In this brief (albeit hopefully informative!) blog post, I will be covering how we’ve historically approached this topic at Auto Trader. I’ll then go on to describe the changes we’re putting in place to make it an even more effective tool in our kit as we move into the landscape of digital retailing. Finally, I’ll go through how we envisage the process looking in the future.
-
How Auto Trader ensures end-to-end data trust at scale
At Auto Trader, we make use of several tools to store and maintain our data. The majority of our analytical data is stored in BigQuery, a serverless cloud-hosted data warehouse. From here, we use dbt to transform and model that data.
Inevitably, things can go wrong with the tasks involved in building the datasets that our stakeholders rely on. We’re always looking for ways to automate and better operationalize our data workflows. We want to be confident in the quality of the data that we’re sharing, which is why it’s important to know when a dataset doesn’t look quite as we expect.
An important feature of our data pipelines is observability, as the complexity of data workflows can otherwise make it hard for us to investigate quality issues. To help with this, we’ve adopted Monte Carlo as a monitoring tool, which provides a rich UI to set up notifications and custom domain-specific checks on our data. In this post, we’ll explain how we have automated the creation of Monte Carlo notifications and embedded them into our existing data development process.
-
Auto-generating an Airflow DAG using the dbt manifest
This post explores our continuing effort to improve our developer experience and ability to respond to incidents. Here, we discuss how we made scheduling dbt tasks simpler and more transparent, removing the need for the dbt user to consider scheduling when deploying a new model.
-
AT Design Talks
Back in January, our design team hosted our first AT Design Talks event, which featured a wide range of lightning talks by members of our design and research team. We covered a variety of topics, from the process of building out data visualisations for our customers, to how we use eye-tracking to understand the impact of information architecture on consumer scanning patterns.
-
Single Sign-On and Basic Auth with Spring Security
The requirements for some of our apps have led to interesting explorations of the Spring Security configuration. I hope to show you by way of example how non-standard implementations can still be achieved with elegance once you understand the Spring Security architecture.
-
Protecting users against CSRF in My Account
Recently at Auto Trader, we’ve been busy overhauling the stack behind the ‘My Auto Trader’ area. In our previous stack, we’d implemented our own solution to protect our customers from CSRF (Cross-Site Request Forgery).
As we migrated the back-end to Spring Boot (a Java-based, open-source framework used to create production-grade web applications), we were able to use Spring Security to prevent this lesser-known vulnerability. Here’s how we did this, integrating it into our new GraphQL powered React front-end.
-
Reliable tracking: Validating Snowplow events using Cypress & Snowplow Micro
At Auto Trader, we have migrated from Google Analytics to Snowplow for event tracking. Our previous implementation didn’t focus on the quality or the trustworthiness of the events being tracked. This time around, we treated those as a first-class concern. In this post, I’ll take you through how we implemented automated data quality checks using Snowplow Micro & Cypress.
-
Moving from buckets to vectors: How to use Machine Learning to quantify how similar vehicles are to each other
How similar is a Ford Focus to a VW Golf? Is a Focus more like a VW Polo…?
At Auto Trader, the question of how similar two vehicles are to each other comes up frequently, whether in the context of recommendations, helping retailers understand who their competitors are, or price valuations. In this post, we will describe one of the ways we have to compare apples and oranges (or in this case coupes and hatchbacks!).
-
The Case For CSS-in-JS
CSS-in-JS is the practice of utilising the power of JavaScript to dynamically generate and better organise your application’s CSS. The concept has gained traction over the years due to the popularity of UI frameworks / libraries such as React, Angular and Vue. This post attempts to convince you that CSS-in-JS is an approach worth investigating in the struggle to keep your codebase’s CSS in check.
-
View from across the data lake: Developing the mileage indicator using our self-service Data Platform
The mileage indicator is a new feature on our product page which shows how the mileage of an advertised vehicle compares to the average mileage of similar vehicles that we’ve seen on Auto Trader. It uses machine learning (ML) to predict what the mileage of a vehicle should be given its age. The ML model is trained on data from the millions of recently seen vehicles that we record in our S3-backed data lake.
I’d like to share the story of how the mileage indicator was built, from prototype to production, and finally to fully automated retraining and continuous deployment of the model. It is a microcosm of how we work at Auto Trader: centring our business around our data and using its insights to drive product design.