Real-Time Personalisation of Search Results with Auto Trader's Customer Data Platform

Photo by Markus Quinten de Graaf on Unsplash

Here at Auto Trader, we aim to help customers find their perfect vehicle as quickly and easily as possible. But with over 400k vehicles advertised onsite at any one time, this can prove challenging. Whilst we provide a powerful search engine to help narrow things down, customers still need to specify a set of filters to get the most out of their search, posing the question: what if they’re not sure exactly what they want? What if they’re not sure how to find what they want using our filters? How can we help those customers?

Search filters aren’t the only way a customer can express a preference; their other activity on Auto Trader can do that as well. Each time a customer views a set of search results, they express a preference by choosing which adverts to click on. This presents us with an opportunity: we can look at this activity, model a customer’s preferences from it, and then improve our search results to show more vehicles that match those preferences.

To avoid changing the customer experience on Auto Trader too dramatically, we started by amending just one of our search result positions: the Featured listing position.

Featured Listings

The Featured listing position sits at the top of the search results and contains the first advert a customer sees when browsing for a vehicle, which makes it highly valuable to retailers. To advertise a vehicle in this position, a retailer pays for a weekly Pay Per Click (PPC) campaign. This gives them a click budget and places a selection of their vehicles in the advert pool used to populate the number one position. Customers then see these listings and may click through to the advert, spending clicks from the retailer’s budget. Once the retailer’s budget reaches zero, their vehicles automatically stop being advertised in the Featured listing position until they purchase more clicks for the following week. The product encourages a higher click-through rate for the retailer’s adverts, in turn leading to more sales.
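To make the budget mechanics concrete, here is a minimal Java sketch of a weekly click budget that gates a retailer’s eligibility for the Featured pool. The class and method names (`PpcCampaigns`, `purchase`, `recordClick`, `isInPool`) are hypothetical, and the sketch ignores the weekly reset, concurrency and persistence that a real system would need.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of weekly PPC click budgets; not Auto Trader's implementation.
class PpcCampaigns {
    // Remaining clicks per retailer for the current week.
    private final Map<String, Integer> remainingClicks = new HashMap<>();

    // Buying a campaign adds clicks to the retailer's weekly budget.
    void purchase(String retailerId, int clicks) {
        remainingClicks.merge(retailerId, clicks, Integer::sum);
    }

    // A retailer's vehicles stay in the Featured pool only while budget remains.
    boolean isInPool(String retailerId) {
        return remainingClicks.getOrDefault(retailerId, 0) > 0;
    }

    // Each click on a Featured listing spends one click, never going below zero.
    void recordClick(String retailerId) {
        remainingClicks.computeIfPresent(retailerId, (id, left) -> Math.max(0, left - 1));
    }
}
```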

When a customer browsing the search listings page applies any of the available filters, the Featured listing position shows a vehicle from the PPC pool that matches those explicit preferences. However, if no filters are applied, a vehicle is selected from the pool at random. If we know the customer’s preferences, we can use this information to personalise the listing, showing vehicles the customer is more likely to want and so increasing the click-through rate for this position. To help us infer those preferences, we built a Customer Data Platform (CDP).

An example Featured listing. Ideally, the vehicles here should be personalised based on the customer’s preferences. The more expensive advert shown here might not be in the price range the customer has previously shown an interest in.
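The selection behaviour described above can be sketched in Java as follows. This is an illustrative assumption rather than Auto Trader’s code: the `Advert` record, attribute keys such as `fuelType`, and the fallback to the unpersonalised pool when no advert matches the segment are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical advert model: attribute name -> value, e.g. "fuelType" -> "ELECTRIC".
record Advert(String id, Map<String, String> attributes) {
    // True if every criterion is present on this advert with the same value.
    boolean matches(Map<String, String> criteria) {
        return criteria.entrySet().stream()
                .allMatch(e -> e.getValue().equals(attributes.get(e.getKey())));
    }
}

class FeaturedSelector {
    private final Random random;

    FeaturedSelector(Random random) {
        this.random = random;
    }

    // Explicit filters always constrain the choice; inferred segment criteria only
    // narrow it further when at least one candidate still matches them.
    Optional<Advert> select(List<Advert> ppcPool,
                            Map<String, String> filters,
                            Map<String, String> segmentCriteria) {
        List<Advert> candidates = ppcPool.stream()
                .filter(a -> a.matches(filters))
                .collect(Collectors.toList());
        List<Advert> personalised = candidates.stream()
                .filter(a -> a.matches(segmentCriteria))
                .collect(Collectors.toList());
        if (!personalised.isEmpty()) {
            candidates = personalised;
        }
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        // Pick uniformly at random among the remaining candidates.
        return Optional.of(candidates.get(random.nextInt(candidates.size())));
    }
}
```

With no filters and no segment, this reduces to the original random choice from the PPC pool.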

Building a Real-Time Customer Data Platform

The CDP at Auto Trader is a purpose-built real-time database that ingests tracking events from our behavioural data platform, Snowplow. Every time a customer interacts with the Auto Trader website or our native apps (for example, by viewing an advert for a car, visiting the homepage, searching for vehicles or sending a request to a retailer), we can track the interaction by firing an event to Snowplow.

Events contain identity information, such as a unique user ID (if the customer has consented to tracking) and their Auto Trader account ID (if they have logged in). Each event also contains information about the action taken. For example, if you view a full-page advert on the website, the event will contain information about that advert, such as the vehicle make, model, age, mileage, fuel type and more.

{
    "uniqueId": "fedc3d3b-e267-4536-bfb4-a8f0d6f4df15",
    "atUserId": "a875d622-7bef-405c-b360-72c744b6bf0d",
    "loggedIn": true,
    "url": "www.autotrader.co.uk/advert/{advertId}",
    "advertDetails": {
        "advertId": "{advertId}",
        "make": "ford",
        "model": "focus",
        "ageYears": 4,
        "mileage": 40000,
        "fuelType": "electric"
    }
}

A mockup of a Snowplow event. In reality, the event data schema is more complex than this.

Building a Real-Time Customer Database

These Snowplow events stream into a GCP Pub/Sub topic. From there, they are relayed into our Kafka cluster to be consumed by the CDP. The volume of events reaching the CDP is large (~1,000 per second), and keeping up with it has required significant engineering, which we would like to explore in more depth in another blog post. For now, the high-level architecture of the CDP is shown in the figure below.

CDP architecture

High-level architecture of the CDP consumer and data service. Events stream in from GCP Pub/Sub into a service that relays them into our Kafka cluster. The event consumer application then uses these events to update the customer’s unified profile, which is stored in Bigtable. The data service then serves these profiles to downstream services via our gateway.

Building a Customer’s Profile

The event consumer service is a Kafka consumer written in Java (Spring Boot). It deserialises each event from the user behaviour topic and passes it through a processing pipeline with three stages:

The identify user stage uses the identifiers attached to the event to determine whether we have seen this customer before; if so, it returns their customer profile ID, and if not, it creates a new profile.
The append event stage adds the event to the customer’s timeline of recent events.
The update profile stage takes the last 10 days of events and calculates the customer’s attributes and the segments they belong to.

Figure showing how the event timeline is summarised to create derived attributes. These attributes are then fed into rules that decide whether the customer is in each segment.
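The three pipeline stages can be sketched in plain Java. This is a simplified, in-memory illustration: the `TrackingEvent` record, the `unique:`/`account:` key prefixes and the single fuel-type attribute are assumptions, and the real consumer stores profiles in Bigtable and would also need to merge profiles when two known identifiers point at different customers.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Simplified event carrying the two identifiers and one advert attribute.
record TrackingEvent(String uniqueId, String atUserId, Instant timestamp, String fuelType) {}

class ProfilePipeline {
    static final Duration WINDOW = Duration.ofDays(10);

    private final Map<String, String> profileIdByIdentifier = new HashMap<>();
    private final Map<String, List<TrackingEvent>> timelines = new HashMap<>();

    // Stage 1: identify user. Reuse an existing profile if either identifier is
    // already known; otherwise create a new profile ID. Link both identifiers to it.
    String identify(TrackingEvent e) {
        String profileId = null;
        if (e.atUserId() != null) profileId = profileIdByIdentifier.get("account:" + e.atUserId());
        if (profileId == null && e.uniqueId() != null) profileId = profileIdByIdentifier.get("unique:" + e.uniqueId());
        if (profileId == null) profileId = UUID.randomUUID().toString();
        if (e.atUserId() != null) profileIdByIdentifier.put("account:" + e.atUserId(), profileId);
        if (e.uniqueId() != null) profileIdByIdentifier.put("unique:" + e.uniqueId(), profileId);
        return profileId;
    }

    // Stage 2: append the event to the customer's timeline.
    void append(String profileId, TrackingEvent e) {
        timelines.computeIfAbsent(profileId, id -> new ArrayList<>()).add(e);
    }

    // Stage 3: recompute a derived attribute (views per fuel type) from the last 10 days only.
    Map<String, Long> updateProfile(String profileId, Instant now) {
        Instant cutoff = now.minus(WINDOW);
        Map<String, Long> viewsByFuelType = new HashMap<>();
        for (TrackingEvent e : timelines.getOrDefault(profileId, List.of())) {
            if (e.fuelType() != null && !e.timestamp().isBefore(cutoff)) {
                viewsByFuelType.merge(e.fuelType(), 1L, Long::sum);
            }
        }
        return viewsByFuelType;
    }

    // Run all three stages for one event.
    Map<String, Long> process(TrackingEvent e) {
        String profileId = identify(e);
        append(profileId, e);
        return updateProfile(profileId, e.timestamp());
    }
}
```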

Segmenting Customers

When a customer is in a segment, we want to use that segment to narrow down their search results to ones matching their preference. So ideally we want to define segments that align with our search filters. There’s no value in applying a filter a customer has already applied, so we also want segments that correspond to less commonly applied search filters.

Based on these criteria, we assign users to segments for fuel type preference (ELECTRIC, PETROL, DIESEL, etc.) and body type preference (SUV, HATCHBACK, CONVERTIBLE, etc.).

To decide when to add a customer to a segment, we look at their recent browsing activity. Taking the ELECTRIC segment as an example, we want customers in that segment to be more likely to view an EV (electric vehicle) than the average customer. Therefore we assign customers to the segment based on how likely we think they are to view an EV.

We could use any aspect of a customer’s browsing history to decide whether they’re likely to view an EV, such as whether they’ve read content about EVs, or whether they’ve entered our EV giveaway competition. However, the most predictive feature is simply whether they’ve viewed EVs in the past. So we use a simple model based on the proportion of vehicles already viewed that had an electric fuel type.

Once we know how likely someone is to view an EV, we need a threshold to decide when to add them to the segment. This is a classic precision/recall trade-off. With a lower threshold, more customers end up in the segment (higher recall), but the proportion of their viewed vehicles that match the segment drops (lower precision). To keep things simple when narrowing down search results, we require more than 50% of a customer’s viewed vehicles to match the segment; because only one fuel type can exceed half of the views, a customer can be in at most one fuel type segment.

2 Petrol views, 3 Electric views -> ELECTRIC segment applied
3 Petrol views, 3 Electric views -> No segment applied

We apply the same approach to the other fuel type segments, and the body type segments as well.
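A minimal sketch of this rule, assuming the view counts come from the profile’s derived attributes. `SegmentAssigner` is a hypothetical name; the strict greater-than-50% comparison is what guarantees at most one segment per attribute and reproduces the two examples above. The same method works unchanged for body type counts.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: derives at most one segment from recent view counts.
class SegmentAssigner {
    // Returns the value (e.g. "ELECTRIC") whose share of views is strictly above 50%.
    static Optional<String> assign(Map<String, Integer> viewsByValue) {
        int total = viewsByValue.values().stream().mapToInt(Integer::intValue).sum();
        if (total == 0) {
            return Optional.empty(); // no recent views, no segment
        }
        return viewsByValue.entrySet().stream()
                .filter(e -> 2L * e.getValue() > total) // share > 0.5, using integer arithmetic
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```

For example, `SegmentAssigner.assign(Map.of("PETROL", 2, "ELECTRIC", 3))` returns `Optional[ELECTRIC]`, while three petrol and three electric views return an empty `Optional`.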

Linking Segments Into Search

Now that customer preferences such as fuel type are being logged in the CDP, transformed into segments, and stored in each customer’s profile against their unique user ID, we need to fetch this information whilst they browse the website so that we can adjust their search experience. Before a customer’s request reaches the Auto Trader website, it passes through Consumer Gateway, a Spring Boot Zuul application used to manage customer traffic. Within this application, we use a custom Zuul filter to fetch the customer’s profile. To limit the load on the CDP, the filter only fetches the profile on certain parts of the website, specifically whenever the customer navigates to a search listings page. Once the profile has been fetched from the CDP using the unique user ID from the Snowplow cookie, the segment enums are concatenated into a string and inserted into a request header. This header is then forwarded to Sauron, the family of applications that powers the Auto Trader website.
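The gateway logic amounts to two small decisions: whether to fetch the profile for a given request, and how to encode the segments as a header value. The stdlib-only sketch below shows both; in Consumer Gateway this logic would sit inside the custom Zuul filter’s `shouldFilter()` and `run()` methods. The `/car-search` path prefix, the `X-Customer-Segments` header name and the comma separator are assumptions for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical segment enum mirroring the segments described in the post.
enum Segment { ELECTRIC, PETROL, DIESEL, SUV, HATCHBACK, CONVERTIBLE }

// Stdlib-only helper; in the gateway this logic would be called from a Zuul filter.
class SegmentHeaderSupport {
    static final String HEADER_NAME = "X-Customer-Segments"; // hypothetical header name
    static final String SEARCH_PATH_PREFIX = "/car-search";   // assumed search listings path

    // Only fetch the profile on search listings pages, to limit load on the CDP.
    static boolean shouldFetchProfile(String requestPath) {
        return requestPath != null && requestPath.startsWith(SEARCH_PATH_PREFIX);
    }

    // Concatenate the segment enum names into a single header value, e.g. "ELECTRIC,SUV".
    static String toHeaderValue(List<Segment> segments) {
        return segments.stream().map(Segment::name).collect(Collectors.joining(","));
    }
}
```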

Sauron queries a myriad of domain web services to populate what a customer sees and interacts with on the Auto Trader website, including click-and-collect locations, finance options, part exchange valuations and more. We’re particularly interested in the service that provides vehicle search results to customers: the Vehicle Listing Service (VLS). VLS provides listings for Natural positions (positions that are not influenced by any Auto Trader products) and Prominence listing positions, including the Featured listing. It fetches this data by querying Search One, another internal service responsible for providing advert information, using a query built from the filters the customer has applied and the listing type. For instance, for the Featured listing position, the query requests adverts from the PPC pool. We pass segments into VLS via the request headers as well, and based on the segments present, VLS can add to the Search One query. For example, if the customer is in the SUV segment, we can adjust the query so that it returns an advert with an SUV body type, since we can infer that the customer is interested in SUVs. The segment is only applied if the body type filter hasn’t been used, because we only want to rely on segments when we don’t explicitly know a customer’s preference.

Search Segmentation architecture. This diagram shows the full integration of segments into search listings. The green path shows how customer data is collected and ingested into the CDP whilst they browse the website. The red path shows how the customer’s profile containing their segments is fetched before they start browsing and how this is then used to personalise the search listings page.
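The way VLS adds segments to the Search One query can be sketched as below, representing the query as a simple key/value map. The map keys (`pool`, `fuelType`, `bodyType`), the segment-to-attribute mapping and the header format are hypothetical; the point is that a segment only adds a constraint when the customer has not already filtered on that attribute.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of how VLS might add segments to a Featured listing query.
class FeaturedQueryBuilder {
    // Assumed mapping from segment name to the search attribute it constrains.
    private static final Map<String, String> ATTRIBUTE_BY_SEGMENT = Map.of(
            "ELECTRIC", "fuelType", "PETROL", "fuelType", "DIESEL", "fuelType",
            "SUV", "bodyType", "HATCHBACK", "bodyType", "CONVERTIBLE", "bodyType");

    static Map<String, String> build(Map<String, String> customerFilters, String segmentHeader) {
        // Start from the customer's explicit filters and restrict to the PPC pool.
        Map<String, String> query = new LinkedHashMap<>(customerFilters);
        query.put("pool", "PPC");
        if (segmentHeader == null || segmentHeader.isBlank()) {
            return query;
        }
        for (String raw : segmentHeader.split(",")) {
            String segment = raw.trim();
            String attribute = ATTRIBUTE_BY_SEGMENT.get(segment);
            // Only apply a segment if its attribute is not already constrained,
            // so an explicit filter (e.g. fuelType=PETROL) is never overwritten.
            if (attribute != null && !query.containsKey(attribute)) {
                query.put(attribute, segment);
            }
        }
        return query;
    }
}
```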

AB Testing

Tailoring a customer’s journey on our website should be a positive experience for them, so to check whether customers respond well to the customised Featured listing position, we measure their engagement through AB testing.

An AB test is a randomised experiment where we assign customers to either a test or control bucket as they enter our website. Customers are typically shown different versions of the webpage depending on the bucket they are assigned to. In the case of testing CDP segments, a customer in the test bucket will be shown an advert in their search results that we believe fits their preferences, based on our modelled data in the CDP. To illustrate how this works, let’s say a customer in the test bucket is in the ELECTRIC segment. We would show them an electric vehicle advert in the Featured listing position. A customer in the control bucket, however, would be shown a Featured listing advert without any customisation based on their CDP body type or fuel type segment.

For a CDP segment AB Test, we have four groups of customers we can observe and compare:

Control customers in a segment
Control customers not in a segment
Test customers in a segment
Test customers not in a segment
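The post does not say how customers are assigned to buckets. One common approach, shown here as an assumption rather than Auto Trader’s implementation, is deterministic hashing of the user ID salted with the experiment name, so that a returning customer always lands in the same bucket. The sketch then maps each customer into one of the four analysis groups listed above. `Experiment`, `Bucket` and `AnalysisGroup` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

enum Bucket { TEST, CONTROL }

enum AnalysisGroup { CONTROL_IN_SEGMENT, CONTROL_NOT_IN_SEGMENT, TEST_IN_SEGMENT, TEST_NOT_IN_SEGMENT }

// Hypothetical hash-based bucketing; not necessarily how Auto Trader assigns buckets.
class Experiment {
    private final String experimentName;
    private final int testPercentage; // e.g. 50 for a 50% test bucket

    Experiment(String experimentName, int testPercentage) {
        this.experimentName = experimentName;
        this.testPercentage = testPercentage;
    }

    // Hash "experiment:user" with SHA-256 into one of 100 slots; slots below the
    // test percentage form the test bucket. The same user always gets the same slot.
    Bucket assign(String uniqueUserId) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest((experimentName + ":" + uniqueUserId).getBytes(StandardCharsets.UTF_8));
            long slot = Math.floorMod(ByteBuffer.wrap(digest).getLong(), 100L);
            return slot < testPercentage ? Bucket.TEST : Bucket.CONTROL;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // Combine bucket membership with segment membership into the four analysis groups.
    AnalysisGroup group(String uniqueUserId, boolean inSegment) {
        if (assign(uniqueUserId) == Bucket.TEST) {
            return inSegment ? AnalysisGroup.TEST_IN_SEGMENT : AnalysisGroup.TEST_NOT_IN_SEGMENT;
        }
        return inSegment ? AnalysisGroup.CONTROL_IN_SEGMENT : AnalysisGroup.CONTROL_NOT_IN_SEGMENT;
    }
}
```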

This scenario differs slightly from a conventional AB test in that we must be careful about sample sizes. When we choose the percentage of traffic to run the experiment on, we must consider the size of the segments. For instance, if we create a 50% test bucket but only 20% of Auto Trader customers have a fuel type preference, we would only be testing the change on 10% (50% × 20%) of all Auto Trader customers. This means the test may take longer to run than a conventional AB test.

Prior to the AB test, we calculate power and run-time based only on customers who have a preference segment. Naturally, if a customer is randomly assigned to the test bucket but does not have a preference, their journey is exactly the same as that of a control customer. Hence, we remove non-segment customers from our analysis, because their behaviour may differ from that of in-segment customers, which in turn could affect the results of the AB test. Furthermore, as a rule, we only customise the Featured listing position if the customer has not already applied a search filter on the attribute that the CDP segment relates to. Going back to our ELECTRIC example, if the customer’s search is filtered on petrol vehicles, we do not overwrite this filter to show electric cars, as that could negatively impact their experience on site.
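The effect of segment size on run-time can be estimated with the standard normal-approximation sample-size formula for comparing two means, n = 2σ²(z_{1-α/2} + z_{1-β})² / δ² customers per bucket. The sketch below assumes a two-sided 90% confidence level and 80% power; σ (the per-customer standard deviation of advert views per session), δ (the smallest difference worth detecting) and the traffic figures are inputs you would need to supply, and this is not necessarily the calculation Auto Trader uses.

```java
// Hypothetical run-time estimate using the textbook two-sample sample-size formula.
class RunTimeEstimate {
    // Assumed z-scores: two-sided 90% confidence (alpha = 0.10) and 80% power.
    static final double Z_ALPHA = 1.6449;
    static final double Z_BETA = 0.8416;

    // Customers needed per bucket to detect an absolute difference `delta` in the
    // mean, given a per-customer standard deviation `sigma`.
    static long samplePerBucket(double sigma, double delta) {
        double z = Z_ALPHA + Z_BETA;
        return (long) Math.ceil(2 * z * z * sigma * sigma / (delta * delta));
    }

    // Only in-segment customers count towards the sample, so the daily inflow into
    // a bucket is scaled by both the bucket share and the segment share.
    static long daysNeeded(double sigma, double delta, long dailyCustomers,
                           double bucketShare, double segmentShare) {
        double eligiblePerDay = dailyCustomers * bucketShare * segmentShare;
        return (long) Math.ceil(samplePerBucket(sigma, delta) / eligiblePerDay);
    }
}
```

With hypothetical inputs σ = 4, δ = 0.2 and 10,000 daily customers, a 50% test bucket and a 20% segment share, `samplePerBucket` gives 4,947 customers and `daysNeeded` gives 5 days; if every customer were in a segment (segment share 1.0), 1 day would suffice.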

The Result

We have recently run separate AB tests using CDP segments for body type and fuel type, following the methodology described above. In both tests, we saw a positive uplift in the mean number of advert views per session in the test bucket. For body type, the uplift was between +4.8% and +7.2%, and for fuel type between +2.1% and +4.1%. These ranges are 90% confidence intervals: if we repeated the test many times and computed an interval in the same way each time, around 90% of those intervals would contain the true uplift.
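To show how such an interval might be computed, here is a simplified Java sketch for the relative uplift in mean advert views per session. It uses a normal approximation for the difference in means and treats the control mean as fixed, which ignores that mean’s own uncertainty; a delta-method or bootstrap interval would be more rigorous. It is not necessarily the method used for the results above.

```java
// Hypothetical, simplified 90% interval for relative uplift in a mean metric.
class UpliftInterval {
    static final double Z_90 = 1.6449; // two-sided 90% standard normal quantile

    // Returns {lower, upper} as fractions of the control mean (0.05 means +5%).
    static double[] relativeUplift(double meanTest, double varTest, long nTest,
                                   double meanControl, double varControl, long nControl) {
        double diff = meanTest - meanControl;
        // Standard error of the difference in means (unequal variances allowed).
        double se = Math.sqrt(varTest / nTest + varControl / nControl);
        return new double[] {
            (diff - Z_90 * se) / meanControl,
            (diff + Z_90 * se) / meanControl
        };
    }
}
```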

Monitoring

Now that we’re using customer segments to modify the behaviour of search, a fault could negatively impact the customer experience, so we monitor segment performance. We do this by looking at the vehicles viewed by customers and comparing them against the segments we had assigned to those customers. For each segment, we calculate the proportion of customers in the segment, along with the segment’s precision (the proportion of vehicles viewed by customers in the segment that match the segment) and recall (the proportion of all views of vehicles matching the segment that came from customers in the segment).
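Precision and recall as defined above can be computed directly from a log of advert views tagged with the viewer’s segment. In this sketch the `View` record and its fields are hypothetical, and segments and fuel types are compared as plain strings.

```java
import java.util.List;

// One advert view: the viewer's assigned fuel type segment (null if none) and the
// fuel type of the vehicle they viewed.
record View(String viewerSegment, String vehicleFuelType) {}

class SegmentMonitor {
    // Precision: of the views made by segment members, the share that match the segment.
    static double precision(List<View> views, String segment) {
        long bySegment = views.stream().filter(v -> segment.equals(v.viewerSegment())).count();
        long matching = views.stream()
                .filter(v -> segment.equals(v.viewerSegment()) && segment.equals(v.vehicleFuelType()))
                .count();
        return bySegment == 0 ? 0.0 : (double) matching / bySegment;
    }

    // Recall: of all views of vehicles matching the segment, the share made by segment members.
    static double recall(List<View> views, String segment) {
        long ofMatching = views.stream().filter(v -> segment.equals(v.vehicleFuelType())).count();
        long bySegment = views.stream()
                .filter(v -> segment.equals(v.vehicleFuelType()) && segment.equals(v.viewerSegment()))
                .count();
        return ofMatching == 0 ? 0.0 : (double) bySegment / ofMatching;
    }
}
```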

The plot below shows this monitoring for our fuel type segments. Most customers are in either the PETROL or DIESEL segment, and around 1.5% of customers are in the ELECTRIC segment. When a customer is in the ELECTRIC segment, roughly 70% of the vehicles they view are electric (the segment precision), and almost 50% of all electric vehicle views come from members of the segment (the segment recall).

Percentage of customers in each fuel type segment. Segment percentages are approximately 48% petrol, 35% diesel, 1.5% electric, 1.4% petrol hybrid, and 0.7% petrol plug-in hybrid.

Precision and recall of the fuel type segments.

Next Steps

In this blog post, we’ve demonstrated how we can use our Customer Data Platform to improve the search experience on Auto Trader. So far, we’ve only looked at segments for body type and fuel type, and we’ve only personalised one of the search positions. There’s opportunity to go further.

Next, we’re planning to explore segments linked to other vehicle attributes. Price, mileage, and vehicle age are all promising candidates. We’ll also explore how the segments interact with each other, and how best to personalise the results when a customer is in multiple segments. Should we give equal weight to all segments, or are some more valuable than others? We’re also in the process of personalising more of our search results, not just the Featured listing, all with the aim of giving customers an even more relevant search experience.
