Build a scalable, online recommender with Keras, Docker, GCP, and GKE

Mike (Yuan Hung) Lo · Published in Insight · 10 min read · Mar 25, 2020


TL;DR

Online retailers such as Wayfair and Amazon offer endless arrays of products, curated by smart AI product recommender systems. In the past, these systems have been out of reach for individual developers. With today’s open-source tooling, however, it is easier than ever to build a scalable, AI-powered, content-based product recommender that delivers highly personalized suggestions.

In this blog, I will share how I built Pair, a scalable web application that takes in a product image, analyzes its design features using a convolutional neural network, and recommends products in other categories with similar style elements. After reading this, I hope you will know how to build deep learning models with TensorFlow Keras, productionize a model as a Streamlit app, and deploy it as a Docker container on Google Cloud Platform (GCP) using Google Kubernetes Engine (GKE).

Pair: An image-based product collection recommender

When I first moved into my new apartment, I was super excited to furnish the place into my dream home. While I did not have a specific picture in mind, I knew I wanted to find furniture that complemented each other stylistically. Unfortunately, it only took a few Google searches to feel overwhelmed, with more furniture choices than I cared to look through. With such a wide variety of styles, textures, and colors available, how could I possibly make my dream home a reality amidst this jungle of digital madness?

This pain point led me to wonder if AI-powered content recommendation could help customers find the products they truly want in a sea of irrelevant choices. We now live in a world filled with irrelevant information. This noise can saturate our attention and undermine our ability to find what we truly want. A smart, personalized recommender system that learns our individual preferences can filter through the noise and help us find the most desired products in a timely manner. Such tools can save both customers and retailers time and hassle in discovering the best products to buy and sell.

With that idea in mind, I launched Pair. Pair is a scalable web application that takes in a product image from a user, analyzes its design features using a convolutional neural network, and recommends products with similar styles in other product categories.

Live demo of the Pair web app

Most recommender systems in use today leverage classical machine learning models. They can be divided into collaborative filtering approaches, which perform matrix factorization on a user-item interaction matrix, and content-based approaches, which use regression or classification models on prior information about the users and/or the items to make recommendations.

Both approaches analyze structured tabular data from the users or items. In this project, I was curious to see if deep learning approaches — specifically convolutional neural networks (CNN) — can learn useful latent features from unstructured image data and use them to make actionable recommendations. In other words, I wanted to find out if CNNs can extract intangible design features that explain why we prefer certain furniture over others. Read on as I dive into Pair’s inference architecture for making recommendations.

Pair inference architecture

Pair recommendation inference pipeline

Pair takes an image library of one furniture type (I used the IKEA catalog image dataset) and feeds it into a pre-trained CNN (in this project a pre-trained VGG16, but other pre-trained models work as well). The different convolutional layers in the CNN act as feature extractors, generating feature maps for each furniture image. You can think of this step as the CNN creating custom “design filters” to encode the furniture designs.
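To make this step concrete, here is a minimal TensorFlow Keras sketch of pulling intermediate feature maps out of a pre-trained VGG16. It is not the exact Pair source code; the chosen layers and the 224×224 input size are assumptions based on the standard VGG16 architecture.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import vgg16

# Load VGG16 pre-trained on ImageNet, without the classification head
base = vgg16.VGG16(weights="imagenet", include_top=False)

# Use one convolutional layer per block as a "design filter"
layer_names = ["block1_conv1", "block2_conv1", "block3_conv1",
               "block4_conv1", "block5_conv1"]
extractor = tf.keras.Model(
    inputs=base.input,
    outputs=[base.get_layer(name).output for name in layer_names],
)

def extract_feature_maps(image_path):
    """Return one stack of feature maps per selected layer for a furniture image."""
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(224, 224))
    x = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis, ...]
    x = vgg16.preprocess_input(x)
    return extractor.predict(x)
```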

The feature maps are then used to compute Gram matrices, which measure correlations between feature maps and highlight the most salient features that best represent the furniture. Mathematically, this is done by flattening the 3D stack of feature maps (height × width × channels) into a 2D matrix and then computing its correlation matrix. Highly correlated feature maps become strong latent representations of the furniture image.
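As a rough illustration, the Gram matrix of a single layer’s feature maps can be computed like this (a minimal NumPy sketch, not the exact Pair implementation):

```python
import numpy as np

def gram_matrix(feature_maps):
    """Gram matrix of a (height, width, channels) stack of feature maps."""
    h, w, c = feature_maps.shape
    # Each row of `flat` is one feature map, each column one spatial location
    flat = feature_maps.reshape(h * w, c).T        # shape (c, h*w)
    # Correlate every feature map with every other feature map
    return flat @ flat.T / (h * w)                 # shape (c, c)
```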

Gram matrices are high-dimensional representations of furniture design. To make similarity search cheaper, the matrices undergo dimensionality reduction using principal component analysis (PCA). The principal component vectors are saved as searchable indexes using Facebook AI Research’s faiss library for search speed and scalability (read more about faiss here and here). The saved index library can later be loaded to query incoming furniture images. Multiple such index libraries are generated, one for each furniture category.
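Building such an index takes only a few lines with scikit-learn and faiss. The sketch below assumes the flattened Gram matrices for one furniture category are already stacked into a NumPy array; the file names and the 128-component reduction are hypothetical choices.

```python
import faiss
import numpy as np
from sklearn.decomposition import PCA

# One flattened Gram matrix per catalog image, shape (n_items, d_raw) (hypothetical file name)
design_vectors = np.load("table_gram_vectors.npy").astype("float32")

pca = PCA(n_components=128)                      # reduce to a manageable dimension
reduced = pca.fit_transform(design_vectors).astype("float32")

index = faiss.IndexFlatL2(reduced.shape[1])      # exact L2 (Euclidean) search
index.add(reduced)                               # one vector per catalog item
faiss.write_index(index, "tables.index")         # save the index library for later queries
```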

When the user provides a new furniture image, it is fed through the same pipeline to generate a design feature vector that encodes the design features of the furniture. This vector is then used to query the feature index libraries, with the L2 norm as the similarity metric (cosine similarity is also available). The furniture items with the smallest Euclidean distance are returned as recommendations.
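Querying then amounts to pushing the user’s image through the pipeline and asking the index for its nearest neighbors, roughly like this (continuing the hypothetical names from the previous sketch, with a random vector standing in for the real query):

```python
import faiss
import numpy as np

index = faiss.read_index("tables.index")                   # index built in the previous step
query_vector = np.random.rand(1, 128).astype("float32")    # stand-in for the user's design feature vector
distances, item_ids = index.search(query_vector, 5)
# item_ids[0] holds the catalog indices of the five most similar tables
```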

Product recommendations via image similarity search: the processed user image is queried against the feature libraries

In case you are curious, below you can see some sample feature maps from each block of the VGG16.

Sample feature maps from each convolutional block of the VGG16 network

Pair app deployment architecture

To make Pair an interactive experience for users online, I created a data application using the Streamlit framework. It takes only a few lines of Python to get started, is very intuitive to use, and allowed me to prototype a beautiful, interactive machine learning application within hours. Next, to make the application widely available to multiple users, I packaged my Streamlit app as a Docker container, hosted it on Google Cloud Platform (GCP), and deployed it to a Kubernetes cluster via Google Kubernetes Engine (GKE). To learn more about Docker and how to build and run Docker images, take a look at this quickstart guide.
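To give a sense of how little code Streamlit needs, here is a minimal sketch of the kind of app Pair wraps around the recommendation pipeline. The helper functions in the comments are hypothetical placeholders, not the actual Pair source.

```python
import streamlit as st
from PIL import Image

st.title("Pair: furniture recommendations by design similarity")

uploaded = st.file_uploader("Upload a furniture image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Your furniture", use_column_width=True)
    # Hypothetical helpers wrapping the VGG16 -> Gram matrix -> PCA -> faiss pipeline:
    # vector = compute_design_vector(image)
    # for match in recommend(vector, category="tables", k=5):
    #     st.image(match)
```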

High level view of the Pair web app deployment architecture

The ability to handle multiple server requests was accomplished by creating a LoadBalancer Service on GKE, which groups a set of Pod endpoints — each holding one or more containers running instances of the app — into a single resource cluster. The load balancer distributes server workloads across multiple computing resources so the app can handle multiple call requests without timing out. In addition, GKE provides a stable IP address that external users can use to access the app. When the cluster receives a request, the load balancer routes it to one of the Pods in the Service, which then returns an instance of the Streamlit app. Pods run on nodes in the cluster. Once created, a Pod remains on its node until its process is complete, the Pod is deleted, the Pod is evicted from the node due to lack of resources, or the node fails.

Schematic of Pair app deployment using GKE and load balancer

The deployment process from a deep learning model to an accessible web application took less than 10 steps. This short GKE tutorial provides a nice step-by-step guide on how to deploy a containerized web application (provided that the Google Cloud SDK is installed). I highly recommend going through the tutorial once to understand how deployment works. To summarize, the minimum steps required to launch the app are:

  1. Package the Streamlit app into a Docker container
  2. Test the container on a local machine to make sure it works (optional)
  3. Upload the container image onto Google Container Registry
  4. Create a container cluster (pool of VMs) to run the container image
  5. Deploy app replicas and schedule them to run on nodes in the cluster
  6. Expose the app to Internet traffic by creating an external IP and a Load Balancer
  7. Scale up the application by adding additional Pods, or delete the Service and container cluster to avoid incurring unwanted charges

One minor caveat I encountered when deploying the app was the need to specify the correct targetPort for the Streamlit application container, so that it listens on the right port for traffic from the load balancer. Let me explain what I mean. Typically, external clients call the Service using the load balancer’s IP address and the Transmission Control Protocol (TCP) port specified by port. The request is then forwarded to one of the member Pods on the TCP port specified by targetPort. For example, if a client calls the Service at 203.0.113.100 on TCP port 80, the request is forwarded to one of the member Pods on TCP port 8080, and that Pod must have a container listening on TCP port 8080. Since Streamlit’s default listening port is 8501, if targetPort is set to 8080, the app is listening on the wrong port, and any request will fail with a “This site can’t be reached… refused to connect” error. Only after the correct targetPort is set can the load balancer’s IP address successfully serve the Streamlit app.

Evaluations and future directions

Checking Pair recommendation performance

Ideally, I would perform rigorous A/B testing on Pair to evaluate whether or not it is recommending the right products for users. Unfortunately, I did not have the luxury of time or manpower to do so for Pair, so I had to think of a clever alternative. Luckily, the IKEA image dataset also contained item-to-room image and text information that grouped different furniture together in various room scenes.

Let’s assume the different furniture in those room scenes was carefully selected by IKEA interior designers, so that, by some Swedish aesthetic, the pieces fit well together stylistically. These furniture groupings can then serve as a benchmark for quantifying how closely Pair’s recommendations agree with IKEA interior designs. More specifically, I used a metric called hit ratio at n (HR@n). A “hit” is recorded if one of Pair’s top-n recommendations matches the IKEA interior designs. HR@n is defined as the number of queries with a correct top-n match divided by the total number of queries made. A hit ratio close to 1 means Pair’s recommendations agree well with those of IKEA’s interior designers, while 0 means poor agreement. For instance, an HR@5 score of 0.5 means that in 50% of all furniture queries, at least one choice in Pair’s top-5 recommendations correctly matches one of the IKEA interior designs.

Example of hit ratio at 5 (HR@5)
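In code, the metric is just a count of queries with at least one top-n hit. Here is a minimal sketch, with data structures that are my own assumptions rather than the actual evaluation script:

```python
def hit_ratio_at_n(recommendations, ground_truth, n=5):
    """recommendations: {query_id: ranked list of recommended item ids}
    ground_truth: {query_id: set of item ids grouped with the query item in IKEA room scenes}"""
    hits = sum(
        1 for query_id, ranked in recommendations.items()
        if set(ranked[:n]) & ground_truth.get(query_id, set())
    )
    return hits / len(recommendations)
```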

For evaluation, I queried different chairs and asked Pair for table recommendations. I measured HR@5, HR@10, and HR@20 as 0.35, 0.5, and 0.6, respectively. The results are not great, not terrible, but remember: Pair uses a pre-trained VGG16 network that was originally trained for general image classification. The network has never been trained on this IKEA dataset for making product recommendations, so it doesn’t actually know which furniture matches stylistically. Despite Pair’s lack of “design sense”, it is nevertheless interesting to see that, out of the ~134 possible table choices, one of Pair’s top-5 choices matches the IKEA benchmarks in 35% of all queries. I would say that is a pretty good starting point!

Below is a sample Pair recommendation with HR@5:

HR@5 example: pairing of different styles/textures

And here is one with HR@10:

HR@10 example: pairing of similar texture

Proposed transfer learning architecture

To further improve the quality of Pair’s recommendations, I have also set up a transfer learning pipeline to learn design similarities based on IKEA interior designs. To do this, I modified VGG16 to perform a multi-label classification task that tries to predict which IKEA room scenes each furniture item belongs in (using the item-to-room information). This is done by replacing the last three dense layers of VGG16 with new dense layers and retraining the network on the IKEA dataset. I think it will be interesting to see if Pair can make recommendations that are stylistically similar to those of IKEA’s interior designers! The transfer learning pipeline is Dockerized and built to train on GCP instances. For the source code, see the Pair Github repo.
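Here is a sketch of what that head replacement might look like in Keras. The number of room-scene labels and the sizes of the new dense layers are assumptions for illustration; the actual training pipeline lives in the Pair repo.

```python
import tensorflow as tf
from tensorflow.keras.applications import vgg16

NUM_ROOM_SCENES = 100   # assumed label count: one label per IKEA room scene

base = vgg16.VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pre-trained convolutional "design filters" frozen at first

# Replace VGG16's original dense head with a new multi-label classification head
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(NUM_ROOM_SCENES, activation="sigmoid"),  # one sigmoid per room scene
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",   # standard choice for multi-label targets
              metrics=["accuracy"])
```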

Proposed transfer learning to generate more refined feature maps that can detect when different furniture fit together stylistically

Recap

For my Insight Data Science project, I created Pair, an image-based product collection recommender that uses a CNN to extract design features from product images and queries them against feature libraries to make recommendations across multiple product categories based on feature similarity. Pair is built as a Dockerized Streamlit container image and hosted in the cloud on GCP. The application is exposed to the internet using GKE’s LoadBalancer Service. You can try Pair with the live demo to see for yourself. Lastly, I evaluated Pair using IKEA room scene images as benchmarks, with hit ratio at n as the metric for quantifying how similar Pair’s recommendations are to IKEA interior designs. As a further improvement, I set up a transfer learning pipeline to generate more refined feature maps that can detect furniture that fits together stylistically.

Acknowledgements

Pair is largely based on Austin McKay’s StyleStack Github repo. The IKEA catalog image dataset is forked from Ivona Tautkute’s Github repo. Lastly, thanks to Holly Szafarek and Matt Rubashkin for their input on this article.

Are you interested in transitioning to a career in tech? Sign up to learn more about Insight Fellows programs and start your application today.
