Editor’s note: Dave Currie joined Lob’s Address Verification team as a remote contractor. Working as the team’s Machine Learning Engineer, he helped improve the accuracy of the Address Verification product by developing microservices that use machine learning.
When I tell people that my work is focused on improving an address verification product, I sometimes receive confused looks. If you think of a friend’s address, you might picture something like “1600 Pennsylvania Avenue, Washington, DC 20500”. An address as simple as this should be easy for a system to understand and verify, and in this case it is: standard addresses that you’ve seen countless times before are quite easy to verify. But not all addresses are so simple.
In this post, I’ll share how we use machine learning to continually improve our address verification product at Lob to ensure our customers’ mailpieces get delivered to as many recipients as possible.
Lob’s Address Verification product receives millions of addresses every day. At this scale, we see addresses with a range of formats:
Although these addresses all have some complexities, they still follow common patterns. A rules-based parsing system could handle them, but it would be complex to build and difficult to iterate on as more patterns are added. This is where machine learning excels: it detects these patterns as you add more training examples.
Given the benefits that machine learning can provide to this problem, we wanted to get a solution into production as soon as possible. The question became: how do we quickly train a model, especially when Lob has so much address data to choose from? The answer: active learning.
Active learning is a cyclical process of identifying the most useful training examples, labelling them, and retraining the model. We started with a list of 100,000 unique addresses (a list this large makes it more likely that uncommon address formats are included in the dataset), labelled 10 of them with their address labels (e.g., primary number, street name, zip code), trained the model using just these 10 examples, then predicted the parsings, along with a confidence score for each, on the remaining 99,990 addresses.
After using just 10 training examples, it was easy to see that the model was beginning to understand patterns in the data. For example, primary numbers are often the numbers at the start of an address, and states are often the two letters before the zip code. Choosing the next set of 10 addresses to label and add to the training data is easy: pick the 10 that the model is least confident about parsing.
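To make the selection step concrete, here is a minimal sketch of uncertainty-based sampling. The `confidence` callable (and the toy scoring lambda in the demo) are hypothetical stand-ins for the parser’s real confidence scores, which aren’t detailed here.

```python
def least_confident(addresses, confidence, batch_size=10):
    """Return the addresses the model is least confident about parsing.

    `confidence` is any callable mapping an address to a score in [0, 1];
    in a real pipeline it would come from the parser's predictions.
    """
    return sorted(addresses, key=confidence)[:batch_size]

# Each active-learning iteration: score the unlabelled pool, hand-label the
# least confident batch, retrain, and repeat until the model is good enough.
pool = ["123 Main St", "1600 pennsylvania avenue washington dc 20500", "PO Box 7"]
print(least_confident(pool, confidence=lambda a: 1 / len(a), batch_size=2))
```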
This iterative process of training and labelling continued until the model could provide a net benefit to our address verification product. At this point, we moved our machine learning parsing model into production and provided our customers with the added benefit of a more accurate service. Model development continues to increase its accuracy further, primarily by adding more training examples and by better standardizing the input address.
We can train a performant address parser with fewer training examples by standardizing the input address. Because standardization reduces the complexity of the task, the model requires fewer training examples to become proficient. Methods to standardize the input address can include steps like those sketched below.
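The exact standardization steps aren’t enumerated here, so the following is a minimal sketch of typical normalizations, assuming lowercasing, punctuation stripping, whitespace collapsing, and expanding a few common abbreviations; the abbreviation table and regexes are illustrative, and a real list would be driven by the patterns in your data.

```python
import re

# Illustrative abbreviation map; a production system would use a much
# larger table (e.g., the full USPS street-suffix list).
ABBREVIATIONS = {
    "ave": "avenue",
    "st": "street",
    "apt": "apartment",
    "nw": "northwest",
}

def standardize(address: str) -> str:
    """Normalize an input address before it reaches the parser."""
    address = address.lower()
    address = re.sub(r"[.,#]", " ", address)        # drop punctuation the parser doesn't need
    address = re.sub(r"\s+", " ", address).strip()  # collapse repeated whitespace
    tokens = [ABBREVIATIONS.get(token, token) for token in address.split()]
    return " ".join(tokens)

print(standardize("1600 Pennsylvania Ave. NW, Washington, DC 20500"))
# -> "1600 pennsylvania avenue northwest washington dc 20500"
```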
A key feature of our address verification product is speed, so the library we chose to build our address parser with had to be up to the task. After comparing a few options, we chose spaCy: its state-of-the-art speed, built-in named entity recognition, and strong documentation make it well suited to the job.
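As a rough illustration of what this looks like with spaCy (v3), here is a minimal training sketch using one hand-labelled address. The label names, the single example, and the training settings are assumptions for the sake of the example, not our production configuration.

```python
import spacy
from spacy.training import Example

# One hand-labelled example: character spans for each address component.
TRAIN_DATA = [
    (
        "1600 pennsylvania avenue washington dc 20500",
        {"entities": [
            (0, 4, "PRIMARY_NUMBER"),
            (5, 24, "STREET_NAME"),
            (25, 35, "CITY"),
            (36, 38, "STATE"),
            (39, 44, "ZIP_CODE"),
        ]},
    ),
]

nlp = spacy.blank("en")              # start from a blank English pipeline
ner = nlp.add_pipe("ner")            # add an entity recognizer
for _, annotations in TRAIN_DATA:
    for _, _, label in annotations["entities"]:
        ner.add_label(label)

optimizer = nlp.initialize()
for epoch in range(20):
    losses = {}
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer, losses=losses)

doc = nlp("1600 pennsylvania avenue washington dc 20500")
print([(ent.text, ent.label_) for ent in doc.ents])
```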
Measuring the performance of an address parser (or any named entity recognition [NER] model) calls for different metrics than traditional regression or classification problems. We chose the Jaccard coefficient because it is well suited to evaluating NER models. We measured the parser’s performance on each address label, then aggregated these scores into a weighted average to compare the overall performance of two versions of the model.
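The exact computation isn’t spelled out here, so the following is one reasonable sketch: per-label Jaccard over predicted versus gold entity spans, aggregated into an average weighted by how often each label appears in the gold data.

```python
from collections import defaultdict

def jaccard(predicted: set, gold: set) -> float:
    """Jaccard coefficient: |intersection| / |union| of two span sets."""
    if not predicted and not gold:
        return 1.0                      # both empty counts as a perfect match
    return len(predicted & gold) / len(predicted | gold)

def weighted_jaccard(examples):
    """Average per-label Jaccard, weighted by label frequency in the gold data.

    `examples` is a list of (predicted, gold) pairs, where each side maps an
    address label (e.g., "ZIP_CODE") to its set of (start, end) spans.
    """
    per_label, weights = defaultdict(list), defaultdict(int)
    for predicted, gold in examples:
        for label in predicted.keys() | gold.keys():
            per_label[label].append(
                jaccard(predicted.get(label, set()), gold.get(label, set()))
            )
            weights[label] += len(gold.get(label, set()))
    scores = {label: sum(vals) / len(vals) for label, vals in per_label.items()}
    total = sum(weights.values())
    return sum(scores[label] * weights[label] for label in scores) / total

# Example: the parser found the zip code but misread the street name's span.
predicted = {"STREET_NAME": {(5, 24)}, "ZIP_CODE": {(39, 44)}}
gold = {"STREET_NAME": {(5, 17)}, "ZIP_CODE": {(39, 44)}}
print(weighted_jaccard([(predicted, gold)]))  # 0.5: street 0.0, zip 1.0
```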
With a well-defined problem and plenty of data ready for labelling, a machine learning solution can be delivered in a matter of weeks. The quick feedback loop that active learning provides will help you reach the desired performance much faster than labelling random examples. And if possible, reduce the complexity for the model by standardizing the input data.