How we handle inbound emails with Amazon Web Services


An email invasion

At PeopleDoc, we are experiencing tremendous growth in our application's usage. More and more companies are going paperless, and they want to integrate their documents as easily as possible into our application.

That's why we provide them with a feature that lets them add documents to PeopleDoc by email.

But as we grow, we receive more and more emails to analyze, parse and store. Our previous workflow served us well until we recently reached its limits. We needed a more reliable, fault-resilient and scalable solution.

This article describes how we did it from a functional point of view. Another article with the technical details may follow later.

Our current architecture

At PeopleDoc, we try to divide our products into microservices. We have a microservice which handles the storage of documents, another for converting docx files to PDF, one for sending emails and so on.

We also have a microservice which is responsible for handling incoming emails, which is the feature we will be discussing in this article. The advantage of having a microservice dedicated to this feature is that we can change it without impacting all the other applications (obviously).

The user story is as follows:

As a user of PeopleDoc,

I want to easily send documents to the PeopleDoc platform

So that using PeopleDoc day to day is a delight ;D

And here is an acceptance test that goes with it:

Given an email address associated with a PeopleDoc user's account

When I send an email with an attachment to this address

Then I can see the email, with its attachment, in the user's inbox.

One of the use cases is being able to scan documents and send them directly to PeopleDoc (many enterprise scanners can send scanned documents by email). Once they are sent and received by our service, they appear in the PeopleDoc interface as a new inbox item.

Requirements

We had a few requirements before starting this project:

  • We don't want to run our own SMTP server. It's painful and quite complicated to set up and maintain.
  • The application must be fault resilient. As a SaaS company, we may face downtime on some of our applications. Nevertheless, we must be resilient and impact our clients as little as possible.
  • The application must be scalable. As our number of clients grows, we need to build an application which will handle more and more incoming emails every day.
  • Data must be handled in Europe. Our product is used all around the world. However, European laws differ from those of other continents. Our clients want all their data to be handled and stored in the EU, and that's what is written in our contracts.

The Workflow

We had identified that Amazon could be of great help with one of its services: Amazon Simple Email Service (SES). It handles the whole SMTP server for us and allows us to write custom rules for incoming emails. All we have to do is configure our DNS to redirect all incoming emails arriving on our domain name to the Amazon SMTP server located in Dublin.
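
To give a concrete idea, here is a minimal sketch of such a receipt rule using boto3, the Python AWS SDK. The rule set name, recipient domain, bucket and KMS key are hypothetical placeholders; on the DNS side, the setup amounts to an MX record pointing at the SES inbound endpoint for the Dublin region.

    import boto3

    ses = boto3.client("ses", region_name="eu-west-1")

    # The domain's MX record points to inbound-smtp.eu-west-1.amazonaws.com,
    # so SES receives the mail; this rule tells it what to do with it.
    ses.create_receipt_rule(
        RuleSetName="inbound-rules",  # hypothetical rule set
        Rule={
            "Name": "store-incoming-emails",
            "Enabled": True,
            "Recipients": ["inbound.example.com"],
            "Actions": [
                {
                    "S3Action": {
                        "BucketName": "incoming-emails-bucket",
                        "ObjectKeyPrefix": "emails/",
                        # With a KMS key, SES encrypts the email client-side
                        # before writing it to S3.
                        "KmsKeyArn": "arn:aws:kms:eu-west-1:123456789012:key/example",
                    }
                }
            ],
            "ScanEnabled": True,
        },
    )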

Once our emails are received, we encrypt them and store them in Amazon Simple Storage Service (S3). Here again, Amazon allows us to write custom rules for when an object is stored in S3. So what we did was send an event to a queue, containing the ID of the email.
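
Wiring the bucket to the queue is a one-time configuration. Here is a sketch of it with boto3, again with hypothetical names; the event S3 emits carries the object key, and SES names each stored object after the email's message ID.

    import boto3

    s3 = boto3.client("s3", region_name="eu-west-1")

    # Notify the SQS queue each time SES stores a new email object.
    s3.put_bucket_notification_configuration(
        Bucket="incoming-emails-bucket",
        NotificationConfiguration={
            "QueueConfigurations": [
                {
                    "QueueArn": "arn:aws:sqs:eu-west-1:123456789012:incoming-emails",
                    "Events": ["s3:ObjectCreated:*"],
                    "Filter": {
                        "Key": {"FilterRules": [{"Name": "prefix", "Value": "emails/"}]}
                    },
                }
            ]
        },
    )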

This queue is handled by another Amazon service: Simple Queue Service (SQS). An application can poll this queue and pick up each new message posted to it. So we plugged our application into this queue, and we are now notified each time a new incoming email is received!
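
A minimal consumer loop might look like the following sketch, assuming long polling and a hypothetical handle_email function (sketched just below):

    import json

    import boto3

    sqs = boto3.client("sqs", region_name="eu-west-1")
    QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/incoming-emails"

    def poll_forever(handle_email):
        while True:
            # Long polling: wait up to 20 seconds for a new event.
            response = sqs.receive_message(
                QueueUrl=QUEUE_URL,
                MaxNumberOfMessages=1,
                WaitTimeSeconds=20,
            )
            for message in response.get("Messages", []):
                body = json.loads(message["Body"])
                # Each S3 record carries the key of the stored email object.
                for record in body.get("Records", []):
                    handle_email(record["s3"]["object"]["key"])
                # Acknowledge the event so it is removed from the queue.
                sqs.delete_message(
                    QueueUrl=QUEUE_URL,
                    ReceiptHandle=message["ReceiptHandle"],
                )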

When our application is notified, it gets the ID of the email from the SQS message, downloads the email from S3, decrypts it, retrieves its attachments, uploads them to our secure storage (which is called SAE, for Système d'Archivage Électronique) and creates a new message in the user's inbox in PeopleDoc.
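
Here is a sketch of that handler, built on Python's standard email module. decrypt_email_object (see the Encryption section below) and store_attachment are hypothetical stand-ins for our decryption and SAE upload steps:

    import email
    import email.policy

    import boto3

    s3 = boto3.client("s3", region_name="eu-west-1")

    def handle_email(object_key):
        # Download the raw email stored by SES; the body is still encrypted.
        obj = s3.get_object(Bucket="incoming-emails-bucket", Key=object_key)
        raw = decrypt_email_object(obj)  # hypothetical helper, sketched later

        message = email.message_from_bytes(raw, policy=email.policy.default)
        sender = message["From"]

        for part in message.walk():
            filename = part.get_filename()
            if filename:  # MIME parts with a filename are attachments
                store_attachment(sender, filename, part.get_payload(decode=True))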

Here is what the whole workflow looks like on a diagram: incoming email → SES → encrypted object on S3 → event on SQS → our application → SAE and the user's inbox.

Completely scalable

Building this application gives us quite a few advantages.

First of all, we now have a stateless application, and that means a lot! Retrieving and parsing emails doesn't require storing any data. So we can have multiple instances of our application doing the work at the same time, distributing the load between them. Whether these instances are on the same server or not doesn't change a thing!

When an instance reads an event from SQS, this event becomes invisible to the other instances. The instance processes the event, gets the email, parses it and, eventually, if all went well, acknowledges the event so that it is removed from the queue.

Fault resilient

However, if the event has not been acknowledged within a certain amount of time after being retrieved by an instance, it automatically reappears in the queue. This is quite useful when an instance starts processing an event and then crashes in the middle of the process (network failure, server issue…). In this case, the event automatically reappears and another instance processes it.
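
In SQS terms, this time window is the visibility timeout, and the acknowledgment is the delete_message call shown earlier. A sketch of tuning it, with an illustrative value:

    import boto3

    sqs = boto3.client("sqs", region_name="eu-west-1")
    QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/incoming-emails"

    # If a consumer crashes before deleting the message, it becomes visible
    # again once this timeout expires and another instance can pick it up.
    sqs.set_queue_attributes(
        QueueUrl=QUEUE_URL,
        Attributes={"VisibilityTimeout": "300"},  # seconds, illustrative value
    )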

Downtimes

Another good point is that if our application is down for some reason, nothing is lost. We can afford downtime, and that's quite a relief!

When the application is down, all emails are still stored on S3 and events pile up in SQS. Once the application is back online, it simply retrieves the events one by one and resumes processing the emails.

SQS events can stay in the queue for a maximum of 14 days. Hopefully, our application will never be down for that long…
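
The retention period is itself a queue attribute; here is a sketch of setting it to that 14-day maximum:

    import boto3

    sqs = boto3.client("sqs", region_name="eu-west-1")
    QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/incoming-emails"

    # 14 days, expressed in seconds, is the maximum retention SQS allows.
    sqs.set_queue_attributes(
        QueueUrl=QUEUE_URL,
        Attributes={"MessageRetentionPeriod": str(14 * 24 * 60 * 60)},
    )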

Encryption

When emails are received by SES, they are encrypted by SES before being stored on S3. When we retrieve them with our application, they are still encrypted. We use client-side encryption, so the emails are never unencrypted while they travel over the network.
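
Under the hood, SES uses KMS envelope encryption in the S3 encryption client format: the encrypted data key and the IV are attached to the object as metadata. The following is a rough sketch of the decryption, assuming the AES-GCM ("v2") metadata format and the cryptography package; treat the exact metadata handling as an assumption to verify against your own objects:

    import base64
    import json

    import boto3
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    kms = boto3.client("kms", region_name="eu-west-1")

    def decrypt_email_object(obj):
        meta = obj["Metadata"]
        # Ask KMS to unwrap the envelope data key stored in the metadata.
        data_key = kms.decrypt(
            CiphertextBlob=base64.b64decode(meta["x-amz-key-v2"]),
            EncryptionContext=json.loads(meta["x-amz-matdesc"]),
        )["Plaintext"]
        iv = base64.b64decode(meta["x-amz-iv"])
        # AES-GCM ciphertext, with the authentication tag appended at the end.
        ciphertext = obj["Body"].read()
        return AESGCM(data_key).decrypt(iv, ciphertext, None)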

What's next?

We encountered a few pain points while developing this solution (like setting up the encryption policy or finding information about what emails look like once they are stored) but, despite that, setting up the whole process was quite straightforward. Amazon provides open source SDKs that you can find on GitHub, and creating new services in their administration console is intuitive.

However, everything is not done yet. We still need, for instance, a way to monitor the number of emails going through this process, and more precisely the number of emails that end up in error. For those in error, we need to be able to replay them, in case the error was due to an external factor at the time the email was first processed.
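
One common way to get there, sketched here as a possibility rather than our final design, is an SQS dead-letter queue: events that keep failing are moved aside, where they can be counted, alerted on, and replayed once the cause is fixed.

    import json

    import boto3

    sqs = boto3.client("sqs", region_name="eu-west-1")
    QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/incoming-emails"

    # After 5 failed receives, SQS moves the event to a dead-letter queue
    # (hypothetical ARN), where it waits to be inspected and replayed.
    sqs.set_queue_attributes(
        QueueUrl=QUEUE_URL,
        Attributes={
            "RedrivePolicy": json.dumps({
                "deadLetterTargetArn": "arn:aws:sqs:eu-west-1:123456789012:incoming-emails-errors",
                "maxReceiveCount": "5",
            })
        },
    )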

So even if the main functionality is done, there is still room for improvement!
