Our goal was to build a web application to manage images and videos, with custom tagging and the ability to run facial recognition queries in the cloud.
The architecture is similar to the process described in this blog; the post below covers Initech Global's approach.
We were given a short time window to create the facial recognition application since our client identified an immediate need. Our requirements also involved building the application from the ground up. We concluded the best way to meet the timeline and provide the features requested would be a mixed architecture approach.
Why build new?
Our decision to build new allowed us to meet the project's unique requirements. Several image storage service providers can be found online (some even built on Lychee), but none of them fully covered the specific needs of our client outlined below:
- Custom tagging and searching
- Facial Recognition
- Expose APIs to serve pictures on internal or user facing applications
- Seamlessly load pictures from internal communication apps
- Video storage and facial recognition on videos
The simplest route was to provision S3 buckets for this task, and three buckets were created:
- 2 buckets: user facing, one for approved images and one for unapproved images
- 1 bucket: used to store the cropped faces sent for facial matching
Amazon Rekognition is one of our favorites among the 160+ services AWS offers. Below we've highlighted a few APIs that were fundamental to our success.
- Collection: we created one collection per environment; it stores the face feature vectors, not the actual pictures.
- Detect Faces: Gives us coordinates for every face in an image.
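Detect Faces returns each bounding box as a ratio of the overall image dimensions, so cropping a face out requires converting those ratios to pixel coordinates. A minimal sketch of that conversion (our own illustrative helper, not code from the project):

```typescript
// Shape of the BoundingBox returned by Rekognition's DetectFaces:
// all values are ratios (0..1) of the image's width/height.
interface BoundingBox {
  Left: number;
  Top: number;
  Width: number;
  Height: number;
}

interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Convert a relative bounding box to pixel coordinates for cropping.
function boundingBoxToPixels(
  box: BoundingBox,
  imageWidth: number,
  imageHeight: number
): PixelRect {
  return {
    left: Math.round(box.Left * imageWidth),
    top: Math.round(box.Top * imageHeight),
    width: Math.round(box.Width * imageWidth),
    height: Math.round(box.Height * imageHeight),
  };
}

// Example: a face occupying the middle of a 1000x500 image.
const rect = boundingBoxToPixels(
  { Left: 0.25, Top: 0.1, Width: 0.5, Height: 0.2 },
  1000,
  500
);
// rect = { left: 250, top: 50, width: 500, height: 100 }
```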
A Java Spring Boot backend was coupled with a MySQL database, using Spring Data JPA for the DAO layer. These tools let us spin up any CRUD REST service we needed in no time. Our rule of thumb was to never load a single image into the backend application's memory; all images are routed directly from the client to our image repository. Had we gone entirely serverless we would have opted for DynamoDB, but MySQL is an easy choice as a database: it gives us aggregations, sorts, and quick stats with very little effort.
Security is critical in every web application, even more so when your entire content lives in the cloud and is accessed from the web. I am not just talking about application security, but about how to securely share content between the browser and AWS without our web server sitting in between as a bottleneck. Here is how the interaction happens:
- Once the user is successfully authenticated (against Okta in this case), we hit our backend and, based on the user's role and the buckets the user can access in read or update mode, we create a custom IAM policy and generate federation token credentials with a preset expiry time, then share them with the front end.
- On the client side, we refrain from storing the token even in cookies or local storage; we keep it in memory and use it from there to interact with AWS resources.
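The post doesn't show the exact policy we generate, so here is a sketch of what building a per-user scoped policy for STS GetFederationToken could look like. The bucket names and action sets are illustrative assumptions, and the real backend does this in Java Spring rather than TypeScript:

```typescript
// Sketch of building a per-user scoped policy for STS GetFederationToken.
// Bucket names and the exact action sets are illustrative assumptions.
interface Statement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

function buildScopedPolicy(readBuckets: string[], writeBuckets: string[]) {
  const statements: Statement[] = [];
  if (readBuckets.length > 0) {
    statements.push({
      Effect: "Allow",
      Action: ["s3:GetObject", "s3:ListBucket"],
      Resource: readBuckets.flatMap((b) => [
        `arn:aws:s3:::${b}`,
        `arn:aws:s3:::${b}/*`,
      ]),
    });
  }
  if (writeBuckets.length > 0) {
    statements.push({
      Effect: "Allow",
      Action: ["s3:PutObject"],
      Resource: writeBuckets.map((b) => `arn:aws:s3:::${b}/*`),
    });
  }
  return { Version: "2012-10-17", Statement: statements };
}

// The JSON-serialized policy is then passed to GetFederationToken along
// with a preset expiry (DurationSeconds), and the temporary credentials
// are returned to the front end.
const policy = buildScopedPolicy(["approved-images"], ["unapproved-images"]);
```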
Browser to S3 Interaction
- With a token carrying all the necessary access, all we do now is create presigned URLs for downloading and uploading images.
- There are two different views for users to look at content: one for administrators, which calls the S3 List API directly to fetch the image list and manage all images, and one for regular users, which fetches approved image locations from our image indexes in MySQL.
- Uploading happens straight to S3, followed by a call to our REST API for indexing and facial recognition, which I will explain in a bit more detail.
- We decided to store two variations of each image: the actual image and a thumbnail with the same aspect ratio and a fixed height of 200px. Both are stored in the same bucket under a specific naming convention so they can be easily differentiated.
- Thumbnails are what we pull when the user is viewing images in the UI; the actual image is pulled only for the fullscreen view or when downloading.
- I wish we had stored a desktop-size image as well for use in slide shows, because some high-resolution images (> 15 MB) take a few seconds to download depending on bandwidth. That is something I would recommend if you design anything like this.
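The exact naming convention isn't spelled out above, so the following is a hypothetical scheme: the thumbnail sits next to the original with a `_thumb` suffix, and its dimensions keep the original aspect ratio at the fixed 200px height:

```typescript
const THUMB_HEIGHT = 200;

// Hypothetical naming convention: "photos/img123.jpg" -> "photos/img123_thumb.jpg".
function thumbKeyFor(originalKey: string): string {
  const dot = originalKey.lastIndexOf(".");
  return dot === -1
    ? `${originalKey}_thumb`
    : `${originalKey.slice(0, dot)}_thumb${originalKey.slice(dot)}`;
}

// Fixed 200px height, width scaled to keep the original aspect ratio.
function thumbDimensions(width: number, height: number) {
  return {
    width: Math.round((width * THUMB_HEIGHT) / height),
    height: THUMB_HEIGHT,
  };
}

const key = thumbKeyFor("photos/img123.jpg"); // "photos/img123_thumb.jpg"
const dims = thumbDimensions(3000, 2000);     // { width: 300, height: 200 }
```

Because both objects share a prefix, a single List call can pair originals with their thumbnails without a database lookup.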
As I mentioned, images are never loaded on the server, which is set up as a pretty small machine (1 CPU / 512 MB memory). There are two occasions where we have to read the image on the backend, and both are offloaded to Lambdas. We didn't trigger these Lambdas from S3; instead, our backend app calls them asynchronously:
- One for creating the thumbnail.
- Another for cropping out faces and storing them in the face bucket used for adding to the collection.
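A rough sketch of that asynchronous fan-out. The invoker is injected so the flow can run standalone, and the Lambda function names are hypothetical; the real call would go through the AWS SDK with InvocationType "Event" (fire and forget):

```typescript
// Sketch of the backend's async fan-out to the two image-processing Lambdas.
// The invoker is injected; the real one would wrap the AWS SDK's Invoke call
// with InvocationType "Event". Function names below are hypothetical.
type Invoke = (fn: string, payload: object) => Promise<void>;

async function offloadImageProcessing(
  bucket: string,
  key: string,
  invoke: Invoke
): Promise<void> {
  // Both Lambdas run in parallel; the backend never loads the image itself.
  await Promise.all([
    invoke("create-thumbnail", { bucket, key }),
    invoke("crop-faces-to-face-bucket", { bucket, key }),
  ]);
}

// Example with a recording stub. Both invocations are dispatched
// synchronously before the returned promise resolves.
const calls: string[] = [];
const stub: Invoke = async (fn) => {
  calls.push(fn);
};
offloadImageProcessing("approved-images", "photos/img123.jpg", stub);
```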
- When the user's upload to S3 succeeds, we call our REST API to index it.
- This process takes care of creating thumbnails, detecting faces and capturing them into the face bucket, running facial recognition against the existing collection, tagging the faces found, indexing everything in MySQL, and so on.
- This sounds like a lot for the server, but with the Lambdas doing their job alongside it, everything happens in a fraction of a second.
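The indexing steps above can be sketched as a single orchestration. The step implementations here are stubs standing in for the Lambdas, Rekognition, and MySQL; the interface names are ours, not the project's:

```typescript
// Sketch of the indexing endpoint's flow after a successful S3 upload.
// Each step is injected so the orchestration can be shown standalone.
interface IndexSteps {
  createThumbnail(key: string): Promise<void>;
  cropFaces(key: string): Promise<string[]>;          // returns face-bucket keys
  matchFace(faceKey: string): Promise<string | null>; // tag, or null if unknown
  saveIndex(key: string, tags: string[]): Promise<void>;
}

async function indexImage(key: string, steps: IndexSteps): Promise<string[]> {
  // Thumbnail and face-crop run in Lambdas in the real system.
  await steps.createThumbnail(key);
  const faceKeys = await steps.cropFaces(key);

  // Match each cropped face against the Rekognition collection.
  const tags: string[] = [];
  for (const f of faceKeys) {
    const tag = await steps.matchFace(f);
    if (tag !== null) tags.push(tag);
  }

  // Persist locations and tags in the MySQL image index.
  await steps.saveIndex(key, tags);
  return tags;
}
```

Unrecognized faces simply produce no tag here; in the application they stay untagged until a user names them, which adds them to the collection.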
Here is a simpler view of the upload process:
- Users can tag undetected faces by clicking on a face (shown in the image above); these tagged faces are then indexed, added to the AWS collection, and used for future uploads.
- We really have to appreciate the AWS Rekognition service here; it does an incredible job of detecting even low-resolution faces with just one or two matching pictures in its collection. I don't remember a single instance where it failed to detect an already-indexed face.
This was an easy choice: we have been huge fans of Angular since its very first release back in 2011. I will mention a couple of interesting things here:
- As a first step after a successful login or page refresh, before any component or service is instantiated, we need to fetch AWS federation credentials and initialize the AWS SDK. We added an 'APP_INITIALIZER' provider loading a factory that returns a promise which fetches the credentials and initializes the AWS JS SDK.
- On one page we implemented infinite scroll for the thumbnail view against S3 loads, using ngx-virtual-scroller. We faced some challenges when thumbnail sizes differ: there is no way to know how many images to preload per page without doing some crazy math, summing thumbnail widths against the window width to intelligently fill the screen with the right number of images. It literally bugged us for a couple of days. We eventually took the easy route of clipping the image when the aspect ratio doesn't fit the desired width (the CSS object-fit property).
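The APP_INITIALIZER setup in the first bullet can be sketched as follows. The factory is plain TypeScript with the backend call and SDK wiring injected as stand-ins, and the Angular registration appears only as a comment:

```typescript
// Sketch of the APP_INITIALIZER pattern: fetch federation credentials and
// wire up the AWS SDK before any component loads. fetchCredentials and
// applyToSdk are stand-ins for the real backend call and SDK configuration.
interface AwsCredentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken: string;
}

function awsInitFactory(
  fetchCredentials: () => Promise<AwsCredentials>,
  applyToSdk: (c: AwsCredentials) => void
): () => Promise<void> {
  // Angular calls the returned function and blocks bootstrap on the promise.
  return async () => {
    const creds = await fetchCredentials();
    applyToSdk(creds); // e.g. set AWS.config.credentials in the JS SDK
  };
}

// In the Angular module this would be registered roughly as:
//   { provide: APP_INITIALIZER, useFactory: awsInitFactory, deps: [...], multi: true }
```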
- Back end is dockerized and deployed to AWS ECS
- Static Front end is served from S3 with Cloudfront
- Cloudformation to create and deploy entire infrastructure
- Gitlab Pipelines triggering build and deploy on commits
These are a few of the things I could cover here. Did you feel anything could have been done better? Have you built a similar application?