A web application for managing images and videos in the cloud, with custom tagging, search, and facial recognition
Our architecture is somewhat similar to what’s laid out in this blog. The details below cover how we, at Initech Global, did it.
We built the entire application from scratch on a very aggressive timeline. We didn’t go entirely serverless; we went with a mixed architecture to achieve that.
Why build new?
There are many image storage services online, a few of them open source, like Lychee, but none of them really suited the specific needs of our client. Here are a few of those needs:
- Custom Tagging and Searching.
- Facial Recognition.
- Expose APIs to serve pictures to other internal or user-facing applications, all from this one spot.
- Seamlessly load pictures from internal communication apps.
- Video storage and facial recognition on videos.
For storage, the simple and straightforward option is S3. We ended up with three buckets: two user-facing, for approved and unapproved images, and one for storing cropped faces to send for facial matching.
For facial recognition we used AWS Rekognition, one of my favorites among the 160-plus services AWS offers. Here are a couple of the APIs we used that are worth noting.
- Collection: We created one collection per environment. A collection stores the face vectors extracted from a picture, not the actual picture.
- Detect Faces: Gives us coordinates for every face in an image.
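Rekognition's DetectFaces response describes each face with a `BoundingBox` whose `Left`, `Top`, `Width`, and `Height` are ratios of the overall image size, and they can fall slightly outside the 0 to 1 range when a face is cut off by the image edge. A minimal sketch of turning such a box into pixel coordinates for cropping follows. It is written in Python purely for illustration, the function name `face_crop_box` is hypothetical, and it is not the code we actually ran.

```python
def face_crop_box(bounding_box, image_width, image_height):
    """Convert a ratio-based bounding box into a clamped pixel box.

    Returns (left, top, right, bottom) in pixels, ready for an image
    library's crop call.
    """
    left = bounding_box["Left"] * image_width
    top = bounding_box["Top"] * image_height
    right = left + bounding_box["Width"] * image_width
    bottom = top + bounding_box["Height"] * image_height

    def clamp(value, upper):
        # Keep the coordinate inside the image, since ratios may be
        # negative or exceed 1 for partially visible faces.
        return max(0, min(upper, round(value)))

    return (
        clamp(left, image_width),
        clamp(top, image_height),
        clamp(right, image_width),
        clamp(bottom, image_height),
    )


box = {"Left": 0.25, "Top": 0.1, "Width": 0.5, "Height": 0.4}
print(face_crop_box(box, 1000, 800))  # (250, 80, 750, 400)
```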
We built the backend with Java and Spring Boot, with a MySQL database and Spring JPA for the DAO layer, which let us create any CRUD REST services we needed in just a few minutes. We deal entirely with images and videos, so we followed one rule of thumb: never load a single image into the memory of our backend application. Everything should be routed directly from the client to the image repository. Had we chosen to go entirely serverless, we would have opted for DynamoDB, but MySQL was an easy choice: it lets us do aggregations, sorts, and pull some stats with very little effort.
Security is the most important concern in any web application, even more so when your entire content is moved to the cloud and accessed from the web. I am not just talking about application security, but about how we securely share content between the browser and AWS without our web server sitting in between and becoming a bottleneck. This is how the interaction happens:
- Once the user is successfully authenticated (against Okta in this case), we hit our backend. Based on the user’s role and the buckets the user can access in read or update mode, we create a custom IAM policy, generate federation token credentials with a preset expiry time, and share them with the front end.
- On the client side, we refrain from storing the credentials even in cookies or local storage; we just use them from memory to interact with AWS resources.
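The per-user policy described above can be sketched roughly as follows. This is an illustration in Python, not our Java backend code: the function name, the bucket names, the choice of S3 actions, and the rule that write access implies read access are all assumptions made for the example.

```python
import json


def build_session_policy(read_buckets, write_buckets):
    """Build an IAM policy document (as a JSON string) scoped to the
    buckets a user may read or update. Assumes write implies read."""
    readable = sorted(set(read_buckets) | set(write_buckets))
    writable = sorted(set(write_buckets))
    if not readable:
        raise ValueError("user has no bucket access; nothing to grant")

    statements = [
        {   # Listing is granted on the bucket itself.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{b}" for b in readable],
        },
        {   # Reading is granted on the objects inside the bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{b}/*" for b in readable],
        },
    ]
    if writable:
        statements.append({
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": [f"arn:aws:s3:::{b}/*" for b in writable],
        })
    return json.dumps({"Version": "2012-10-17", "Statement": statements})


# Hypothetical bucket names, for illustration only.
print(build_session_policy(["approved-images"], ["unapproved-images"]))
```

A string like this is what would be passed as the `Policy` parameter of the STS GetFederationToken call, together with `DurationSeconds` for the expiry. The effective permissions are the intersection of this policy and those of the IAM user making the call, so the policy can only narrow access, never widen it.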
Browser to S3 Interaction
- With a token that carries all the necessary access, all we do now is create presigned URLs for downloading and uploading images.
- There are two different views of the content. One is for administrators, which calls the S3 List API directly to fetch the image list so they can manage all images. The other is for users, which fetches approved image locations from our image indexes in MySQL.
- Uploads go straight to S3, and the client then calls our REST API for indexing and facial recognition, which I will explain in a bit more detail.
- We decided to store two variations of each image: the actual image, and a thumbnail with the same aspect ratio and a fixed height of 200px. Both are stored in the same bucket, with a naming convention that makes them easy to tell apart.
- Thumbnails are what we pull when the user is browsing images in the UI; we pull the actual image only in fullscreen view or when downloading.
- I wish we had stored a desktop-size image as well, for use in slideshows, because some of the high-resolution images (> 15 MB) take a few seconds to download, depending on bandwidth. It is something I would recommend if you design anything like this.
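The thumbnail rule above (same aspect ratio, fixed 200px height, a recognizable name in the same bucket) comes down to a little arithmetic. Here is a minimal sketch in Python. The `thumb_` prefix is a made-up naming convention for the example, not the one we actually used, and both function names are hypothetical.

```python
import posixpath

THUMB_HEIGHT = 200  # fixed thumbnail height in pixels, as described above


def thumbnail_size(width, height, target_height=THUMB_HEIGHT):
    """Scale (width, height) to the target height, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scaled_width = max(1, round(width * target_height / height))
    return scaled_width, target_height


def thumbnail_key(original_key, prefix="thumb_"):
    """Derive the thumbnail's object key from the original's key.

    Assumed convention: same folder, file name prefixed with 'thumb_'.
    """
    folder, name = posixpath.split(original_key)
    return posixpath.join(folder, prefix + name)


print(thumbnail_size(4000, 3000))              # (267, 200)
print(thumbnail_key("events/2019/party.jpg"))  # events/2019/thumb_party.jpg
```

A desktop-size variant would fit the same pattern with a larger target height and a different prefix.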
As I mentioned, images are not loaded on the server, which is set up as a pretty small machine (1 CPU / 512 MB memory). There are two occasions where the image has to be read on the backend side, and both are offloaded to Lambdas. We didn’t trigger these Lambdas from S3; instead, we call them asynchronously from our backend app.
- One for creating the thumbnail.
- A second for cropping the faces out and storing them in a face bucket, used for adding to the collection.
- When the user’s upload to S3 succeeds, we call the REST API to index it.
- This process takes care of creating thumbnails, detecting faces and capturing them into the face bucket, running facial recognition against the existing collection, tagging the faces found, indexing everything in MySQL, and so on.
- This sounds like a lot for the server, but with the Lambdas doing their job alongside, everything happens in a fraction of a second.
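One plausible shape for that indexing step is to fan the two image jobs out in parallel and join their results before matching and indexing. The sketch below uses Python threads and in-memory stand-ins for the two Lambdas, the collection search, and the index record. Every function and key name here is hypothetical, and the real backend is Java, so treat it as an outline of the flow rather than our implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def create_thumbnail(image_key):
    """Stand-in for the thumbnail Lambda; returns the thumbnail's key."""
    return "thumb_" + image_key


def crop_faces(image_key):
    """Stand-in for the face-cropping Lambda; returns cropped face keys."""
    return [f"faces/{image_key}/0.jpg", f"faces/{image_key}/1.jpg"]


def match_face(face_key, known_faces):
    """Stand-in for a search against the face collection; None if unknown."""
    return known_faces.get(face_key)


def index_upload(image_key, known_faces):
    """Run both image jobs concurrently, then tag and build the index record."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        thumb_future = pool.submit(create_thumbnail, image_key)
        faces_future = pool.submit(crop_faces, image_key)
        thumb_key = thumb_future.result()
        face_keys = faces_future.result()

    tags, untagged = [], []
    for face_key in face_keys:
        name = match_face(face_key, known_faces)
        if name is None:
            untagged.append(face_key)  # left for a user to tag by hand
        else:
            tags.append(name)

    # In the real application this record would be written to MySQL.
    return {
        "image": image_key,
        "thumbnail": thumb_key,
        "tags": sorted(tags),
        "untagged_faces": untagged,
    }


print(index_upload("a.jpg", {"faces/a.jpg/0.jpg": "alice"}))
```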
Here is a simpler view of the upload process:
- Users can tag undetected faces by clicking on a face (shown in the image above). These tagged faces are then indexed and added to the AWS collection, and are used for future uploads.
- I really have to appreciate the AWS Rekognition service here. It does an incredible job of detecting even low-resolution faces, with just one or two matching pictures in its collection. I don’t remember even one instance where it couldn’t detect an already existing face.
The front end was an easy choice. We have been huge fans of Angular since its very first release back in 2011. I will mention a couple of interesting things here:
- The first step after a successful login or page refresh, before any component or service is instantiated, was to get the AWS federation credentials and initialize the AWS SDK. We had to add ‘APP_INITIALIZER’ to the providers and supply a factory returning a promise that fetches the credentials and initializes the AWS JS SDK.
- On one of the pages we implemented infinite scroll for the thumbnail view, loading from S3, using ngx-virtual-scroller. We faced some challenges when thumbnail sizes differ: we don’t know how many images to preload for a single page without doing some crazy math, summing up thumbnail widths against the window width and intelligently filling the page with the right number of images to fit the screen. This literally bugged us for a couple of days. We later took the easy route of clipping the image if its aspect ratio doesn’t fit the desired width (using the CSS object-fit property).
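Once every thumbnail is clipped into a fixed-size cell, the "how many images fill the page" question stops being crazy math. A minimal sketch of that arithmetic follows, in Python for illustration. The cell size, gap, and buffer row are assumed values, the function name is hypothetical, and this is not ngx-virtual-scroller's API.

```python
import math


def images_per_page(container_width, viewport_height,
                    cell_width, cell_height, gap=0, buffer_rows=1):
    """Return (columns, rows, total) for a grid of fixed-size cells.

    n cells per row need n * cell_width + (n - 1) * gap pixels, which
    rearranges to n <= (container_width + gap) / (cell_width + gap).
    """
    columns = max(1, (container_width + gap) // (cell_width + gap))
    # Round rows up so the viewport is fully covered, plus a buffer
    # so the next row is already loaded when the user scrolls.
    rows = math.ceil((viewport_height + gap) / (cell_height + gap)) + buffer_rows
    return columns, rows, columns * rows


print(images_per_page(1280, 720, 200, 200, gap=8))  # (6, 5, 30)
```

With variable-width thumbnails the column count changes from row to row, which is what made the original approach painful.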
On the deployment side:
- The back end is dockerized and deployed to AWS ECS.
- The static front end is served from S3 with CloudFront.
- CloudFormation creates and deploys the entire infrastructure.
- GitLab pipelines trigger build and deploy on commits.
These are the few things I could write about here; sorry I couldn’t share any of our actual code. Do you feel anything could have been done better? Have you built any such applications?