Latest posts

A few caveats with online inference

Introduction

Online inference is a common way to serve machine learning models in production. However, there are several considerations to keep in mind when deploying models this way, especially when using FastAPI/gunicorn along with libraries such as NumPy and PyTorch. This article highlights a few of these caveats.

Model Loading

When …

mapPartitions vs mapInPandas

Prior to Spark 3.0, to optimize for performance and utilize vectorized operations, you'd generally have to repartition the dataset and invoke mapPartitions.

The major drawback was the performance cost of repartitioning the DataFrame, since repartitioning triggers a shuffle.

With Spark 3.0+, if your underlying function is …

Multiple Condition Queues For Better Concurrency

I have been revisiting concurrent libraries that I worked on earlier, and I want to highlight the importance of using separate wait sets and condition queues in your library implementations. The performance of these implementations has been benchmarked using JMH.
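As a minimal sketch of what separate condition queues can look like, here is a bounded buffer built on `ReentrantLock` with one `Condition` per predicate. The class name `BoundedBuffer` and its methods are hypothetical and chosen only for illustration; this is not the library discussed in the post, and it makes no claim about the benchmark results.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration class, not taken from the post's library.
class BoundedBuffer<T> {
    private final ReentrantLock lock = new ReentrantLock();
    // One condition queue per predicate: producers wait on notFull,
    // consumers wait on notEmpty, so a signal wakes only a relevant waiter.
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();
    private final Queue<T> items = new ArrayDeque<>();
    private final int capacity;

    BoundedBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            // Re-check the predicate in a loop to guard against spurious wakeups.
            while (items.size() == capacity) {
                notFull.await();
            }
            items.add(item);
            // Wake a single consumer; producers are not disturbed.
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) {
                notEmpty.await();
            }
            T item = items.remove();
            // Wake a single producer; consumers are not disturbed.
            notFull.signal();
            return item;
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return items.size();
        } finally {
            lock.unlock();
        }
    }
}
```

With a single intrinsic monitor and `wait`/`notifyAll`, producers and consumers share one wait set, so every state change wakes all waiters and most of them go straight back to waiting. Splitting the predicates across two `Condition` objects lets each `signal()` target the one kind of thread that can make progress.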

Let me list the advantages of using separate …

Template matching with OpenCV

For structured (bounding-box-based) text extraction, it is imperative that the received image and the target image are properly aligned and at the same scale. OpenCV is a great image processing library with a ton of features.

To align the source and template images, the following steps are required:

  • First convert images to …

Android Implementing Google Sign In

As you are all aware, Google Plus is shutting down in March 2019, and so are all its services. I have a legacy Android app on the Play Store that was using GoogleApiClient for authentication with Google Plus services; alas, I had to upgrade the application to use …