A few caveats with online inference
Introduction
Online inference means serving predictions from a deployed machine learning model in real time, typically one request at a time. There are several things to keep in mind when deploying models this way, especially when using FastAPI/gunicorn along with libraries such as NumPy and PyTorch. This article highlights a few of these caveats.
Model Loading
When …