How to put AI projects into production is a topic that has occupied many data scientists for a while now and plays a key role at many international conferences. Unfortunately there is no single clear answer, because the variety of topics that we nowadays call AI is huge: classic machine learning, language processing, fraud detection, recommendation systems, forecasting, speech recognition, image processing, IoT… On top of that comes the problem of handling big data sets in almost no time and of efficient stream processing. It sounds overwhelming when combined. The good news is that we can make it simpler by splitting the problem into smaller pieces, which helps us make a good decision for our business given the tools that we have.
What is important when thinking about productionizing AI?
(Based on several articles/books/videos that I looked into).
It is very tempting to have everything in just one place, on one machine or system. That is definitely easier to control, maintain and secure. On the other hand it increases the risk of failure: when our machine is down, everything is down. We may also need a hell of a machine for all the necessary computations.
A recently trending concept is to separate data from processing: store the data conveniently where it makes sense and perform computations on a suitable platform. The last thing that we want is to copy data back and forth. Data can be stored in a so-called data fabric. This means that we do not need to copy all of it into one place; it can be split across several machines, even geographically. We need to give different systems uninterrupted, easy and fast access to it. If we draw a sharp line between applications and data, we allow multiple systems to make use of the same data infrastructure. Here is where containerization comes in extremely handy. Our containers may read data from one source and store it in another, allowing different processes to use it further. We can perform modelling with the tools most suitable for a given problem, without caring where our data comes from. Of course it is good to consider combining compatible systems. Take Hadoop for example: if your data is stored in HDFS and processed by Pig, you will most likely need to copy it to some other filesystem or database in order to perform data modelling. After modelling is done, you may need to copy the results back to HDFS, which creates a lot of overhead. In such a case it may be more beneficial to use Spark with its ML libraries (pyspark, SparkR).
We should always be able to recognise the differences between test and production environments. This concerns not only software differences but also data-related ones. While doing AI we tend to work with data samples and hence do not consider the changing nature of the data. Containers may help us with the first part and proper validation with the second. We need to be able to maintain reproducible conditions, so that when moving to production we do not get that many software-related surprises. It is also good practice to test our models on production data before actually deploying them.
Another thing to consider is the flexibility to adapt to changing conditions. Having more than one model definitely helps. Our best model may easily become outdated as the data changes over time. It is good to be able to replace such a model with a new one quickly and without downtime. Again containers come in handy, as we can put each model into a separate container and easily turn them on and off.
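The replacement step can be illustrated in miniature, inside a single process rather than with containers. This is only a sketch of the idea: the `ModelSlot` class and its methods are hypothetical names invented for the example, and a model is assumed to be any callable that takes one input.

```python
import threading
from typing import Callable

Model = Callable[[float], float]


class ModelSlot:
    """Holds the currently active model and lets us replace it while serving."""

    def __init__(self, model: Model) -> None:
        self._model = model
        self._lock = threading.Lock()

    def swap(self, new_model: Model) -> None:
        # Replace the active model; callers never see an empty slot.
        with self._lock:
            self._model = new_model

    def predict(self, x: float) -> float:
        # Grab a reference under the lock, then run the model outside it
        # so a slow prediction does not block a swap.
        with self._lock:
            model = self._model
        return model(x)


slot = ModelSlot(lambda x: x + 1.0)   # stand-in for the old model
before = slot.predict(1.0)
slot.swap(lambda x: x * 10.0)         # stand-in for the retrained model
after = slot.predict(1.0)
```

With containers the same role would typically be played by whatever routes requests to the running model instances; that part is outside the scope of this sketch.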
With several models running in parallel we can define logic that, based on some conditions, chooses the result of a particular winner. This basically allows us to be more confident that we will get some modelling result in any case. Additionally, when one model breaks we still have the other ones running, so our system does not stop. Of course we may start getting less accurate results after the switch, and we need to be able to recognize that with proper model validation.
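A very simple version of such selection logic could look like the sketch below. It assumes that models are plain callables listed in order of preference and that a broken model shows up as an exception; `pick_result`, `best_model` and `backup_model` are made-up names for illustration only.

```python
from typing import Callable, Optional, Sequence, Tuple

Model = Callable[[float], float]


def pick_result(
    models: Sequence[Tuple[str, Model]], x: float
) -> Tuple[Optional[str], Optional[float]]:
    """Try the models in priority order and return the first usable result."""
    for name, model in models:
        try:
            return name, model(x)
        except Exception:
            # This model is broken right now; fall through to the next one.
            continue
    return None, None


def best_model(x: float) -> float:
    # Simulates our preferred model being down.
    raise RuntimeError("model is down")


def backup_model(x: float) -> float:
    return 2.0 * x


name, result = pick_result(
    [("best", best_model), ("backup", backup_model)], 3.0
)
```

Returning the model name together with the result matters here: it is what lets the validation step notice that answers have started coming from the backup.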
As AI processing can be time-consuming, we can also decide to provide results quickly from a less reliable model and then deliver more accurate ones once the results from a better model come through.
This is all part of the so-called rendezvous architecture, which seems to fit well into the AI world.
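The trade-off between speed and accuracy described above can be sketched as a choice made at a deadline. The example below is a deliberately simplified, single-process illustration and not a full rendezvous implementation; the `Response` fields, the `rendezvous` function and all numbers are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    model: str
    value: float
    latency_ms: float  # how long the model took to answer
    priority: int      # lower number means a more trusted model


def rendezvous(responses: List[Response], deadline_ms: float) -> Optional[Response]:
    """Return the most trusted response that arrived before the deadline."""
    on_time = [r for r in responses if r.latency_ms <= deadline_ms]
    if not on_time:
        return None
    return min(on_time, key=lambda r: r.priority)


responses = [
    Response(model="accurate", value=0.91, latency_ms=450.0, priority=0),
    Response(model="fast", value=0.80, latency_ms=20.0, priority=1),
]

tight = rendezvous(responses, deadline_ms=100.0)    # only "fast" made it
relaxed = rendezvous(responses, deadline_ms=500.0)  # "accurate" made it too
```

With a tight deadline the caller gets the quick, less reliable answer; with a relaxed one the better model wins.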
In production, models need constant validation, as the lifecycle of the system is usually long. Even a slight change in the input data may significantly impact the results of the model. It is good practice to have some validation rules in place (e.g. thresholds which results should not cross). Always store metrics for model performance and accuracy. When something goes wrong we need to be able to identify whether it was caused by faulty new code or by faulty new data. Looking at the distribution of results helps to detect big changes in the inputs. Consider having a so-called canary model: a reasonable and very stable model that runs as a baseline for benchmarking other models. It allows us to assess the performance of any new model.
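Two of these checks are easy to sketch: a threshold rule on the results and a comparison against the canary. The function names, the choice of mean absolute difference as the comparison metric and the sample numbers are all assumptions for illustration; a real setup would pick metrics that fit the problem.

```python
from statistics import mean
from typing import List, Sequence


def out_of_range(scores: Sequence[float], low: float, high: float) -> List[int]:
    """Return the positions of results that break the threshold rule."""
    return [i for i, s in enumerate(scores) if not (low <= s <= high)]


def mean_abs_diff_from_canary(
    candidate: Sequence[float], canary: Sequence[float]
) -> float:
    """Average absolute gap between a model and the canary on the same inputs."""
    if len(candidate) != len(canary):
        raise ValueError("both models must be scored on the same inputs")
    return mean(abs(c - b) for c, b in zip(candidate, canary))


violations = out_of_range([0.2, 1.3, 0.7, -0.1], low=0.0, high=1.0)
gap = mean_abs_diff_from_canary([0.5, 0.5], [0.25, 0.75])
```

Storing `violations` and `gap` over time gives a history to look at when something goes wrong, which helps separate a code problem from a data problem.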
For a better comparison of results you can look at the distributions of their quantiles instead of the raw scores. This removes the need to calibrate the scores before comparing them.
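As a small illustration of why quantiles help, the sketch below maps each score to its position within that model's own history of scores. The function `to_quantiles` and the two reference histories are hypothetical; the point is only that two models with very different score scales become directly comparable.

```python
from bisect import bisect_right
from typing import List, Sequence


def to_quantiles(scores: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Map each score to the fraction of reference scores that are <= it."""
    ref = sorted(reference)
    return [bisect_right(ref, s) / len(ref) for s in scores]


# Model A scores in the range 0..1, model B in the range 0..1000.
history_a = [0.1, 0.2, 0.3, 0.4]
history_b = [100.0, 200.0, 300.0, 400.0]

q_a = to_quantiles([0.3], history_a)
q_b = to_quantiles([300.0], history_b)
```

Both scores land at the same quantile even though the raw values differ by three orders of magnitude, so no calibration step is needed to compare them.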
It also makes a lot of sense to preserve the raw data, instead of just the preprocessed part used by particular models. Features that we do not use now may turn out to be extremely important in the future, and raw data makes it possible to trace back data-related issues. You can also have a decoy model, which is not a model per se, but sits alongside the other models and only records the data. It allows us to spot when something in the input has changed unexpectedly, especially when we have several inputs that interact with each other.
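A decoy can be as small as the sketch below. It assumes that requests arrive as dictionaries and that the decoy exposes the same `predict` entry point as the real models; the `DecoyModel` class and the in-memory log are made up for the example, and in practice the records would go to durable storage.

```python
import json
from typing import Any, Dict, List


class DecoyModel:
    """Looks like a model to the rest of the system but only records inputs."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def predict(self, request: Dict[str, Any]) -> None:
        # Store the raw request exactly as received, before any preprocessing.
        self.log.append(json.dumps(request, sort_keys=True))
        return None


decoy = DecoyModel()
decoy.predict({"b": 2, "a": 1})
recorded = decoy.log
```

Because the decoy sees exactly what the real models see, its log is a convenient place to look when the inputs are suspected to have changed.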