Reactive Python - Cross-language RPC conversation
@xav-b | October 10, 2016
I had the opportunity to study the data science market lately and discovered some appealing products. We were hunting for a service that would let my team focus on actual data exploration and high-business-value models. You know, getting the DevOps stuff out of the way.
Most of the tools offered a way to code your own data pipeline and expose it behind a hosted REST API. I think it is a simple and reasonable approach:
- Plug-and-play models that scale automatically lower the risk of deploying cutting-edge technologies
- You still have full control of the business/technical logic
- Most developers are comfortable interacting with APIs
As desperate victims of the NIH (Not Invented Here) syndrome, let's build such a service for fun and profit!
The scene
Allow me to make up a realistic business case. We have a mobile application firing a storm of events at a dedicated API, written in Node.js for performance, industry-trend, and asynchronicity kinds of motivations.
It forwards those events (clicks, downloads, ...) to an online machine learning model to feed its understanding of our users' behavior.
This setup brings a near real-time representation of the world and aims for a dynamic and accurate prediction engine. Python is a reasonable choice given its battle-tested data science ecosystem and highly productive environment. The challenge, though, is to link the API to the Python backend, which needs to react to every single new event. In this article I will draft the first critical steps of a solution to this modern mission.
Side note: while we lay the foundation on a specific business case, I hope you will find interesting inspiration for more general reactive Python projects.
The stack
I will spare you the full research and jump to the interesting RPC (Remote Procedure Call) part. The rise of microservices found in this protocol a reliable and elegant approach to inter-service communication. Indeed, we will see how a service can expose its methods to language-agnostic clients.
Google released a comprehensive framework, gRPC, using Protocol Buffers, which certainly suits massive and demanding distributed infrastructures. I found, however, a simpler alternative in ZeroRPC, which is still production-tested (at dotCloud in the old days, before Docker). The project uses ZeroMQ to bridge services through RPC and supports Node.js and Python.
Enough with tools of the trade, let's get started.
The processing layer
We are going to develop and expose a model describing the impact of different media ads on sales. To spice up the challenge, it will be continuously updated as new data becomes available. The dependencies are heavy but easy to install:
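Something along these lines should do (the exact package list is my assumption, based on the tooling the article relies on; zerorpc pulls in pyzmq and msgpack):

```shell
# RPC layer + data stack
pip install zerorpc pandas statsmodels
```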
This part of the application is the server side of the ZeroRPC service: it exposes methods that clients become able to call.
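A minimal sketch of such a server could look like this (the `Predictor` class name and the `--serve` flag are my own conveniences; the port is the one used below):

```python
import sys


class Predictor:
    """Server side of the ZeroRPC service: every public method
    becomes remotely callable."""

    def predict(self, tv_spend):
        # Placeholder logic until the learning model is plugged in.
        # ZeroRPC serializes arguments with msgpack, so stick to primitives.
        return float(tv_spend)


if __name__ == "__main__" and "--serve" in sys.argv:
    import zerorpc  # imported lazily so the class stays usable without it
    server = zerorpc.Server(Predictor())
    server.bind("tcp://0.0.0.0:4222")
    server.run()
```

Starting it with `python predict.py --serve` exposes `Predictor`'s methods over ZeroMQ.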
Running `python predict.py` will allow clients to connect on `0.0.0.0:4222` and call the `predict()` class method. ZeroRPC takes care of serializing arguments and return values, which is comfortable but forbids anything other than primitive types (integers, strings, ... but no dataframes, for example).
Now that we have the communication skeleton, let's implement the continuous learning model. To keep things digestible, we will stick to a linear regression to analyze the sales data. We can answer a handful of business questions with this approach, and it will keep the line count bounded. You will find more details in this playbook.
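The class exposed over RPC could grow into something like the sketch below. I use numpy's least squares as a stand-in for a full statsmodels OLS, every method sticks to primitive types so ZeroRPC can serialize them, and the method names other than `predict()` are my own:

```python
import numpy as np


class Predictor:
    """Continuously updated linear regression of sales on TV ad spend."""

    def __init__(self):
        self.tv = []       # observed TV ad spend
        self.sales = []    # observed sales
        self.coefs = None  # (intercept, slope) once trained

    def feed(self, tv_spend, sales):
        """Store a new observation; return a primitive-typed summary."""
        self.tv.append(float(tv_spend))
        self.sales.append(float(sales))
        return {"observations": len(self.tv)}

    def train(self):
        """Refit ordinary least squares on everything seen so far."""
        design = np.column_stack([np.ones(len(self.tv)), self.tv])
        self.coefs, *_ = np.linalg.lstsq(design, np.array(self.sales), rcond=None)
        return {"intercept": float(self.coefs[0]), "slope": float(self.coefs[1])}

    def predict(self, tv_spend):
        """Expected sales for a given TV ad spend."""
        intercept, slope = self.coefs
        return float(intercept + slope * float(tv_spend))
```

Feeding, training, and predicting are each one remote call away once this class is wrapped in `zerorpc.Server`.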
I highly recommend playing with the dataset in an interpreter:
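For instance, with a handful of rows shaped like the classic advertising dataset (the column names and values below are illustrative, not the actual data):

```python
import pandas as pd

# A few observations: media spend per channel and resulting sales
ads = pd.DataFrame({
    "TV":    [230.1, 44.5, 17.2, 151.5, 180.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9],
})

print(ads.describe())       # spread of spend per channel
print(ads.corr()["Sales"])  # which channel moves with sales?
```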
Run the script and jump to the next section to consume the API.
The reactive layer
In this section we will write a minimalist Express API to bridge HTTP requests and our prediction service. Again, a few commands should fetch all the requirements. The code uses modern ES6 syntax, so you will need a recent Node.js or a transpiler like Babel.
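Something like the following should suffice (package names assumed: the Express framework and the ZeroRPC client bindings):

```shell
npm install express zerorpc
```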
The strategy is to map each Python method to an HTTP endpoint, so we can fully access and exchange data with the model. Since the code fits in a single file, I pasted the content below with (hopefully) descriptive comments, and no error handling (bad, don't reproduce at home).
Ready?
Yeah, both services are.
Cross-language real-time predictions
Here we are at the actual reward. Assuming our API and engine backend are still running, let's play with the model from the command line.
First we simulate some new data hitting the API, like new sales statistics captured by the application.
The response is the dataset summary after the update. You can run this command several times and watch the JSON evolve to reflect the new insights we sent into the Python internal data representation.
Yet we need to train the model to take the aggregated data into account. Just visit http://localhost:3000/v1/ia/train from your favorite $BROWSER and watch the current OLS regression state. Here is mine:
And finally we can request a prediction from this very model. The response tells us that a hypothetical value of 50 on TV ads will get us to 9.4 in sales.
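Put together, the command-line session above might look like this (the `/v1/ia/train` route appears above; the `feed` and `predict` routes and field names are assumptions matching my sketch of the services):

```shell
# 1. feed a new observation (TV spend and the resulting sales)
curl -X POST -H "Content-Type: application/json" \
     -d '{"tv": 120.0, "sales": 14.2}' \
     http://localhost:3000/v1/ia/feed

# 2. refit the OLS regression on everything received so far
curl http://localhost:3000/v1/ia/train

# 3. ask for a prediction: expected sales for 50 on TV ads
curl "http://localhost:3000/v1/ia/predict?tv=50"
```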
Conclusion
Let's pat ourselves on the back: we built a cross-platform application exposing real-time model fitting and business predictions as a RESTful service. Before founding a startup, notice that we passed over in silence a lot of serious topics: API security, error handling, and model-building dark magic like feature engineering.
Nevertheless, the business logic is encapsulated in a Python class that doesn't need to bother with data interfaces and external constraints, yet is still able to continuously adapt itself to reality. Data engineers can improve the prediction process behind the API, saving the mobile team from deploying new versions to catch up.