
Today I want to share my experience of deploying libraries on AWS Lambda, along with the best practices I found.

The examples use Python, but the approach can be replicated in other programming languages.

There are multiple ways to deploy packaged libraries on AWS Lambda. The easiest is to zip your project and upload it directly to Lambda if the package is smaller than 50 MB, or to upload it through AWS S3 if it is bigger than 50 MB. Note that in both cases the package has to be smaller than 250 MB once unzipped.
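The size rules above can be turned into a small pre-upload check. This is only a sketch: the function names `choose_upload_method` and `inspect_package` are made up for illustration, and the thresholds are simply the 50 MB and 250 MB figures quoted above, with 1 MB taken as 1024 × 1024 bytes.

```python
import os
import zipfile

MB = 1024 * 1024
DIRECT_UPLOAD_LIMIT = 50 * MB   # zipped size threshold quoted in the text
UNZIPPED_LIMIT = 250 * MB       # unzipped size threshold quoted in the text


def choose_upload_method(zipped_size: int, unzipped_size: int) -> str:
    """Return "direct" or "s3" for a package, or raise if it is too large."""
    if unzipped_size > UNZIPPED_LIMIT:
        raise ValueError("package exceeds the unzipped size limit")
    return "direct" if zipped_size < DIRECT_UPLOAD_LIMIT else "s3"


def inspect_package(zip_path: str) -> str:
    """Measure a zip archive on disk and pick an upload method for it."""
    zipped_size = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        # Sum the uncompressed size of every file in the archive.
        unzipped_size = sum(info.file_size for info in archive.infolist())
    return choose_upload_method(zipped_size, unzipped_size)
```

Calling `inspect_package("my_project.zip")` (a hypothetical file name) before each deployment tells you which upload route applies, or fails early if the package cannot be deployed at all.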

The first problem with this solution is that if the package exceeds 3 MB, which is usually the case once it contains Python libraries, you won't be able to edit your functions in the Lambda console editor. Consequently, every time you want to change even a single character, you have to re-zip your project and re-upload it. This is a huge waste of time, as you are redeploying your large binaries with each code change.
The second problem is that most of the time you use the same libraries across your projects. It is redundant to re-package and re-upload the same binaries each time. …


In this story you will learn how to automatically process your data in a Machine Learning pipeline on AWS.

Nowadays, every data scientist should know how to integrate their models into a cloud platform: it makes their work more useful and makes them more valuable as a data scientist. Unfortunately, integration is a hard concept to grasp when you are a beginner. Luckily, this story is for you if you want to build your first machine learning pipeline in the cloud, and more precisely on Amazon Web Services (AWS).

Pipeline architecture

As you can see in the diagram, the pipeline's input is an S3 upload of some data, and the pipeline's output is the preprocessed data written back to S3. …
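To make the input side concrete, here is a minimal sketch of how a Python handler could read which object was uploaded, assuming the upload is delivered as a standard S3 event notification. The helper names and the `preprocessed/` output prefix are hypothetical and only serve the illustration; the actual preprocessing step is left out.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every object named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def output_key_for(key: str, prefix: str = "preprocessed/") -> str:
    """Hypothetical naming rule: keep the file name, move it under a prefix."""
    return prefix + key.rsplit("/", 1)[-1]
```

A handler would call `extract_s3_objects(event)`, preprocess each object, and write the result under `output_key_for(key)`. Writing to a different prefix or bucket than the one that triggers the pipeline avoids the output re-triggering it.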


A method to do feature engineering on your text data for binary classification

Image from https://unsplash.com/photos/0E_vhMVqL9g

Natural language processing is a recurring topic in machine learning, and there are many ways to deal with it. In this article I will focus on discriminant power analysis, a very interesting feature engineering method for binary classification.

This method consists of finding the words that best discriminate between the two target classes. This morphological approach is interesting because, despite its low complexity, it gives good results.
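To give a feel for the idea, here is one very simple way to score words. It is not the method detailed in the article: as an assumption, a word's score is taken to be the absolute difference between the fraction of texts containing it in each class. The function names are hypothetical.

```python
from collections import Counter


def document_frequencies(texts):
    """Fraction of texts in which each lowercase word appears at least once."""
    if not texts:
        return {}
    counts = Counter()
    for text in texts:
        # Use a set so a word counts once per text, however often it repeats.
        counts.update(set(text.lower().split()))
    return {word: n / len(texts) for word, n in counts.items()}


def most_discriminant_words(class_a, class_b, top_n=5):
    """Rank words by the absolute gap between their two class frequencies."""
    freq_a = document_frequencies(class_a)
    freq_b = document_frequencies(class_b)
    vocabulary = set(freq_a) | set(freq_b)
    scores = {w: abs(freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) for w in vocabulary}
    # Sort by score (descending), then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]
```

The top-ranked words can then be used as binary features (present or absent) for a classifier. Other scoring rules, such as frequency ratios, are equally possible.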

For this article I will detail a full example, from preprocessing to modelling and prediction, using the spam data set available on Kaggle. Note that although I will give you many clues and tips, I am not going to give you the full code, especially for the Discriminant Power Analysis part, because I consider it far more interesting for you to understand what you are doing. …

About

Léo Le Henaff
