Tweets-preprocessing

Preprocessing for a tweets dataset using NLTK.

As we all know, we are in the era of data, and most of this data is unstructured. According to an article by MongoDB:

From 80 to 90 percent of data generated and collected by organizations is unstructured, and its volumes are growing rapidly, many times faster than the rate of growth for structured databases.

So part of our work is to handle and clean this data so that it becomes useful and meaningful.
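As a rough illustration of what "cleaning" a tweet can mean, here is a minimal sketch. It is not the code from this repository: it uses only the Python standard library so it runs without any downloads, and the `clean_tweet` function, the regular expressions, and the tiny `STOP_WORDS` set are all made up for this example. With NLTK, a tokenizer such as `TweetTokenizer` and the `stopwords` corpus would replace the hand-written parts.

```python
import re

# Hypothetical, deliberately tiny stop word list (illustration only).
# "rt" is included because it marks a retweet and carries no meaning.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "rt", "the", "to"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # links
MENTION_RE = re.compile(r"@\w+")               # @user handles
TOKEN_RE = re.compile(r"[a-z0-9]+")            # runs of letters and digits


def clean_tweet(text):
    """Return lowercase word tokens with URLs, mentions, and stop words removed."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.lower()
    # Punctuation and the '#' sign are dropped; the hashtag word itself is kept.
    tokens = TOKEN_RE.findall(text)
    return [tok for tok in tokens if tok not in STOP_WORDS]


example = "RT @someone: Voting is OPEN in the #Election2020!!! https://t.co/xyz"
print(clean_tweet(example))  # ['voting', 'open', 'election2020']
```

Removing links and handles before tokenizing keeps fragments like `t`, `co`, or a username from leaking into the token list.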

So here is my work, done as part of my assignment on natural language preprocessing.

I'm a beginner, so any improvements, even small ones, will be appreciated.

Link to the dataset: https://www.kaggle.com/manchunhui/us-election-2020-tweets

Link to the article: https://www.mongodb.com/unstructured-data