Tweets-preprocessing

Preprocessing of a tweets dataset using NLTK.

As we all know, we are in the era of data, and most of this data is unstructured. According to an article on MongoDB:

From 80 to 90 percent of data generated and collected by organizations is unstructured, and its volumes are growing rapidly, many times faster than the rate of growth for structured databases.

So part of our work is to handle and clean this data so that it becomes useful and meaningful.

This repository contains my work for an assignment on natural language preprocessing.
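As a rough illustration of what this kind of cleaning can look like, here is a minimal sketch. It is not the repository's actual code: the function name `clean_tweet`, the URL pattern, and the tiny `STOPWORDS` set are assumptions made for this example. It uses NLTK's `TweetTokenizer`, which needs no extra corpus downloads.

```python
import re

from nltk.tokenize import TweetTokenizer

# Assumption: a simple pattern that is good enough to drop most links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Assumption: a tiny illustrative stopword set. A real pipeline would more
# likely use nltk.corpus.stopwords, which requires nltk.download("stopwords").
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "on"}

# Lowercase the text, strip @handles, and shorten runs of repeated characters.
_tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)


def clean_tweet(text):
    """Return a list of cleaned tokens for one tweet (hypothetical helper)."""
    text = URL_RE.sub(" ", text)          # remove links before tokenizing
    cleaned = []
    for token in _tokenizer.tokenize(text):
        token = token.lstrip("#")         # keep the hashtag word, drop the '#'
        # Keep alphanumeric tokens only (drops punctuation and emoticons).
        if token.isalnum() and token not in STOPWORDS:
            cleaned.append(token)
    return cleaned


if __name__ == "__main__":
    sample = "@user Vote is sooooo important!!! https://t.co/abc #Election2020"
    print(clean_tweet(sample))
```

Keeping the hashtag word while dropping the `#` is a design choice: in election tweets the hashtags often carry the topic, so discarding them entirely would lose information.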

I'm a beginner, so any improvements, even small ones, will be appreciated.

Dataset: https://www.kaggle.com/manchunhui/us-election-2020-tweets

Article: https://www.mongodb.com/unstructured-data
