Python Data Analysis

By: Ivan Idris
Overview of this book

This book is for programmers, scientists, and engineers who have knowledge of the Python language and know the basics of data science. It is for those who wish to learn different data analysis methods using Python and its libraries. This book contains all the basic ingredients you need to become an expert data analyst.

Filtering out stopwords, names, and numbers


A common requirement in text analysis is removing stopwords, which are common words with low information value. NLTK provides stopword corpora for a number of languages. Load the English stopwords corpus and print some of the words:

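import nltk
# The corpus data must be installed first, for example with nltk.download('stopwords')
sw = set(nltk.corpus.stopwords.words('english'))
print("Stop words", list(sw)[:7])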

The following common words are printed (the order may differ from run to run, because a set is unordered):

Stop words ['all', 'just', 'being', 'over', 'both', 'through', 'yourselves']

Notice that all the words in this corpus are in lowercase.
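Because the stopwords are all lowercase, a token should be lowercased before it is compared against them. The following is a minimal sketch of that filtering step, not the book's own code: `filter_stopwords` is a hypothetical helper, and `SAMPLE_STOPWORDS` is a small hand-written stand-in for the NLTK stopword set so that the example runs without downloading any corpus data.

```python
# Illustrative subset only (assumption); in practice you would pass
# set(nltk.corpus.stopwords.words('english')) instead.
SAMPLE_STOPWORDS = {"all", "just", "being", "over", "both", "through", "the"}


def filter_stopwords(tokens, stopwords):
    """Return the tokens whose lowercase form is not in `stopwords`.

    The original casing of the surviving tokens is preserved.
    """
    return [tok for tok in tokens if tok.lower() not in stopwords]


tokens = ["All", "the", "conspirators", "just", "left", "Rome"]
filtered = filter_stopwords(tokens, SAMPLE_STOPWORDS)
print(filtered)  # ['conspirators', 'left', 'Rome']
```

Keeping the stopwords in a set makes each membership test a constant-time lookup on average, which matters when filtering a whole book's worth of tokens.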

NLTK also has a Gutenberg corpus. Project Gutenberg is a digital library of books, mostly with expired copyright, which are available for free on the Internet (see http://www.gutenberg.org/).

Load the Gutenberg corpus and print some of its filenames:

gb = nltk.corpus.gutenberg
print("Gutenberg files", gb.fileids()[-5:])

Some of the titles printed may be familiar to you:

Gutenberg files ['milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare...
