Learning Data Mining with Python

By Robert Layton
Chapter 2 – Classifying with scikit-learn Estimators


Scalability with the nearest neighbor

https://github.com/jnothman/scikit-learn/tree/pr2532

A naïve implementation of the nearest neighbor algorithm is quite slow: it checks all pairs of points to find those that are close together, which is O(n²) in the number of samples. Better implementations exist, some of which are included in scikit-learn. For instance, a kd-tree can be built that speeds up the search (and this is already available in scikit-learn).
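As a small sketch of the difference, scikit-learn's NearestNeighbors estimator lets you choose the search strategy with the algorithm parameter. The random dataset here is illustrative only; both strategies return identical neighbors, the kd-tree just finds them without comparing every pair:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(14)
X = rng.rand(1000, 3)  # 1,000 random points in 3 dimensions

# Brute force compares every pair of points.
brute = NearestNeighbors(n_neighbors=5, algorithm="brute").fit(X)

# A kd-tree partitions the space, skipping most distance computations.
kd = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(X)

# Query the five nearest neighbors of the first ten points.
_, brute_idx = brute.kneighbors(X[:10])
_, kd_idx = kd.kneighbors(X[:10])
```

Both queries return the same indices; only the amount of work done differs, and the gap grows with the size of the dataset.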

Another way to speed up this search is to use locality-sensitive hashing (LSH). This is a proposed improvement for scikit-learn that hadn't made it into the package at the time of writing. The above link gives a development branch of scikit-learn that will allow you to test LSH on a dataset; read through the documentation attached to that branch for details on doing so.

To install it, clone the repository and follow the instructions for installing the bleeding-edge code at: http://scikit-learn.org/stable/install.html.

Remember to use the above repository's code rather than the official source. I recommend installing it inside a virtualenv or a virtual machine rather than directly on your computer. A great guide to virtualenv can be found at: http://docs.python-guide.org/en/latest/dev/virtualenvs/.

More complex pipelines

http://scikit-learn.org/stable/modules/pipeline.html#featureunion-composite-feature-spaces

The Pipelines we have used in the book follow a single stream—the output of one step is the input of another step.

Pipelines follow the transformer and estimator interfaces as well—this allows us to embed Pipelines within Pipelines. This is a useful construct for very complex models, but becomes very powerful when combined with Feature Unions, as shown in the above link.

This allows us to extract multiple types of features at a time and then combine them to form a single dataset. For more details, see the example at: http://scikit-learn.org/stable/auto_examples/feature_stacker.html.
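A minimal sketch of the idea, loosely following the feature_stacker example linked above: the FeatureUnion runs two extractors in parallel and concatenates their outputs, and because the union is itself a transformer, it slots into a Pipeline like any other step. The choice of iris, PCA, and SelectKBest here is illustrative, not prescribed by the chapter:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Extract two kinds of features in parallel and concatenate them.
combined_features = FeatureUnion([
    ("pca", PCA(n_components=2)),   # two principal components
    ("kbest", SelectKBest(k=1)),    # the single best univariate feature
])

# The union behaves as one transformer, so it embeds in a Pipeline.
pipeline = Pipeline([
    ("features", combined_features),
    ("svm", SVC(kernel="linear")),
])
pipeline.fit(X, y)
```

After fitting, each sample is represented by three stacked features (two from PCA, one from SelectKBest) before reaching the classifier.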

Comparing classifiers

There are many classifiers in scikit-learn that are ready to use. The one you choose for a particular task will depend on a variety of factors. You can compare their f1-scores to see which method performs better, and examine the standard deviation of those scores to judge whether the difference is statistically significant.

An important factor is that the classifiers are trained and tested on the same data; that is, the test set for one classifier is the test set for all classifiers. Our use of fixed random states ensures this is the case, which is also important for replicating experiments.
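A short sketch of such a comparison, assuming a recent scikit-learn where cross-validation lives in sklearn.model_selection (older releases used sklearn.cross_validation). With cv=5 and no shuffling, the folds are deterministic, so both classifiers see identical train/test splits. The breast cancer dataset is just a convenient stand-in here:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

results = {}
for name, clf in [("kNN", KNeighborsClassifier()),
                  ("Decision Tree", DecisionTreeClassifier(random_state=14))]:
    # The same deterministic folds are used for each classifier,
    # so the scores are directly comparable.
    scores = cross_val_score(clf, X, y, scoring="f1", cv=5)
    results[name] = (scores.mean(), scores.std())
    print("{}: mean f1 = {:.3f} (std = {:.3f})".format(name, *results[name]))
```

If the gap between the mean scores is small relative to the standard deviations, you should be wary of declaring one classifier the winner.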
