Feature Hashing, or the “hashing trick”

Feature hashing, or the “hashing trick,” is a clever method of dimensionality reduction: a hash function maps each feature (a word, say) straight to an index in a fixed-size vector, so no vocabulary dictionary has to be built or stored. It leans on the properties of a good hash function (fast, deterministic, roughly uniform) to do what is otherwise heavy lifting in NLP. This is a good blog post on the fundamentals of how and why the hashing trick works on a large, sparse set of vectors:

Hashing Language

Feature hashing is an elegant solution to the otherwise hairy problem of fighting the curse of dimensionality. It turned out to be extremely useful for a project I’m currently working on for a course at Columbia: Computational Models of Social Meaning.
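To make the idea concrete, here is a minimal sketch in plain Python. It is not the code from my project or from any library: the function name `hash_features`, the bucket count of 16, and the use of MD5 as the hash are all assumptions chosen for illustration. Each token is hashed, the hash picks a bucket, and a separate byte of the hash picks a sign so that colliding tokens tend to cancel rather than pile up.

```python
import hashlib


def hash_features(tokens, n_buckets=16):
    """Map a list of string tokens to a fixed-length signed count vector.

    Hypothetical helper for illustration; n_buckets and MD5 are arbitrary choices.
    """
    vec = [0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        # First 8 bytes of the digest choose the bucket.
        index = int.from_bytes(digest[:8], "big") % n_buckets
        # A different byte chooses the sign (+1 or -1).
        sign = 1 if digest[8] & 1 == 0 else -1
        vec[index] += sign
    return vec


doc = "the quick brown fox jumps over the lazy dog".split()
print(hash_features(doc))
```

The output vector always has `n_buckets` entries no matter how many distinct words show up, which is the whole point: the dimensionality is fixed up front, at the cost of occasional collisions between unrelated words.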

scikit-learn has an implementation of the hashing trick (FeatureHasher, plus HashingVectorizer for raw text) if you’d like to read more about it.
