Feature hashing, or the “hashing trick,” is a clever method of dimensionality reduction that leans on two properties of a good hash function, determinism and roughly uniform output, to do some otherwise heavy lifting in NLP: it maps a huge, sparse feature space (such as a bag-of-words vocabulary) onto a fixed number of dimensions without having to store the vocabulary at all. This blog post is a good introduction to how and why the hashing trick works on a large, sparse set of vectors:
Feature hashing is an elegant solution to the otherwise hairy problem of fighting the curse of dimensionality. It turned out to be extremely useful for a project I’m currently working on for a course at Columbia: Computational Models of Social Meaning.
scikit-learn has implementations of the hashing trick (`FeatureHasher` and `HashingVectorizer`), and its documentation is a good place to read more about it.
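
For intuition, the core idea fits in a few lines of plain Python. This is only a minimal sketch, not how any particular library does it: the function name `hash_features`, the `n_buckets` parameter, and the choice of MD5 as the hash are my own assumptions for illustration (production implementations typically use a faster non-cryptographic hash). Each token is hashed to a bucket index, and one extra bit of the hash picks a sign so that colliding tokens tend to cancel out rather than pile up.

```python
import hashlib


def hash_features(tokens, n_buckets=16):
    """Map a list of tokens to a fixed-length vector via the hashing trick.

    Hypothetical helper for illustration; MD5 is an arbitrary choice of a
    deterministic, well-mixed hash.
    """
    vec = [0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        # The first four bytes choose the bucket.
        index = int.from_bytes(digest[:4], "big") % n_buckets
        # One bit of a different byte chooses the sign (signed hashing).
        sign = 1 if digest[4] & 1 == 0 else -1
        vec[index] += sign
    return vec


doc = "the quick brown fox jumps over the lazy dog".split()
vec = hash_features(doc, n_buckets=8)
print(vec)  # always 8 entries, however large the vocabulary gets
```

The length of the output depends only on `n_buckets`, never on how many distinct words have been seen, which is what makes the trick attractive for large, sparse data. The price is that distinct tokens can land in the same bucket, so a smaller `n_buckets` saves memory at the cost of more collisions.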