Our news aggregator consumes thousands of news stories a day. To cluster similar news stories at scale, we use data mining algorithms such as Locality-sensitive Hashing (LSH) with MinHash. This technique dramatically improves the scalability of the bag-of-words model.