Sungjoo Ha

Evergreen

StumbleUpon Evergreen Classification Challenge

While some pages we recommend, such as news articles or seasonal recipes, are only relevant for a short period of time, others maintain a timeless quality and can be recommended to users long after they are discovered. In other words, pages can either be classified as "ephemeral" or "evergreen". [...] Your mission is to build a classifier which will evaluate a large set of URLs and label them as either evergreen or ephemeral.

I placed 22nd out of 625 teams. The winner had an AUC score of 0.88906, and my score was 0.88495.

Approach

I tried various text extraction heuristics, such as article extraction using boilerpipe. The extracted text was converted to a TF-IDF vector representation and fed into a stochastic gradient descent classifier.
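The TF-IDF plus stochastic gradient descent step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used in the competition: the toy documents, the whitespace tokenizer, the choice of logistic loss, and the hyperparameters (epochs, lr, l2) are all made up for the example, and the boilerpipe extraction step is not shown.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real preprocessing.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalised TF-IDF vector as a dict; unseen terms are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, weights, bias):
    return bias + sum(weights.get(t, 0.0) * v for t, v in x.items())


def train_sgd(vectors, labels, epochs=200, lr=0.5, l2=1e-4, seed=0):
    """Logistic regression fitted one example at a time (hypothetical settings)."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order each epoch
        for i in order:
            x, y = vectors[i], labels[i]
            grad = sigmoid(score(x, weights, bias)) - y  # d(log loss)/d(score)
            for t, v in x.items():
                w = weights.get(t, 0.0)
                # L2 shrinkage is applied only to terms present in this example.
                weights[t] = w - lr * (grad * v + l2 * w)
            bias -= lr * grad
    return weights, bias


# Toy corpus (invented for illustration): 1 = evergreen, 0 = ephemeral.
train_docs = [
    "classic bread recipe with flour and yeast",
    "how to knit a simple wool scarf",
    "guide to growing tomatoes at home",
    "breaking news election results tonight",
    "live score update from the match today",
    "stock market news today and tonight",
]
train_labels = [1, 1, 1, 0, 0, 0]

idf = fit_idf(train_docs)
weights, bias = train_sgd([tfidf(d, idf) for d in train_docs], train_labels)

p_evergreen = sigmoid(score(tfidf("easy recipe for bread at home", idf), weights, bias))
p_ephemeral = sigmoid(score(tfidf("news update tonight", idf), weights, bias))
print(p_evergreen, p_ephemeral)
```

Because the competition was scored by AUC, the predicted probability is the useful output here rather than a hard label: only the ranking of pages matters, not where a threshold is placed.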

I've written a short post-mortem of the competition in StumbleUpon Evergreen Classification Challenge (Korean).