
Netflix has a contest going to improve their recommendation engine's ranking by a certain amount. I've gotta say it's the best-specified contest I've ever seen. I'm leery of these contests because the rules usually have a "we own all your ideas even if you don't win" clause. This one just has a "we can use your idea even if you don't win" clause, which is less onerous. And the prize is a cool million, rather than the more traditional prize of a candy bar.

Since I have previously dabbled in recommendation engine design, some friends have asked me if I'm going to go for this prize. Well, I don't really need the money -- just now I got email telling me I'd won $2,500,000.00 -- but more importantly I don't think my Ultra Gleeper ideas are applicable to this domain. They mainly focus on improving the recommendations, whereas this contest is tightly focused on predicting users' opinions given specific recommendations.

Not to say I don't have ideas. If I were going for the prize I would use IMDB and Amazon data for the movies to gather hypotheses about why people rate a movie high or low. I would then build profiles for the users, using the set of hypotheses as a basis. Then I would predict the users' future actions based on the profile. That sounds like handwaving but the basic idea (which others have seized on as well) is using objective data about the movies, instead of trying to figure out connections between them solely by how Netflix users rate them. And weighted vectors would probably be involved.
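
To make the handwaving slightly more concrete, here's a minimal sketch in Python. Everything in it is invented for illustration: the movies, the feature names ("action", "campy"), and the ratings are hypothetical, and the profile rule (per-feature average of how far a rating sits from the user's mean) is just one simple assumption about what "weighted vectors" could mean here, not anything Netflix is known to do.

```python
# Hypothetical "objective" movie data, standing in for what IMDB/Amazon might supply.
MOVIE_FEATURES = {
    "Movie A": {"action": 1.0, "campy": 0.0},
    "Movie B": {"action": 0.0, "campy": 1.0},
    "Movie C": {"action": 1.0, "campy": 1.0},
}


def build_profile(ratings, movie_features):
    """Return (mean rating, {feature: weight}) for one user.

    Each weight is the feature-weighted average of the user's rating
    deviations from their own mean, over the movies they rated.
    """
    mean = sum(ratings.values()) / len(ratings)
    totals, weights = {}, {}
    for movie, rating in ratings.items():
        for feature, value in movie_features[movie].items():
            totals[feature] = totals.get(feature, 0.0) + value * (rating - mean)
            weights[feature] = weights.get(feature, 0.0) + value
    profile = {f: totals[f] / weights[f] for f in totals if weights[f] > 0}
    return mean, profile


def predict(mean, profile, features):
    """Predict a rating as mean + dot(profile, features), clamped to 1..5."""
    score = mean + sum(profile.get(f, 0.0) * v for f, v in features.items())
    return max(1.0, min(5.0, score))


# A made-up user who loved the action movie and hated the campy one.
user_ratings = {"Movie A": 5, "Movie B": 1}
user_mean, user_profile = build_profile(user_ratings, MOVIE_FEATURES)
guess_for_c = predict(user_mean, user_profile, MOVIE_FEATURES["Movie C"])
```

With these toy numbers the user ends up with a positive weight on "action" and a negative one on "campy", so a movie that is both lands back at their average rating. A real entry would need far more features and a properly fitted model, but the shape of the idea is the same.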

Of course, for all I know Netflix already uses external data to run their recommendation engine, so don't just run with this idea and expect to get anywhere.


Posted by Nathaniel at Wed Oct 04 2006 16:29

This is pretty much a standard data-mining competition, just with a huge prize. (OTOH, looking at history, the market value of a really superior machine learning algorithm can be much higher than $1e6.)

There is a lot of principled math on how to do this kind of thing that can and will be thrown at the problem. Dumping IMDB/Amazon stuff in would be adding new features for your predictor to use (they don't even have to be objective, so long as they are not totally random), and "gathering hypotheses" is called in the literature "feature selection" (or "dimensionality reduction", very similar), in case anyone wants to read up on it... Machine learning is awesome stuff.

Posted by Daniel at Wed Oct 04 2006 21:36

Actually, it's more like incorporating prior knowledge (for example, as in Bayesian statistics) than feature selection or dimensionality reduction. In any case, this definitely is an important problem in machine learning.
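
As a rough illustration of what "incorporating prior knowledge" could look like, here is a small Python sketch. It assumes the external data has already been boiled down to a single prior guess for a movie's rating, and it uses the standard weighted-average form of a posterior mean; the function name, the numbers, and the `prior_strength` parameter are all hypothetical.

```python
def posterior_mean(ratings, prior_mean, prior_strength):
    """Blend observed ratings with a prior guess.

    prior_strength acts like a count of imaginary ratings at prior_mean:
    with few real ratings the prior dominates, with many the data does.
    """
    n = len(ratings)
    return (prior_strength * prior_mean + sum(ratings)) / (prior_strength + n)


# Hypothetical numbers: external data suggests about 3 stars,
# while two Netflix users each gave 5 stars.
estimate = posterior_mean([5, 5], prior_mean=3.0, prior_strength=2)

# With no Netflix ratings at all, the estimate is just the prior.
cold_start = posterior_mean([], prior_mean=3.0, prior_strength=2)
```

Here the two real ratings and the two "imaginary" prior ratings carry equal weight, so the estimate sits halfway between 3 and 5.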

Posted by Nathaniel at Thu Oct 05 2006 04:10

I was interpreting Leonard's mention of "hypotheses" as complex features like "has lots of action", "campy", etc., extracting these from the raw data (I guess they'd probably have to be non-linear, hrm), and using them for the actual regression. Certainly not the only way to approach it. I'm not sure what you mean about priors. The obvious place I can see to use something like that is importing 3rd party movie reviews, and even they are probably better handled as features (it's entirely plausible my preferences are anti-correlated with some reviewer's). I'm probably just missing something, though.
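
A toy sketch of the reviewer-as-feature point, in Python with invented numbers: if a reviewer's score is fed in as an ordinary regression feature, a plain least-squares fit simply assigns it a negative weight when my tastes run opposite to theirs. The scores and the `fit_line` helper are hypothetical, and a real predictor would use many features rather than one.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Made-up data: the reviewer's stars and my stars for the same four movies.
reviewer_scores = [1, 2, 4, 5]
my_ratings = [5, 4, 2, 1]

slope, intercept = fit_line(reviewer_scores, my_ratings)

# Predicted rating for a new movie the reviewer gave 5 stars.
predicted = slope * 5 + intercept
```

The fitted slope comes out negative, so a rave from this reviewer lowers the predicted rating. The reviewer is still useful as a feature, just with the sign flipped.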

I am amused to note that you're just across campus from me -- I'm in Cogsci :-).


Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.