collaborative intelligence and the pigeonhole problem
Sun Dec 21, 2008
O'Reilly are hosting a Collaborative Intelligence Foo Camp this coming
weekend. Some discussion is already bubbling up on CrowdVine, in a group
that has been created for the event.
One of the participants, Greg Linden, says the following:
"Hi, Cass. Absolutely, this is called the pigeonhole problem and can be an
issue if personalization is done poorly.
Done right, personalization enhances discovery by helping you find things you
could not easily have found on your own. Discovery in vast quantities of
data is what personalization is designed to do. The key is to make sure the
personalization reaches beyond the obvious and into the surprising. If you
do that, personalization reveals the full breadth of the data and enhances
"Hi, Cass. Sure, I think there are things that can be done. Most of this
depends on the technique used for personalization. For example, if you
implement recommendations by showing people items that match on keywords or
from the same fine-grained subject category, you tend to get nearly
identical items and little diversity. If you use user behavior, such as what
people tend to buy together, you can often get more interesting patterns
out, especially if you tune the system to try to reach further afield (at
the risk of more spurious recommendations).
It's a great question and one that troubles all of us when thinking about
personalization. We are filtering, but trying to do it in a way that focuses
attention on interesting things, not that limits what people see.
I look forward to talking with you more about this at the conference!"
It brings to mind the idea of a landscape over the search space: one wants
to avoid getting stuck in local minima, and instead explore enough of the
space to find related results that are not in the immediate neighborhood of
the item of concern. That suggests the system should have something
corresponding to a temperature, since getting out of local minima can be
achieved through an annealing process.
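To make the temperature idea a little more concrete, here is a minimal
sketch in Python. It is not simulated annealing proper and not anything
Greg describes: it is plain Boltzmann (softmax) sampling over relevance
scores, plus a geometric cooling schedule in the spirit of annealing. The
function names, the item names and the scores are all made up for
illustration. At a high temperature the picks wander far afield; as the
temperature drops they collapse onto the most obviously related items.

```python
import math
import random


def cooled_temperature(step, start=1.0, decay=0.5, floor=0.01):
    """Geometric cooling schedule (hypothetical): start * decay**step, never below floor."""
    return max(floor, start * decay ** step)


def sample_recommendations(scores, temperature, k, rng=None):
    """Pick up to k distinct items, each drawn with probability
    proportional to exp(score / temperature) among the items left."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    remaining = dict(scores)
    picked = []
    while remaining and len(picked) < k:
        # Subtract the best remaining score before exponentiating so the
        # largest weight is exactly 1.0 and nothing overflows.
        top = max(remaining.values())
        items = list(remaining)
        weights = [math.exp((remaining[i] - top) / temperature) for i in items]
        choice = rng.choices(items, weights=weights, k=1)[0]
        picked.append(choice)
        del remaining[choice]
    return picked


if __name__ == "__main__":
    # Made-up relevance scores for items related to the item of concern.
    scores = {"sequel": 0.9, "same_author": 0.8, "same_genre": 0.3, "far_afield": 0.1}
    rng = random.Random(0)
    for step in range(6):
        t = cooled_temperature(step)
        print(step, round(t, 3), sample_recommendations(scores, t, k=2, rng=rng))
```

Whether one would actually want to cool the temperature over time, or hold
it at some fixed level of "surprise", is an open design choice; the sketch
only shows that a single knob can move between the two regimes.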
One could balance relevance and serendipity through some constrained
minimization process, like the one set out in the information bottleneck
method. How much better would these approaches perform than simple
algorithms, and what would be the relative computational cost?
Does anyone know of any literature about this?
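For reference, the information bottleneck is usually written as the
trade-off below, where X is the input, T is a compressed representation of
it, Y is the variable one cares about predicting, I(.;.) is mutual
information, and beta >= 0 sets how much predictive information is kept
relative to how hard X is compressed. How X, T and Y should map onto users,
items and "interestingness" in a recommender is exactly the part I am
unsure about, so treat any such mapping as an assumption.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```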