Partially Attended

an irregularly updated blog by Ian Mulvany

collaborative intelligence and the pigeonhole problem

Sun Dec 21, 2008

396 Words

O'Reilly are hosting a Collaborative Intelligence Foo Camp this coming weekend. Some discussion is already bubbling up on CrowdVine, in a group that has been created for the meeting.

One of the participants, Greg Linden, says the following:



"Hi, Cass. Absolutely, this is called the pigeonhole problem and can be an

issue if personalization is done poorly.



Done right, personalization enhances discover by helping you find things you

could not easily have found on your own. Discovery in vast quantities of

data is what personalization is designed to do. The key is to make sure the

personalization reaches beyond the obvious and into the surprising. If you

do that, personalization reveals the full breadth of the data and enhances

serendipity."





and then:





"Hi, Cass. Sure, I think there are things that can be done. Most of this

depends on the technique used for personalization. For example, if you

implement recommendations by showing people items that match on keywords or

from the same fine-grained subject category, you tend to get nearly

identical items and little diversity. If you use user behavior, such as what

people tend to buy together, you can often get more interesting patterns

out, especially if you tune the system to try to reach further afield (at

the risk of more spurious recommendations).



It's a great question and one that troubles all of us when thinking about

personalization. We are filtering, but trying to do it in a way that focuses

attention on interesting things, not that limits what people see.



I look forward to talking with you more about this at the conference!"
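
As a concrete illustration of the contrast Greg draws, here is a small Python sketch of the two techniques side by side: ranking by keyword overlap versus ranking by co-occurrence in user behaviour. The catalogue, keywords, and baskets are entirely invented for the example, so treat it as a toy rather than as how any real recommender is built.

    from collections import Counter

    # Invented catalogue: item -> descriptive keywords (illustrative only).
    keywords = {
        "quantum mechanics textbook":  {"physics", "quantum", "textbook"},
        "quantum field theory notes":  {"physics", "quantum", "notes"},
        "introductory physics course": {"physics", "textbook"},
        "biography of Feynman":        {"physics", "biography"},
        "popular science podcast":     {"science", "audio"},
    }

    # Invented purchase histories: each set is one user's basket.
    baskets = [
        {"quantum mechanics textbook", "biography of Feynman"},
        {"quantum mechanics textbook", "popular science podcast"},
        {"quantum mechanics textbook", "quantum field theory notes"},
        {"biography of Feynman", "popular science podcast"},
    ]

    def by_keywords(seed, n=3):
        """Rank other items by keyword overlap with the seed item."""
        seed_kw = keywords[seed]
        scores = {item: len(seed_kw & kw)
                  for item, kw in keywords.items() if item != seed}
        ranked = sorted(scores.items(), key=lambda pair: -pair[1])
        return [item for item, score in ranked if score > 0][:n]

    def by_cooccurrence(seed, n=3):
        """Rank other items by how often they share a basket with the seed item."""
        counts = Counter()
        for basket in baskets:
            if seed in basket:
                counts.update(basket - {seed})
        return [item for item, _ in counts.most_common(n)]

    seed = "quantum mechanics textbook"
    print(by_keywords(seed))       # keyword-heavy neighbours, nearly identical items
    print(by_cooccurrence(seed))   # behaviour-based picks, reaching further afield

On this toy data the keyword ranker returns near-duplicates of the seed, while the behaviour-based ranker also surfaces the podcast, which shares no keywords with the seed at all: that is the "further afield" effect Greg describes, and also where the spurious recommendations start to creep in.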





It brings to mind the idea that there is a landscape in the search space, and that what one wants to do is avoid getting stuck in local minima while still exploring enough of the space to find related results that are not directly in the neighborhood of the item of concern. That suggests a correspondence with the temperature of the system: getting out of local minima can be achieved through an annealing process.

One could also imagine balancing relevance and serendipity through some constrained minimization, like the one set out in the information bottleneck method. How much better performance would you get out of these approaches over simple algorithms, and what would the relative computational cost be? Does anyone know of any literature about this?
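
To make the temperature analogy concrete, here is a minimal sketch, entirely my own invention with made-up items and scores, of softmax-with-temperature sampling over relevance scores: at low temperature the sampler behaves greedily and stays in the neighborhood of the most relevant items, while at higher temperature it explores further afield.

    import math
    import random

    def sample_recommendations(scored_items, temperature, k, rng=random):
        """Sample k distinct items, each drawn in proportion to softmax(score / temperature).

        Low temperature  -> near-greedy: almost always the top-scoring items.
        High temperature -> flatter distribution: more serendipitous picks.
        """
        items = list(scored_items)
        picks = []
        for _ in range(min(k, len(items))):
            top = max(score for _, score in items)  # subtract max for numerical stability
            weights = [math.exp((score - top) / temperature) for _, score in items]
            chosen = rng.choices(range(len(items)), weights=weights, k=1)[0]
            picks.append(items.pop(chosen)[0])      # sample without replacement
        return picks

    # Invented relevance scores for a single user; a real system would compute these.
    catalogue = [("item A", 0.95), ("item B", 0.93), ("item C", 0.90),
                 ("item D", 0.60), ("item E", 0.45), ("item F", 0.30)]

    print(sample_recommendations(catalogue, temperature=0.05, k=3))  # almost always A, B, C
    print(sample_recommendations(catalogue, temperature=1.0, k=3))   # wanders beyond the top items

A full annealing schedule would lower the temperature over successive passes, and the information bottleneck would make the relevance/serendipity trade-off an explicit constrained optimization, but the sketch at least shows where a temperature parameter enters the picture.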





This work is licensed under a Creative Commons Attribution 4.0 International License