Over Monday and Wednesday this week we made our first venture into "exploiting" network structure instead of just trying to understand it. Our example of a case where using network structure makes a huge difference is "search": the big idea of Google back in 1998 was to use network structure to identify important pages and feed this information into its search engine. In particular, they defined a notion called "PageRank", which we explored over the two classes. Be sure to take a look at the original papers by Brin & Page on Google; they are quite interesting to read now in the context of what Google has become.

Note: understanding PageRank and the other aspects of search engines that we discuss will be very important for you on HW4, which will be posted after class today.

## Wednesday, January 26, 2011


One point I didn't get or remember from the lecture is whether P should automatically include self-transitions. This is important for HW4 Q2.

If a node has no incoming or outgoing links, then the corresponding row sum of P would be 0 instead of 1, in which case I'm not convinced that r(t) would always sum to 1.

Note that adding weak edges won't necessarily rescue the case because they only contribute (1-\alpha) to the row sum.

Adding self-transitions, on the other hand, seems to do the job without affecting the interpretability of P too much. Should we assume so for the HW?

@Dave, thanks for pointing this out. I meant to mention this in class, and we should have mentioned it in the problem statement. In general, P need not have self-loops (since we take care of all of that in the second term of G). But we do require that the rows of P sum to one, so if a node has no out-edges, we put a 1 in the diagonal entry for that row. If you think of the iterative meaning of PageRank, this makes sense: if a node has no out-edges, its rank stays with itself at the next iteration.
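For anyone who wants to see the fix in action, here is a minimal sketch in plain Python. It rests on a few assumptions that are not spelled out in this thread: that G has the form G = alpha * P + (1 - alpha)/n * (all-ones matrix), that r is a row vector updated as r(t+1) = r(t) G, and that alpha = 0.85. The function names `build_transition_matrix` and `pagerank` are hypothetical, chosen just for illustration; check the exact definitions against the lecture notes and the HW4 statement.

```python
def build_transition_matrix(adj):
    """Row-normalize a 0/1 adjacency matrix (list of lists).

    A node with no out-edges gets a 1 on the diagonal, so every
    row of P sums to one.
    """
    n = len(adj)
    P = []
    for i, row in enumerate(adj):
        out_degree = sum(row)
        if out_degree == 0:
            # Dangling node: its rank stays with itself.
            P.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            P.append([x / out_degree for x in row])
    return P


def pagerank(adj, alpha=0.85, iterations=100):
    """Iterate r(t+1) = r(t) G starting from the uniform vector.

    ASSUMPTION: G = alpha * P + (1 - alpha) / n * ones(n, n);
    alpha = 0.85 is an illustrative default, not a value from the post.
    """
    n = len(adj)
    P = build_transition_matrix(adj)
    G = [[alpha * P[i][j] + (1 - alpha) / n for j in range(n)]
         for i in range(n)]
    r = [1.0 / n] * n
    for _ in range(iterations):
        r = [sum(r[i] * G[i][j] for i in range(n)) for j in range(n)]
    return r
```

For example, `pagerank([[0, 1, 0], [0, 0, 1], [0, 0, 0]])` runs on a three-node chain whose last node has no out-edges. Because each row of G then sums to alpha + (1 - alpha) = 1, the entries of r keep summing to 1 at every iteration, which is exactly the property Dave was worried about.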
