Sunday, January 30, 2011

Homework 4 Update

Some of you had doubts regarding Q2 in HW4 for the case when a webpage does not have outgoing links. Please note that the homework has been updated with a clarification. Download the new version from the course website.

Wednesday, January 26, 2011

Homework 3 clafication

Clarification regarding Q4: Q4 mentions "We add a new central node B to the network, and connect it with every other node on the ring...". Note that the "every other node" should just read as "every node" on the ring. It does NOT mean alternate nodes. So every node on the ring is linked to B independently with probability p.

If anyone has already done with the other meaning, you do not need to redo it. Sorry for the confusion!

Homework 4

Homework 4 was released today and is now available on the course web page.  So begins rankmaniac 2011...  I look forward to seeing all the crazy things you do.

(Remember not to get me in too much trouble though!)

I will be willing to post links to peoples pages from the course website if so desired.  There is a limit of 1 page per group that I will link to though.

Lecture 6

Over Monday and Wednesday this week we made our first venture into "exploiting" network structure instead of just trying to understand it.  Our example of a case where using network structure makes a huge difference is "search" -- the big idea of google back in 1998 was to use network structure to identify important pages, and feed this information into their search engine.  In particular, they defined a notion called "pagerank", which we explored over the two classes. Be sure to take a look at the original papers by Brin & Page on Google...it's quite interesting to read now in the context of what google has become.

Note -- understanding pagerank and the other aspects of search engines that we discuss will be very important for you on HW4, which will be posted after class today.

Thursday, January 20, 2011

Homework 3

As you know, HW3 has been handed out and is available on the website. Its due next Thursday. It seems that some of you are still unsure of some basic concepts in probability. Problem 4 of this homework heavily relies on your understanding of random variables and expectations. Please, feel absolutely free to clarify your understanding during office hours with the TAs. If you prefer, you may also contact me (Raga) personally to set up a time to meet and clarify your doubts.

As usual, ask your questions on the homework as comments to this post... Good luck!

Wednesday, January 19, 2011

Lecture 5

Today our focus was on explaining small world properties.  Remember that, in isolation, neither small diameter or high clustering coefficient is particularly "surprising" or difficult to explain.  It is only the combination that is "surprising"... 

But, we saw that one possible explanation for small world properties is a combination of local correlation with long-range random links.  However, this combination only explains the existence of short paths, it doesn't explain why we can find them so easily. 

To explain this second observation (and to me the more interesting observation) we saw that the long range links have to be proportional to the distance in a very precise manner.  So, it is in some sense much more rare that we can find short paths with myopic algorithms than it is for the short paths to exist in the first place.

Remember though that the models we have discussed for small world graphs DO NOT have heavy-tailed degree distributions!  So, we really haven't given a "natural" mechanism that explains all four of our "universal" properties, we've just given mechanisms for each individually.   Can you think of how to combine our heavy-tailed degree distribution ideas with the small world ideas?

Wednesday, January 12, 2011

Lecture 4

Today we talked a LOT about heavy-tails.

The details of what we covered could fill a whole probability course, so don't worry if you don't feel that you could prove all the results -- focus on understanding the intuition behind what I told you.  (And if you want to learn more, ask me and I'll point you in the right direction.)

In my mind the key things from today are
1) Know some examples of heavy-tailed distributions (Pareto, Weibull, and LogNormal)
2) Know some generic processes that create heavy-tails -- additive process (generalized CLT), extremal processes, and multiplicative processes.
3) Know how to apply these ideas to networks in order to get a heavy-tailed degree distribution.  (We introduced Preferential attachment today, and I'll do the proof of the degree distribution next time.)

Like I said in class, I could go on-and-on about heavy-tails...  So, I probably tried to pack too much into 1 lecture.  You'll likely need to go back and look at the notes (which will be up soon) in order to really process everything we talked about. 

Monday, January 10, 2011

Homework 2

I know you're all still working on HW1, but HW 2 is now available on the website.  We've covered everything you need to do it at this point, so feel free to get started if you're looking for things to do!  It's due a week from Thursday.

As always, ask your questions as comments to this post... Good luck!

Lecture 3

Today I showed a bunch of examples of other complex networks besides the web and illustrated that the properties we saw in the web graph aren't special to the web -- they happen in most of these other networks too.  So, they are in some sense "universal".

This motivates two natural directions:
1) Trying to understand what causes these properties to emerge
2) Figuring out how to exploit these properties.

We'll to 1 and then 2, spending about 3-4 lectures on each.

We started on 1 today by exploring Erdos-Renyi random graphs.  What we saw is that through a very simple random graph model we could obtain connectivity structure (a giant component and a small diameter) that very closely mimic what we saw in the web graph and the other networks.  So, in some sense, there is a very simple explanation of these properties -- they result from independent random connections.

But, the Erdos-Renyi graph does not provide an explanation for clustering or heavy-tailed degree distributions.  Understanding why these occur so often will be the focus of the next two lectures.

Thursday, January 6, 2011

Bonus Lecture: Probability Refresher

This is a reminder for the bonus lecture (a probability refresher) I will be giving tomorrow (Friday, Jan 7) at 1pm in Annenberg 213.

Here is the plan - we'll start by recalling the definition of a random variable and then review what discrete and continuous random variables are. Following that, we'll revisit the different types of distribution functions (probability mass function, probability density function, cumulative distribution function) and discuss some standard and well known discrete and continuous random variables (Bernoulli, binomial, geometric, Poisson, uniform, exponential, normal). Then we'll define expectation of a random variable and review some of the important properties of the expectation, with particular emphasis on linearity and its applications using indicator random variables (useful for HW1).

You don't have to attend the lecture if you are really comfortable with these concepts (and can comfortably solve Problem 2 of HW1). But if you've never really kept in touch with probability for a long time, then you'd probably benefit from the lecture.

It'll be a short lecture (about one hour). The slides are pretty verbose, and could serve as a reference later on. They will be posted on the course website after the lecture tomorrow.

Update: They are now posted on the course website.

Wednesday, January 5, 2011

Lecture 2

The notes and slides for lecture 2 are now up.

We didn't get quite a far as I'd hoped, but we'll finish up the slides next class by looking at the structure of lots of networks other than the web.

The key take-aways from today's lecture are the following 4 interesting properties:

1) Connectivity:  The web has one "giant" connected component and all other connected components are tiny.

2) Diameter: The distance between nodes is quite small (~6 avg. distance between noes in the SCC) and, for at least the beginning decade of the web, grow logarithmically with the size of the graph.

3) Degrees:  The degree distribution is not Normal... Instead it's heavy-tailed.  This means we should expect a few nodes with really "huge" degree and "lots" of nodes with small degree.

4) Clustering:  The web exhibits strong "homophilly", i.e., if A is connected to B and C it is significantly more likely than random for B to be connected to C.

We'll see next time that these four properties are "universal"...and then we'll try to explain why they are so common.

Don't forget to attend Raga's probability refresher on Friday @ 1pm in Annenberg 213!

Tuesday, January 4, 2011

No office hours this week

Since we don't have a HW set due until next week, and we haven't really gotten going yet, I guess we won't have office hours this week (Tue or Wed), so the first office hours will be mine on the evening of Tues 1/11 at 7-8pm in Annenberg's first floor conference room.   The TAs will then have their office hours on Wed 1/12 7-9pm.

As I said in class, I really hope that most of you will come to the office hours, even if you don't have questions, and work on the HW there.  Then, when you have questions you can ask them of me or the TAs (or your classmates). 

Monday, January 3, 2011

Homework 1

Homework 1 was passed out in class today, and is available on the course website.

Note that there is a typo in the due date on the ones handed out in class -- it is due next Thursday, Jan 13 at 1:30pm.  Be sure to start early.

Also, if you're nervous about whether you know enough probability, try out the first two problems and attend Raga's Probability refresher on Friday if you have any trouble.

If you have any questions about the homework (or want to point out typos), write a comments on this post and  the TAs/prof will respond ASAP.

Lecture 1

Glad to see so many faces at the first lecture -- I hope I didn't scare too many of you away...

If you still have any questions about what the course is about or about the procedural details of the course, please post them as comments here.  Also, I really recommend reading through the handouts that I gave out in class.  For those of you who didn't get a copy, there are links to them posted on the course website now.

Also, be sure to remember that Raga will be giving a probability refresher on Friday at 1-2pm in Annenberg 213. If you haven't had any probability since Math 2, I highly recommend attending.