Thursday, January 28, 2010

Lecture 7: Project ideas session

We had a very fun lecture from one of our TAs yesterday that hopefully primed you to think about some creative project proposals.  Virgil, the "internet man of mystery", walked through a bunch of ideas for projects that varied from mildy impolite to questionably legal.  If you weren't at the lecture, you should definitely check out his slides, which are now on the course web site.  And, if you want some other ideas check out his web page to see some of his past hacks... 

There are some more standard sources for project ideas listed on the course web site too.  And, if you're interested in doing a theory oriented project, check out some of the papers here, or talk to me or the TAs and we can point you to some interesting starting points.

Late policy

It seems like the term is getting busy for everyone, and there have been a number of requests for extensions on this homework.  So, here's the late policy we'll follow for HWs:

-10 pts if turned in by Friday at 6pm
-20 pts if turned in by Monday at 8am

You must turn in what you have by Monday to get credit unless there are special circumstances that you have talked to me about.

I hope this helps everyone!

Tuesday, January 26, 2010

Be sure to attend Wednesday's lecture!

You definitely don't want to miss Wednesday's lecture.  Virgil will be giving you tons of ideas for interesting projects from a wide range of web related areas...  We know that class will conflict with the career fair on Wed, but please try to go to the career fair early so that you can attend class at 1!

Monday, January 25, 2010

Lecture 6: Small worlds

Today in class we had our last lecture about network structure.  We talked about the combination of high clustering coefficient and small diameter -- so called "small world" graphs.  The main take-away is that small-world properties can result from a simple combination of highly correlated local structure combined with random independent long range connections.  The example we looked at is a lattice of local connections with added random long range connections. 

The second aspect of small world graphs that we focused on is that simple, myopic, distributed agents can actually find the short paths.  This turns out to be a much more delicate phenomenon to model.  But, we saw that it can be viewed as the result of adjusting the long range connections to be "distance-dependent".  However, this dependency on distance has to be very carefully chosen to ensure that myopic agents actually find the short paths.

That's it for network structure!  There's a lot more that we could have covered, but for next class we'll move on and discuss "search" ... which will end up making use of some features of network structure.

Monday office hours moved to Wednesday

Since we've moved the homework submission day to thursday, we'll move the Monday office hours with the TAs to Wednesday.

So from now on, we'll have TA office hours from 7-9 pm on Wednesdays. Adam will continue to hold his office hours 7-8 pm on Tuesdays.

Sunday, January 24, 2010

Project proposals

In case you haven't noticed, the project proposal assignment has now been posted to the course web page.  We'll talk more about this in class this week.   In particular, Wednesday's lecture is meant to help give you some ideas about possible projects. 

The proposals are due Feb 12, so there's plenty of time still, but it's definitely time to start thinking about them.  Please try to use office hours as a place to discuss your ideas with me and the TAs ... we can help guide you to resources and background, or give you some starting points if you're having trouble coming up with ideas.

Wednesday, January 20, 2010

HW 3 is out!

Grab it from the web page...and post your questions here.

Note:  You'll need to use Mathematica/Matlab/Maple for question 3, so if you're not comfortable with these, please be sure to start early and feel free to ask the TAs for help.  

Lecture 5: Heavy tails

Today we had what basically turned into a probability lecture about heavy-tailed distribution.  We spent the time talking about (i) some examples of heavy-tailed distributions and properties of them and (ii) what processes lead to heavy tails.  Then, we applied what we learned about where heavy-tails come from to the context of graphs to give a simple model (preferential attachment) that leads to heavy-tailed degree distributions.

During the second part, we talked about some very deep probability results (generalized central limit theorem and extreme value theory) where I gave only a peek of the results.  If you're interested in learning more about these, let me know and I'll point you to some references.

(The notes will go on the course webpage by the end of the day.)

Thursday, January 14, 2010

Lecture 4: A first attempt at modeling networks

We spent the lecture today finishing up our discussion of G(n,p) random graphs.  We saw that G(n,p) turns out to have many of the connectivity and diameter properties that we saw in the web graph and the other networks we've discussed in previous lectures.  Further, we saw that G(n,p) has beautiful "threshold" behavior for almost every property we care about.  Most of your next homework will involve looking more deeply into the properties of G(n,p) so, if you got lost during the lecture or missed it because you were working on the HW, be sure to look back at the notes and ask questions. (The lecture notes are not available on the course webpage.)

Also, I found out (to my surprise) that many of you have not used "linearity of expectation" much.  If this applies to you, please be sure that you follow how we used it in class and be sure to look at the HW1 solutions so that you can see how it was useful in problem 2.

Monday, January 11, 2010

HW 2 is out

Grab it from the web page if you didn't get a copy in class.

Post your questions as comments here...

Lecture 3: Beyond the web

Today we finished up our discussion of the web graph by looking at "clustering" and then we plowed through a discussion of how the 4 properties we saw in the web graph show up in a wide variety of other networks, and so are "universal" in some sense.

From here we're going to try to understand where these properties come from?  The idea is that since all these networks (which evolve or are built using very different mechanisms) have the same structural properties, there must be some simple and universal explanation as to where the properties come from.

To try to understand this, we started with the simplest possible idea of how a graph might form -- as a sequence of independent random connections being made.  This led us to the Erdos-Renyi random graph, which we started to discuss, and will continue to discuss during the next lecture. 

I've posted the notes for lecture 3 "Beyond the web" on the course website already, but I'll hold off on posting the notes for the Erdos-Renyi graph until Wednesday when we finish discussing them.

Another seminar this week

X-Seed, a company that funds seed projects, will be coming on Thursday Jan 14 @ 2:30 to give a talk on "commercializing ideas" ... evaluating whether an idea can be commercialized and the steps to do so.  The talk is in Ann 213 and is open to everyone.

Friday, January 8, 2010

Relevant upcoming seminars

As I mentioned in the first lecture, there are a lot of interesting topics related to crowd sourcing and content aggregation that rely on machine learning.  We won't go into the machine learning related aspects of these topics in this course, but Yasar's course (146ab) focuses a lot on these issues...and, there's a very interesting upcoming guest lecture on the topic that I'd highly recommend everyone attend!  It focuses on the "Netflix prize" and is given by a lead member of one of the winning teams.

----quoted from Yasar's email ------

Dr. Joseph Sill, a Caltech alumnus and a lead member of one of the two teams
that tied the top score in the Netflix competition, will give a two-part
seminar next week about the technical details and other insights into this
highly celebrated competition in machine learning.

The seminar will be classroom style with whiteboard level of detail, and
will be given as guest lectures in the Learning Systems course, but others
at Caltech are welcome.


When: Tuesday, January 12, and Thursday, January 14, 2:30-4:00 PM.
Where: 070 Moore

----quoted from Yasar's email ------ 
 

Wednesday, January 6, 2010

Lecture 2: The structure of the web

Well, I didn't quite finish what I had hoped to cover in lecture today, but we'll come back to the last section and look at "Clustering" at the start of the next lecture.  The homework does a little bit with clustering, but it's completely self-contained.  So, it'd be great if you finished that before the next lecture!  (Remember that HW2 will go out on Monday even though HW1 isn't due until Wednesday.)

The notes (including the part about clustering) are up now.  They are handwritten, so I hope they're legible.  Please let me know if you find any typos, and I'll correct them.

Tuesday, January 5, 2010

Lecture 1: What is this course about?

I was glad to see so many people at the lecture Monday, I hope to see most of you again Wednesday!

-- It seems like the room will be tight, but workable, so unless lot's more people show up Wed we'll stay put in 213 Ann

-- As per our discussion, my office hours will be 7-8pm on Tuesdays, but if there are lot's of people asking questions I'll probably be willing to stay past 8.

-- I forgot to ask if people want me to have office hours this week.  My guess is that there's not much of a point since we haven't started much yet.  But, if you'd like to come today, post a comment here so that I know there's interest and I'll stick around.  If there are no comments, then I'll start my office hours next week.

-- With regards to the size of the web, Virgil made a nice observation after class: there is a page for each natural number listing interesting properties of it (see wikipedia for example), which proves that the web is at least countably infinite. 

-- Don't forget to start the homework early!  The preferred way to ask questions is to comment on the blog post for HW1.  But, of course, if you have something that you don't want to post, please feel free to email the HW guru (for HW1 this is JK).

Update: The slides are now up

Sunday, January 3, 2010

Homework 1 is out!

Homework 1 has been posted on the course web site.

Please feel free to discuss the homework and ask us any queries in the comments!

Welcome to CS144!

And, welcome back to Caltech!

We will be using this blog to post course announcements and discuss the homeworks/lectures, so please check it regularly!