I know you're all still working on HW1, but HW2 is now available on the website. We've covered everything you need for it at this point, so feel free to get started if you're looking for something to do! It's due a week from Thursday.

As always, ask your questions in the comments on this post. Good luck!

## Monday, January 10, 2011

For the plotting part of the crawler, how do we construct a CCDF from the data we have? That is, how do we turn our data points into a CCDF?

A CCDF is a plot of f_c(x) = P(X > x). Choose an appropriate number of x values and plot P(X > x) at each of them so the plot reflects the general shape of the distribution. Here, P(X > x) is the number of URLs with more than x links divided by the total number of URLs.
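A minimal Python sketch of that recipe, assuming the crawler has already produced a list of link counts with one integer per URL (the function name `empirical_ccdf` and its parameters are hypothetical). By default it evaluates P(X > x) at every distinct value in the data; pass `xs` to pick your own x values instead.

```python
from bisect import bisect_right


def empirical_ccdf(link_counts, xs=None):
    """Return a list of (x, P(X > x)) pairs for the given link counts.

    link_counts: one integer per crawled URL (hypothetical input).
    xs: the x values at which to evaluate the CCDF; defaults to every
        distinct value that appears in the data.
    """
    n = len(link_counts)
    if n == 0:
        return []
    values = sorted(link_counts)
    if xs is None:
        xs = sorted(set(values))
    points = []
    for x in xs:
        # bisect_right counts the values <= x, so n minus it is the
        # number of URLs with more than x links.
        greater = n - bisect_right(values, x)
        points.append((x, greater / n))
    return points


# Example: five URLs with 0, 1, 1, 3, and 10 links.
print(empirical_ccdf([0, 1, 1, 3, 10]))
# [(0, 0.8), (1, 0.4), (3, 0.2), (10, 0.0)]
```

Sorting once and using binary search keeps this fast even for a large crawl, since each x value costs only a logarithmic lookup.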

Can we use a CDF for the histogram plot? I'm using Mathematica to generate the histogram plots, and it only has options for "CDF" or "Probability" (fraction of values lying in each bin).

http://reference.wolfram.com/mathematica/ref/Histogram.html

@Giordon You must plot the CCDF instead of the CDF, because it's the only way you'll be able to comment on the heavy/light tail of your data set. As we saw in class, the histogram is a flawed tool. If you can get the CDF, the CCDF is simply 1 - CDF, so it shouldn't present much added difficulty.
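If your tool already gives you the empirical CDF, P(X <= x), the CCDF follows as 1 - CDF. Below is a minimal Python sketch of that conversion, assuming per-URL link counts as input (the function names `ccdf_from_cdf` and `loglog_points` are hypothetical). It also converts the points to log10 scale, dropping x <= 0 and zero probabilities because they have no logarithm, since tail behavior is usually easiest to judge on log-log axes.

```python
import math
from bisect import bisect_right


def ccdf_from_cdf(data, xs):
    """Compute (x, 1 - CDF(x)) pairs, where CDF(x) = P(X <= x)."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        return []
    points = []
    for x in xs:
        cdf_x = bisect_right(values, x) / n  # fraction of values <= x
        points.append((x, 1.0 - cdf_x))      # P(X > x)
    return points


def loglog_points(points):
    """Convert (x, p) pairs to (log10 x, log10 p) for a log-log plot.

    Pairs with x <= 0 or p <= 0 are dropped because their logarithm
    is undefined.
    """
    return [(math.log10(x), math.log10(p)) for x, p in points if x > 0 and p > 0]


ccdf = ccdf_from_cdf([0, 1, 1, 3, 10], xs=[1, 3, 10])
print(loglog_points(ccdf))
```

Plotted on log-log axes (for example with matplotlib's `loglog`), a heavy, power-law-like tail tends to look roughly like a straight line, while a light tail, such as an exponential one, falls off sharply.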

I just want to remind everyone that the TAs are having office hours tonight (Tuesday) from 7-9, and I'll have office hours tomorrow (Wednesday) 7-9. You should really be getting in the habit of attending both days!

