However, what is truly mind-boggling is not just the effectiveness of the protest, but the sheer reach of the protest and the heavy tails of the web. 2.4 million tweets were sent in protest in just 16 hours, 4.5 million people signed a Google petition against SOPA and PIPA, 14 million people contacted lawmakers protesting the bill, and a staggering 160 million people, more than half the population of the United States, saw the blackout on Wikipedia with 8 million of them contacting their representatives4 5.
Tuesday, January 31, 2012
The Staggering Numbers of the SOPA and PIPA Blackouts
However, what is truly mind-boggling is not just the effectiveness of the protest, but the sheer reach of the protest and the heavy tails of the web. 2.4 million tweets were sent in protest in just 16 hours, 4.5 million people signed a Google petition against SOPA and PIPA, 14 million people contacted lawmakers protesting the bill, and a staggering 160 million people, more than half the population of the United States, saw the blackout on Wikipedia with 8 million of them contacting their representatives4 5.
Universality and Critical Phenomena
As a physics major senior, I wanted to introduce the physicists' approach to explain the related notions called the universality and critical phenomena. Those topics are being studied in the condensed matter theory. From a condensed matter physics perspective, many interesting features of complex system arise from the largeness of the system. (Remember? We were always taking the limit where n goes to infinity.) One important point of the study (and problems found in web network) is that we are dealing with systems of particles(nodes, pages, etc) that are too large to have all detailed information about individual interactions(links). So, we physicists begin with simple well-known rules, called Hamiltonian, governing the relationships(links) among a few particles(nodes) and then scale it up to a system with a large number of particles. While scaling the system, physicists assume that some of key features of a large system will not be affected by further scaling up to an even larger system. For example, a hand-sized magnet and Earth are both already large enough from the perspective of atoms. Physicists extract equations from this "scaling invariance" and explain interesting features of the systems. This technique is called Renormalization Group. (Note that in a technical sense, physicists do the inverse of my description: start with a large system, and then scale it down to a smaller one with the same properties.)
This seemingly abstract idea turns out to be very powerful in explaining many of cool things in physical phenomena such as superconductors or phase transitions, and those concepts appear in nature repeatedly.
Some of them are well described in the article linked at the end of this post in easy terms. Here, I address one intuitive example that is the most related to the graph theory we have learned last week.
Consider a large number of small magnets. A pair of magnets are either correlated or not with some probability, p, which depends only on the temperature of the system. Here, "correlated" means that the two magnets are oriented in the same direction, and "not correlated" means their orientations are random. Suppose that theoretical analysis of a simple system containing only two magnets tells us that the probability is decreasing function in temperature. We can consider each magnet as a node in a graph and correlations as edges with probability p. At very large temperature, p goes to zero. According to the theory of random graph, we know we end up with all isolated nodes. This result implies that all magnets are randomly oriented, consequently producing net zero magnetic field. As the temperature goes down, the system reaches a critical point where the graph begins to form many clusters, which means small groups of magnets begin to align parallel each other. Finally, below a certain temperature T_c, where the probability p goes above log(n) /n, we see one giant connected graph: all magnets are oriented in the same direction. These "phases transitions" are exactly what we find from the process of cooling down a ferromagnetic material: no net magnetization, fragmented local magnetization, and finally the formation of a big magnet.
The insight we can learn from this example and many more in the reference source is that some of the universal features of large networks are originated not only from their particular graph structures but also from the largeness of the systems.
Reference Source:
http://www.math.ubc.ca/~slade/pcm0007_final.pdf
Monday, January 30, 2012
Twitter’s Effects on Sports Journalism
Lecture 7
Today we made our first venture into "exploiting" network structure instead of just trying to understand it. Our first example of a case where using network structure makes a huge difference is "search" -- the big idea of google back in 1998 was to use network structure to identify important pages, and feed this information into their search engine. In particular, they defined a notion called "pagerank", which we explored over the two classes. Be sure to take a look at the original papers by Brin & Page on Google...it's quite interesting to read now in the context of what google has become.
Note -- understanding pagerank and the other aspects of search engines that we discuss will be very important for you in "Rankmaniac Reloaded" on HW 4!
Office hours
Homework 4
I will be willing to post links to peoples pages from the course website if so desired. There is a limit of 1 page per group that I will link to though. Post the link as a comment here if you'd like me to add it to the main page.
Good luck!
Facebook, the World's Largest Photo Library
Entering the Cloud
Human Computation - ReCAPTCHA and Duolingo
Sunday, January 29, 2012
The Social Network's Big Impact on Political Changes
Saturday, January 28, 2012
Twitter’s New Policy and Internet Censorship
The Effect of Network Effects
The Internet, News, and Policy
Friday, January 27, 2012
Google's Change of Privacy Policy and Terms of Service - the Aftermath
Soon - all used for ad targeting |
Right, should you leave them active. There's a way to opt out, but the shared criticism directed at Google is really the fact that the entire data collection process should be opt in, as argues the FTC while simultaneously considering charges against the Mountain View giant. It is quite inconvenient that the entire process is going to be automatically opt in and it does take quite a bit of extra effort to opt out.
Most users probably do not know or care enough about opting out, so Google will most likely just benefit by default by getting a larger data pool. And what could that be used for? Well, more data is never worse. One use comes to mind: with more data, anything from clicks through content read on Google Reader, through YouTube preferences can be use to determine appropriate suitable content for the user. Furthermore, it is definitely easier to identify similar users, and - therefore - appropriately improve the clustering algorithms in order to improve ad content. And, as some studies show, better clustering is highly effective, with possibly sixfold increases in click-through ratios for ads utilizing clustering in their guesses.
Hey, that article inspired me. |
Whether it's just a side effect of Google's active effort to make our life easier (and their continued effort to convince us that they're doing good) or an actual evil scheme to get more data is up to debate (questioning of the company's principles is particularly fashionable nowadays). But didn't you know this already? The (not only) search giant really has known everything about you all along. Whether you are a Caltech undergrad, or a 24-year old woman who likes wombats - they know it.
http://www.google.com/policies/
http://www.washingtonpost.com/business/economy/google-privacy-policy-who-will-be-affected-and-how-you-can-choose-what-information-gets-shared/2012/01/26/gIQA69fNVQ_story.html
http://arstechnica.com/gadgets/news/2012/01/pascals-wager-googles-new-privacy-policy-could-anger-ftc.ars
http://www2009.eprints.org/27/1/p261.pdf
http://www.cnn.com/2012/01/27/tech/web/google-privacy-clarified/index.html
http://www.google.com/about/corporate/company/tenthings.html
Google+: Providing Companies With a New Way to Leverage the Social Web Graph
What Does the Internet Look Like?
With that in mind, I set out to find some of the images that people have generated of the Internet and its structure.
One of the most impressive projects that I discovered was the Opte Project. While it is a few years old, the project (that graphically represents the internet by mapping relations between class C networks instead of every single IP address) has produced some stunning images.
BitTorrent and Death by Heavy Tails
Thursday, January 26, 2012
World IPv6 Launch
YouTube's Small World
Although we maybe used to regarding social networks to services such as Facebook, the video sharing site YouTube also has a social network in its videos. Using the videos as the vertices and the related videos links as the directed edges, a group of researchers [1] constructed a network graph for each data set from their crawler and made some interesting findings. They were able to find the small world phenomenon in the YouTube social network.
The small world phenomenon is characterized by graphs with both high clustering and small diameter. Instead of using the average diameter or the 90th percentile, the researchers used the diameter of the largest strongly connected component of each graph and found that it was about 8. This means that instead of the six degrees of separation, there is about eight degrees from one video to another. Given that the clustering coefficient of random graphs is about 0, the clustering coefficient of the YouTube graph measured to be about 0.29 is considerably high. Because of the large clustering coefficient and the small diameter, this gives you a sense of YouTube's small world.
It will be interesting to see how these numbers change in time, especially as the number of vertices in the graph grows. How many hours of video content have been added to YouTube since you started reading?
[1] Xu Cheng; Dale, C.; Jiangchuan Liu; , "Statistics and Social Network of YouTube Videos," Quality of Service, 2008. IWQoS 2008. 16th International Workshop on , vol., no., pp.229-238, 2-4 June 2008
doi: 10.1109/IWQOS.2008.32
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4539688&isnumber=4539656
Lecture 6
We saw that one possible explanation for small world properties is a combination of local correlation with long-range random links. However, this combination only explains the existence of short paths, it doesn't explain why we can find them so easily.
To explain this second observation (and to me the more interesting observation) we saw that the long range links have to be proportional to the distance in a very precise manner. So, it is in some sense much more rare that we can find short paths with myopic algorithms than it is for the short paths to exist in the first place.
Remember though that the models we have discussed for small world graphs DO NOT have heavy-tailed degree distributions! So, we really haven't given a "natural" mechanism that explains all four of our "universal" properties, we've just given mechanisms for each individually. Can you think of how to combine our heavy-tailed degree distribution ideas with the small world ideas?
Wednesday, January 25, 2012
Anti-Counterfeiting Trade Agreement on the rise
The Power of the Internet
Never has the Internet technology industry flexed their muscles so strongly as they did last Thursday, January 19, 2012. That day, with the website blackouts against SOPA and PIPA, they demonstrated the sheer power of the Internet’s graph to affect people’s actions and decisions. In one day, 8 million people looked up who their congressman or woman was using Wikipedia’s tool (and presumably wrote to them as well). Google convinced 4.5 million people to write to their congressman or woman. Involvement on such scale would have been unheard and was impossible only ten years ago.
I think these successes are indicative of how the web graph is structured. Rather than being totally decentralized, I feel that the web graph is very dense around a few pages, like Facebook, Twitter, Wikipedia and Google. Many sites have links into these websites and they receive a massive amount of traffic. This is why the movement was so successful – the “hubs” of the graph were the ones leading the charge.
It’s quite funny to watch how the entertainment industry has reacted to these developments. These companies are staffed by people who don’t really understand the power of the Internet and social media. For example, they have launched “a television advertising campaign supporting the anti-piracy plans” (BBC). This is almost a laughable feeble move. There is no way they are going to be able to disseminate their message at the same rate or on the same scale as Reddit, Wikipedia or Facebook through a TV advertisement.
All this shows that knowing how to exploit the social and internet graph is of enormous benefit because such a graph has enormous power. It also demonstrates that the companies that control this graph have enormous power as well, perhaps more than entertainment companies that have been around for decades. The world is changing
Sources:
http://mashable.com/2012/01/18/zuckerberg-sopa-is-poorly-thought-out-law/
http://www.bbc.co.uk/news/technology-16628143
Monday, January 23, 2012
The friendship paradox: why a web crawl gives a biased sample, and "your friends have more friends than you do"
Why your friends have more friends than you do, 1991).
This result is superficially paradoxical, but it always holds given the (trivial) condition that not all nodes have the same degree. Intuitively, high-degree nodes have more friends, and each of those friends has 1 link to the high-degree node. Low-degree nodes have few links pointing to them. High-degree nodes are being counted more times, so any given link is more likely to point to a high-degree node. The proof is straightforward; Wikipedia gives a proof.
This idea is fairly robust to other types of graphs. The friendship paradox holds in graphs of sexual relations, even though these are approximately bipartite graphs rather than mostly random. Similarly, in directed graphs, nodes link to nodes with higher in-degree on average, and by symmetry, nodes are linked to by nodes with higher out-degree.
Suppose we try to sample part of the web graph by breadth-first search. Obviously considering the size of the network, we will terminate the crawl early, at some depth. Assuming we’ve only crawled a small part of the network, some hand-waving proves that we will have sampled a disproportionate number of pages with high in-degree. Hence the sample obtained from the crawl is biased w.r.t. in-degree distribution, and presumably other graph statistics. (This still holds if we disregard pages with (in-)degree 0.)
The friendship paradox suggests a cheap way to allocate vaccinations without having to map out the entire social graph. We want to distribute vaccines to the most socially active people, to limit disease transmission on as many links as possible. But mapping is expensive. Instead, pick some number of people uniformly at random from the entire population. Then randomly move a friend of each a few times. The resulting set of people likely have high degree in the graph, by the friendship paradox.
SHA-256 token: 54641f52bbfd085c229515c5c4c2e82d01756820dc3da125a375cc476ac7677b
Lecture 5
The details of what we covered could fill a whole probability course, so don't worry if you don't feel that you could prove all the results -- focus on understanding the intuition behind what I told you. (And if you want to learn more, ask me and I'll point you in the right direction.)
In my mind the key things to take away about heavy-tails are the following:
1) Know some examples of heavy-tailed distributions (Pareto, Weibull, and LogNormal)
2) Know some generic processes that create heavy-tails -- additive process (generalized CLT), extremal processes, and multiplicative processes.
3) Know how to apply these ideas to networks in order to get a heavy-tailed degree distribution. (And be sure to look at the details of the proof for Preferential attachment, since that'll be useful on the HW.)
Like I said in class, I could go on-and-on about heavy-tails... So, I probably tried to pack too much into one lecture. You'll likely need to go back and look at the notes in order to really process everything we talked about.
Sunday, January 22, 2012
The Building Blocks of Economic Complexity in a Global Product Space
"The New, New Economy"
With the number of smartphone users growing in the US and across the globe, companies who reach out through smartphone apps now have a new way to access to a large network of consumers. No longer do businesses need to appeal to the comfort of doing business in without leaving home, but the convenience of doing business anywhere and anytime with their smartphones.
With the exception of gaming apps, a business cannot be run with only an app at its front. The internet economy will not die with the emergence of the smartphone economy, however, companies that wish to survive this transition need to adjust to the new platform. For example, the internet changed the banking industry by offering customers access to their accounts and information securely online to make transfers, pay bills, and even deposit checks from their personal customers. Now, customers look for banks that offer these services from any time and any place via mobile devices.
Finally, just as was necessary with the arrival of the internet, survival in the “new, new economy” will depend of the virality of the business. Basically, survival depends on how fast you can get to the top and stay there. Consider the well-known story of the movie rental service Blockbuster and how it fell to its end because its failure to transition its business to the new terrain of the internet. A business that does not want this fate in the transition to the "new, new economy" will take heed and adjust quickly.
http://www.forbes.com/sites/toddhixon/2012/01/19/going-viral-on-the-mobile-web/
Talk on "Startups" Monday at 4pm
Title: Skills You Should Have at a Startup
Speaker: Anthony Chong (Caltech Alum)
Date: Monday, January 23, 2012
Time: 4:00p.m.
Place: 213 Annenberg
Abstract: After leaving Caltech, I moved in with a friend from high school as the first employee at Adaptly (a startup in NYC). Rather than going for a safer future in industry or grad school, I figured a few years in a startup would provide a faster learning environment and better vehicle for doing social good. In the first year we went from three of us working out of the living room of our apartment to 30+ people working out of our offices in New York and sales office in London. I plan to address three topics:
1. What is it like to work at a startup?
2. What are useful tools Caltech grads should know/get familiar with to contribute quickly to the software development life cycle?
3. How do you get the most out of professional development at a startup?
Friday, January 20, 2012
Video Games and Game Theory
Can a greater knowledge of game theory provide a significant advantage in video games? One genre of video games that stands out as having a possibility of higher success with this knowledge is MMORPG’s, or Massively Multiplayer Online Role-Playing Games. The general idea behind an MMORPG is you, the player, take on a new identity in a fantasy world online. In this game players live the life of their new persona and interact with other players through this identity. The big question is whether a knowledge of game theory will affect decisions made in this virtual world?
One aspect that could be affected by knowledge of game theory is cooperation between players in small groups. In many MMORPG’s there are missions or quests which may sometimes have the option to complete them either alone or with a group. Considering that players act as they would in real life and are “intelligent rational decision-makers,”[1] if a mission can be completed alone with a high chance of success then they will choose not to cooperate with others. However, if the probability of success for working alone is low then that player will choose to work with others. When a group of players decide to work together in order to increase the success of all members, this is an example of a Non-zero sum game. If all the players can all succeed or fail together it’s called a non-zero-sum. While all players now all have a higher chance of success, the possibility of failure is still present. Many players may use this aspect of game theory without realizing this terminology. The difference between someone who knows what a non-zero sum is and someone who does not is a player with this knowledge can use it to create good odds of success but realizes that if they have too many players in the group then, although success is much higher, the payoff will be lower because it needs to be split between the group.
[1] Roger B. Myerson (1991). Game Theory: Analysis of Conflict, Harvard University Press, p.1. Chapter-preview links, pp. vii-xi