Friday, March 2, 2012

Homework 7

The last HW is now posted.  At this point, I've lectured about everything you'll need for all the problems except for problem 4, which focuses on MapReduce.  We'll cover MapReduce in class on Monday...  Good luck!

10 comments:

  1. I have some questions for the MapReduce problems:

    1. To what degree should we take communication into account? Can we just assume shared memory and send huge lists around as values?

    2. Related to 1: Can files be read in parallel, or must the main method do that?

    3. In Question (1): Do you want the MapReduce call to print? What do you expect returned?

    4. In Question (2): Can you provide an example of the input, and expected output? Do you want every instance of grade X in every class to be printed, or do you expect any collation? Should the class/file name be printed in any way?

    5. In Question (2): Should I understand the tip to mean that no sorting is necessary whatsoever? I assume Map and Reduce get called in an arbitrary order, so we can't rely on the calling/yielding order in any way.

  2. 1. Arbitrarily long lists of values or lists of key-value pairs are fine. You shouldn't need more complicated data structures for these problems.

    2. Pass each parallel function call either a filename or the file contents (e.g. a list of the lines contained in the file). This circumvents the need to read a file in parallel. The scenario would be "assume N files where N is very large".

    3. The MapReduce call should return a list that is the merged result of the key-value pairs returned by all the reduce() calls. Do the processing of this list (and printing) in the main() method.

    4. (Hopefully this helps answer 5)
    Sample input
    [file1]: 3 52 51 42 93 ...
    [file2]: 20 0 98 12 29 ...
    ...
    [fileN]: ...

    Sample output
    ... 98 ... 93 ... 52 ... 51 ... 42 ... 29 ... 20 ... 12 ... 3 ... 0 ...
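To make answers 2 and 3 concrete, here is a minimal sketch of a MapReduce call that receives file contents, merges the key-value pairs returned by all the reduce calls into one list, and leaves the printing to `main()`. The names `map_reduce`, `line_mapper`, and `sum_reducer`, the file names, and the line-counting task are illustrative assumptions, not part of the homework. The sequential loops only stand in for the parallel function calls.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Sequential stand-in for a MapReduce call (hypothetical helper)."""
    groups = defaultdict(list)
    # Map phase: each input is handled independently of the others.
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    # Reduce phase: one call per key; all results are merged into one list.
    merged = []
    for key, values in groups.items():
        merged.extend(reducer(key, values))
    return merged


def line_mapper(named_file):
    # Input is (filename, list of lines), so no file is read in parallel.
    name, lines = named_file
    for _ in lines:
        yield name, 1


def sum_reducer(name, ones):
    # Emit a single (key, value) pair per key.
    yield name, sum(ones)


def main():
    # Made-up file contents, assumed for illustration only.
    files = [("file1", ["3 52 51", "42 93"]), ("file2", ["20 0 98"])]
    result = map_reduce(files, line_mapper, sum_reducer)
    # Processing and printing of the merged list happen here, not in reduce.
    for name, count in result:
        print(name, count)
    return result
```

Calling `main()` prints one line per file and returns the merged list, which keeps the MapReduce call itself free of side effects.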

    Replies
    1. I'm confused about question 5 from above as well.

    2. Yup, I also do not understand how to sort via MapReduce...

    3. Hint: In Part 2 of the MapReduce problem, the scores take only integer values from 0 to 100.

  3. 4. Additional clarification:
    Sample input
    [file1]: 3 52 51 42 93 ...
    [file2]: 20 0 93 12 29 ...
    ...
    [fileN]: ...

    Sample output
    ... 93 93 ... 52 ... 51 ... 42 ... 29 ... 20 ... 12 ... 3 ... 0 ...
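One possible reading of the hint about integer scores, sketched below, is to use the score itself as the key and count occurrences, so that no comparison sort is needed: `main`-side code simply walks the keys from 100 down to 0. This is only a sketch under that assumption, not necessarily the intended solution. The helper names (`map_scores`, `reduce_scores`, `map_reduce`, `sorted_scores`) are hypothetical, and the sequential loop stands in for parallel calls.

```python
from collections import defaultdict


def map_scores(lines):
    # Emit (score, 1) for every integer score in one file's lines.
    for line in lines:
        for token in line.split():
            yield int(token), 1


def reduce_scores(score, ones):
    # Collapse all occurrences of one score into a (score, count) pair.
    return score, sum(ones)


def map_reduce(inputs, mapper, reducer):
    # Sequential stand-in: group mapped values by key, then reduce each key.
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    return [reducer(key, values) for key, values in groups.items()]


def sorted_scores(files):
    counts = dict(map_reduce(files, map_scores, reduce_scores))
    ordered = []
    # Scores are assumed to lie in 0..100, so walk that range directly.
    for score in range(100, -1, -1):
        ordered.extend([score] * counts.get(score, 0))
    return ordered


files = [["3 52 51 42 93"], ["20 0 93 12 29"]]
print(sorted_scores(files))
```

The order in which map and reduce calls run does not matter here, because the final ordering comes from the fixed key range rather than from the call order. Duplicates such as the two 93s are preserved through the counts.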

  4. Does Reduce always have to output a (key, value) pair, or can it just be a list or a single value?

    Replies
    1. Yes, though effectively you can also return just a list or a single value: either the key or the value may be a dummy return result (e.g. NULL).

  5. In the "grip the grep" problem: if the same sentence occurs multiple times in the book, do we only print it once?

    Replies
    1. Sorry, this wasn't fully specified in the question! You can do either one, but for simplicity it's fine if you print it just once.
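A minimal sketch of the "print it just once" option: if the matching sentence is used as the key, every duplicate lands in the same reduce call, which then emits it a single time with a dummy value. The function names (`grep_map`, `grep_reduce`, `grep_once`), the naive sentence splitting, and the sample text are assumptions made for illustration; the actual problem statement may define sentences and matching differently.

```python
import re
from collections import defaultdict


def grep_map(text, pattern):
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        sentence = sentence.strip()
        if sentence and re.search(pattern, sentence):
            yield sentence, None  # the value is a dummy


def grep_reduce(sentence, values):
    # All duplicates share one key, so emit the sentence exactly once.
    return sentence, None


def grep_once(chunks, pattern):
    # Sequential stand-in for the parallel map and reduce calls.
    groups = defaultdict(list)
    for chunk in chunks:
        for key, value in grep_map(chunk, pattern):
            groups[key].append(value)
    return [grep_reduce(key, values)[0] for key, values in groups.items()]


chunks = ["The cat sat. A dog ran.", "The cat sat. No match here!"]
for sentence in grep_once(chunks, "cat"):
    print(sentence)
```

To print every occurrence instead, the reduce step could return the sentence repeated `len(values)` times.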
