Mapper Reducer
• 5 min read
MapReduce
Problem faced by Google:
Google had huge amounts of data, such as pages crawled from the WWW, and needed to run analyses over it. It is not possible to store all of this data on one machine, and analyzing it serially would take far too long. So they created MapReduce, which had to solve these issues:
- parallelize the computation
- distribute the data
- handle failure cases
- load balancing
Programming Model
For example, suppose we want to count the number of occurrences of each word in a large collection of documents. Map: receives a document
map(String key, String value):
// key: document name
// value: document contents
for each word w in value:
EmitIntermediate(w, "1");
Reduce: receives the intermediate values for a key
reduce(String key, Iterator values):
// key: a word
// values: a list of counts
int result = 0;
for each v in values:
result += ParseInt(v);
Emit(AsString(result));
The mappers produce pairs like ("the", 1), ("map", 1), ("function", 1). A reducer receives ("the", {1,1,1,1,2,3}) and produces ("the", 9).
map (k1,v1) → list(k2,v2)
reduce (k2,list(v2)) → list(v2)
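The word-count example above can be sketched as a small in-process simulation (illustrative code, not the MapReduce library's API; the dict stands in for the shuffle between the two phases):

```python
from collections import defaultdict

def map_fn(doc_name, contents):
    # Mirrors the Map pseudocode: emit ("word", "1") for every word.
    for word in contents.split():
        yield word, "1"

def reduce_fn(word, values):
    # Mirrors the Reduce pseudocode: sum the list of counts for one word.
    return str(sum(int(v) for v in values))

def word_count(documents):
    intermediate = defaultdict(list)              # the "shuffle" step
    for name, contents in documents.items():
        for k, v in map_fn(name, contents):
            intermediate[k].append(v)
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}

counts = word_count({"doc1": "the map function", "doc2": "the reduce function"})
print(counts["the"])   # prints: 2
```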
Implementation
- The input data is split into M splits; each input split is processed in parallel by a different machine.
- The intermediate key space is split into R pieces using a partitioning function specified by the user, e.g. hash(key) mod R.
Steps
- The MapReduce library in the user program first splits the input files into M pieces of typically 16 megabytes to 64 megabytes (MB) per piece (controllable by the user via an optional parameter). It then starts up many copies of the program on a cluster of machines.
- One of the copies of the program is special – the master. The rest are workers that are assigned work by the master. There are M map tasks and R reduce tasks to assign. The master picks idle workers and assigns each one a map task or a reduce task.
- A worker who is assigned a map task reads the contents of the corresponding input split. It parses key/value pairs out of the input data and passes each pair to the user-defined Map function. The intermediate key/value pairs produced by the Map function are buffered in memory
- Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function. The locations of these buffered pairs on the local disk are passed back to the master, who is responsible for forwarding these locations to the reduce workers.
- When a reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from the local disks of the map workers. When a reduce worker has read all intermediate data, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together. The sorting is needed because typically many different keys map to the same reduce task. If the amount of intermediate data is too large to fit in memory, an external sort is used.
- The reduce worker iterates over the sorted intermediate data and for each unique intermediate key encountered, it passes the key and the corresponding set of intermediate values to the user’s Reduce function. The output of the Reduce function is appended to a final output file for this reduce partition.
- When all map tasks and reduce tasks have been completed, the master wakes up the user program. At this point, the MapReduce call in the user program returns back to the user code.
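The steps above can be condensed into a toy single-process simulation (all names here are illustrative; `zlib.crc32` stands in for the user-specified partitioning hash so the result is deterministic):

```python
import zlib
from collections import defaultdict

R = 2  # number of reduce tasks / partitions

def partition(key):
    # Deterministic stand-in for hash(key) mod R.
    return zlib.crc32(key.encode()) % R

def run(splits):
    # Map phase: each input split is one map task; its emitted pairs are
    # partitioned into R regions (the paper's per-worker "local disk" files).
    regions = [defaultdict(list) for _ in range(R)]
    for split in splits:                     # M = len(splits) map tasks
        for word in split.split():
            regions[partition(word)][word].append(1)
    # Reduce phase: each reduce task sorts the keys of its region so equal
    # keys are grouped, then reduces each key's value list.
    output = {}
    for r in range(R):
        for key in sorted(regions[r]):
            output[key] = sum(regions[r][key])
    return output

print(run(["the map", "the reduce", "the sort"]))
```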
For each completed map task, the master stores the locations and sizes of the R intermediate file regions produced by that task. Updates to this location and size information are received as map tasks complete. The information is pushed incrementally to workers that have in-progress reduce tasks.
Fault Tolerance
The master pings every worker from time to time. If a worker fails to respond in time, the master marks the tasks assigned to that worker as failed and reschedules them on other workers. Completed map tasks are re-executed on failure because their intermediate data is stored on the failed machine's local disk; completed reduce tasks are not re-executed because their output is stored in the global file system.
It is easy to make the master write periodic checkpoints of the master data structures described above. If the master task dies, a new copy can be started from the last checkpointed state. However, given that there is only a single master, its failure is unlikely; therefore the current implementation simply aborts the MapReduce computation if the master fails. Clients can check for this condition and retry the MapReduce operation if they desire.
Each map task produces R intermediate files; each reduce task produces 1 output file. When a map task completes, the worker sends a message to the master that includes the names of its R temporary files. When a reduce task completes, the reduce worker atomically renames its temporary output file to the final output file. If the same reduce task is executed on multiple machines, multiple rename calls will be executed for the same final output file, but the atomic rename guarantees the final file contains the data produced by just one execution.
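The atomic-rename trick for reduce output can be sketched with `os.replace`, which is atomic on POSIX filesystems (a local-filesystem stand-in for the global file system; function and file names are illustrative):

```python
import os
import tempfile

def write_reduce_output(final_path, lines):
    # Write to a private temporary file first, so a crashed or duplicate
    # execution never leaves a partially written final file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(final_path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines))
    # Atomically promote the temp file to the final output name.
    os.replace(tmp, final_path)

write_reduce_output("part-00000", ["the 3", "map 1"])
# A duplicate execution of the same reduce task is harmless: the second
# rename just installs identical content under the same final name.
write_reduce_output("part-00000", ["the 3", "map 1"])
print(open("part-00000").read())
```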
The map phase is split into M tasks and the reduce phase into R tasks; ideally M and R should be much larger than the number of workers.
- Having each worker perform many different tasks improves dynamic load balancing and speeds up recovery when a worker fails.
- We tend to choose M so that each individual task has roughly 16 MB to 64 MB of input data (so that the locality optimization described above is most effective).
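The rule of thumb for choosing M can be illustrated with a back-of-envelope calculation (the 16-64 MB split size comes from the text above; the cluster size and helper name are illustrative):

```python
SPLIT_SIZE = 64 * 1024 * 1024          # 64 MB of input per map task

def choose_m(total_input_bytes):
    # Ceiling division: one map task per 64 MB chunk of input.
    return max(1, -(-total_input_bytes // SPLIT_SIZE))

# 1 TB of input yields 16384 map tasks, far more than, say, a
# cluster of a few hundred workers, which is exactly the point.
print(choose_m(1 << 40))   # prints: 16384
```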
One of the common causes that lengthens the total time taken for a MapReduce operation is a "straggler": a machine that takes an unusually long time to complete one of the last few map or reduce tasks in the computation. When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks on other machines. The task is marked as completed whenever either the primary or the backup execution completes.