Message No. 6
Controlling the number of mappers

To increase the number of mappers in your program (for assignment 2), and so speed up the run over the Google-NGram dataset, you can decrease the maximal split size (64 MB by default) by setting the mapreduce.input.fileinputformat.split.maxsize property in the configuration:

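public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Cap each input split at 1,000,000 bytes (about 1 MB); smaller splits mean more mappers.
    conf.set("mapreduce.input.fileinputformat.split.maxsize", "1000000");
    Job job = Job.getInstance(conf, "assignment 2"); // replaces the deprecated new Job(conf, name)
    // ... the rest of your job setup stays as it was ...
}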
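
To get a feeling for how much this setting changes the number of mappers, here is a rough, stand-alone sketch (no Hadoop needed). It assumes the split size is computed as max(minSize, min(maxSize, blockSize)), which is how FileInputFormat combines the three values, and that you get roughly one mapper per split. The class name SplitEstimate, the 1 GB file size and the helper methods are made up for illustration; the real count also depends on the number of input files and on whether the input format is splittable.

```java
public class SplitEstimate {

    // Same combination FileInputFormat applies: max(minSize, min(maxSize, blockSize)).
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough estimate of the number of splits for one file: ceil(fileSize / splitSize).
    static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // 64 MB, the default mentioned above
        long fileSize = 1024L * 1024 * 1024;  // hypothetical 1 GB input file

        // Without the property, maxSize is effectively unlimited, so the block size wins.
        long defaultSplit = splitSize(1L, Long.MAX_VALUE, blockSize);
        // With split.maxsize = 1000000, the cap wins.
        long smallSplit = splitSize(1L, 1_000_000L, blockSize);

        System.out.println("default: " + estimateSplits(fileSize, defaultSplit) + " splits");
        System.out.println("capped:  " + estimateSplits(fileSize, smallSplit) + " splits");
    }
}
```

For this hypothetical 1 GB file the estimate goes from 16 splits to 1074, so do not set the value too low: every mapper has a startup cost, and thousands of tiny mappers can end up slower than a few dozen larger ones.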
published on 15/06/2015 12:01:25 by Meni Adler