Class Documentation

mockr.run_pandas_job(input_data, map_fn, reduce_fn, n_chunks=4)

run_pandas_job expects input_data to be a pandas DataFrame. The rows of the data frame are divided equally into chunks, and each chunk is sent to a separate map worker.

Notes:

  • you may wish to pre-shuffle your input_data
Parameters:
  • input_data (pandas.DataFrame) – pandas DataFrame to be processed.
  • map_fn – a function with signature (chunk) that maps input_data to one or more (key, value) tuples, which are emitted via yield. chunk will be a pandas.DataFrame that is a row-wise section of input_data. The yielded key and value can be of any type; they will be passed to reduce_fn.
  • reduce_fn – a function with signature (key, value) that reduces one or more (key, value) pairs into a single (key, result) tuple, which is emitted via yield.
  • n_chunks (int) – The number of chunks to divide the input_data into. input_data must divide evenly into n_chunks pieces of equal size.
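
The map/reduce contract described above can be illustrated with a minimal sketch. The column names, the sample data, and the sequential driver loop are all hypothetical; the loop only imitates the documented row-wise chunking and is not mockr's implementation. It also assumes that reduce_fn receives all values emitted for a key as a list.

```python
from collections import defaultdict

import pandas as pd


def map_fn(chunk):
    # chunk is a row-wise section of the input DataFrame
    for city, total in chunk.groupby("city")["sales"].sum().items():
        yield city, total


def reduce_fn(key, values):
    # Assumption: values holds every value emitted for this key
    yield key, sum(values)


# Hypothetical sample data: 8 rows divide evenly into 4 chunks
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", "Rome", "Oslo", "Rome", "Oslo", "Rome"],
    "sales": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Stand-in for the chunking step: equal row-wise slices
n_chunks = 4
size = len(df) // n_chunks
chunks = [df.iloc[i * size:(i + 1) * size] for i in range(n_chunks)]

# Stand-in for the map workers and the grouping by key
grouped = defaultdict(list)
for chunk in chunks:
    for key, value in map_fn(chunk):
        grouped[key].append(value)

# Stand-in for the reduce step
totals = {}
for key, values in grouped.items():
    for out_key, result in reduce_fn(key, values):
        totals[out_key] = result
```

With the library installed, the same map_fn and reduce_fn would be passed as mockr.run_pandas_job(df, map_fn, reduce_fn, n_chunks=4).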
mockr.run_sequence_job(input_data, map_fn, reduce_fn, n_chunks=None)
run_sequence_job expects input_data to be of type collections.abc.Sequence, e.g. a Python list. Sequence jobs provide two execution methods:
  • the sequence is divided into chunks and each chunk is sent to a separate map worker
  • each item in the list is individually sent to a dedicated map worker

Notes:

  • you may wish to pre-shuffle your input_data
Parameters:
  • input_data (collections.abc.Sequence) – Sequence type holding data items, e.g. a Python list of str.
  • map_fn – a map function with signature (chunk) that yields one or more (key, value) tuples. When n_chunks is None, chunk will be a single item of input_data. When n_chunks is an int, chunk will be a sub-sequence of input_data of length len(input_data)/n_chunks.
  • reduce_fn – a reduce function with signature (key, value) that yields a single (key, result) tuple.
  • n_chunks (int) – The number of chunks to divide the input_data into. input_data must divide evenly into n_chunks pieces of equal size. See map_fn for more details and defined behaviour.
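
The two execution methods can be compared with a small sketch. make_chunks and simulate are hypothetical helpers that mimic the chunking rules described above in a single process; they are not part of mockr. The sketch also assumes that reduce_fn receives all values emitted for a key as a list.

```python
from collections import defaultdict


def make_chunks(input_data, n_chunks=None):
    """Hypothetical helper mirroring the documented chunking rules."""
    if n_chunks is None:
        # Each item is sent to its own map worker
        return list(input_data)
    if len(input_data) % n_chunks != 0:
        raise ValueError("input_data must divide evenly into n_chunks pieces")
    size = len(input_data) // n_chunks
    return [input_data[i * size:(i + 1) * size] for i in range(n_chunks)]


def map_item(chunk):
    # n_chunks=None: chunk is a single item (here, one str)
    yield chunk[0], 1


def map_chunk(chunk):
    # n_chunks=int: chunk is a sub-sequence of items
    for item in chunk:
        yield item[0], 1


def reduce_fn(key, values):
    # Assumption: values holds every value emitted for this key
    yield key, sum(values)


def simulate(chunks, map_fn, reduce_fn):
    """Hypothetical sequential stand-in for the map and reduce workers."""
    grouped = defaultdict(list)
    for chunk in chunks:
        for key, value in map_fn(chunk):
            grouped[key].append(value)
    return [pair for key, values in grouped.items()
            for pair in reduce_fn(key, values)]


fruits = ["apple", "avocado", "banana", "blueberry", "cherry", "cranberry"]

# Count items by first letter, once per item and once per chunk of two
per_item = dict(simulate(make_chunks(fruits), map_item, reduce_fn))
per_chunk = dict(simulate(make_chunks(fruits, n_chunks=3), map_chunk, reduce_fn))
```

Both modes produce the same counts here; the only difference is whether map_fn is written to handle one item or to iterate over a sub-sequence.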
mockr.run_stream_job(input_data, map_fn, reduce_fn)

run_stream_job expects the input to be a string. Newline ("\n") characters delimit "chunks" of data, and each line/chunk is sent to a separate map worker.

Parameters:
  • input_data (str) – newline-delimited string; each line is assigned to a map worker
  • map_fn – a function with signature (chunk) that yields one or more (key, value) tuples. chunk will be a str. The yielded key and value can be of any type; they will be passed to reduce_fn.
  • reduce_fn – a reduce function with signature (key, value) that yields a single (key, result) tuple
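
A word count is a typical use of this interface. The sketch below defines a map_fn and reduce_fn matching the signatures above and runs them through simulate_stream_job, a hypothetical single-process stand-in that only imitates the documented line-per-worker behaviour. It assumes that reduce_fn receives all values emitted for a key as a list.

```python
from collections import defaultdict


def map_fn(chunk):
    # chunk is one line of the input string
    for word in chunk.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    # Assumption: values holds every value emitted for this key
    yield key, sum(values)


def simulate_stream_job(input_data, map_fn, reduce_fn):
    """Hypothetical sequential stand-in; not mockr's implementation."""
    grouped = defaultdict(list)
    for line in input_data.split("\n"):
        for key, value in map_fn(line):
            grouped[key].append(value)
    results = []
    for key, values in grouped.items():
        results.extend(reduce_fn(key, values))
    return results


text = "the quick fox\nthe lazy dog\nthe end"
counts = dict(simulate_stream_job(text, map_fn, reduce_fn))
```

With the library installed, the same functions would be passed as mockr.run_stream_job(text, map_fn, reduce_fn).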