Class Documentation

mockr.run_pandas_job(input_data, map_fn, reduce_fn, n_chunks=4)

`run_pandas_job` expects its input to be a pandas `DataFrame`. The rows of the data frame are divided equally into chunks, and each chunk is sent to a separate map worker.

Notes:

- you may wish to pre-shuffle your `input_data`

Parameters:

- input_data (`pandas.DataFrame`) – pandas `DataFrame` to be processed.
- map_fn – a function with signature `(chunk)` that maps `input_data` to one or more `(key, value)` tuples, which are emitted via `yield`. `chunk` will be a `pandas.DataFrame` that is a row-wise section of `input_data`. The yielded `key` and `value` can be of any type; they will be passed to `reduce_fn`.
- reduce_fn – a function with signature `(key, value)` that reduces one or more `(key, value)` pairs into a single `(key, result)` tuple, which is emitted via `yield`.
- n_chunks (int) – The number of chunks to divide `input_data` into. The length of `input_data` must be evenly divisible by `n_chunks`.
mockr.run_sequence_job(input_data, map_fn, reduce_fn, n_chunks=None)

`run_sequence_job` expects `input_data` to be of type `collections.abc.Sequence`, e.g. a Python list. Sequence jobs provide two execution methods:

- the sequence is divided into chunks and each chunk is sent to a separate map worker
- each item in the list is individually sent to a dedicated map worker

Notes:

- you may wish to pre-shuffle your `input_data`

Parameters:

- input_data (`collections.abc.Sequence`) – Sequence type holding data items, e.g. a Python `list` of `str`.
- map_fn – a map function with signature `(chunk)` that yields one or more `(key, value)` tuples. When `n_chunks` is `None`, `chunk` will be a single item of `input_data`. When `n_chunks` is an `int`, `chunk` will be a sub-sequence of `input_data` of length `len(input_data) / n_chunks`.
- reduce_fn – a reduce function with signature `(key, value)` that yields a single `(key, result)` tuple.
- n_chunks (int) – The number of chunks to divide `input_data` into. The length of `input_data` must be evenly divisible by `n_chunks`. See `map_fn` for more details and defined behaviour.
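The two execution methods differ only in what `chunk` looks like when it reaches `map_fn`. The helper below is a local illustration of that difference (it is not part of the mockr API):

```python
def chunks_for(input_data, n_chunks):
    # When n_chunks is None, each item of the sequence becomes its own
    # chunk; otherwise the sequence is split into n_chunks equal
    # sub-sequences of length len(input_data) / n_chunks.
    if n_chunks is None:
        return [item for item in input_data]
    size = len(input_data) // n_chunks
    return [input_data[i * size:(i + 1) * size] for i in range(n_chunks)]

data = ["a", "b", "c", "d"]
print(chunks_for(data, None))  # ['a', 'b', 'c', 'd']
print(chunks_for(data, 2))     # [['a', 'b'], ['c', 'd']]
```

In the first mode `map_fn` receives the single string `'a'`; in the second it receives the sub-sequence `['a', 'b']`, so the same `map_fn` generally cannot serve both modes unchanged.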
mockr.run_stream_job(input_data, map_fn, reduce_fn)

`run_stream_job` expects the input to be a string. Newline (`"\n"`) characters delimit chunks of data, and each line/chunk is sent to a separate map worker.

Parameters:

- input_data (str) – newline-delimited string; each line is assigned to a map worker
- map_fn – a function with signature `(chunk)` that yields one or more `(key, value)` tuples. `chunk` will be a `str`. The yielded `key` and `value` can be of any type; they will be passed to `reduce_fn`.
- reduce_fn – a reduce function with signature `(key, value)` that yields a single `(key, result)` tuple.
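A word count is the canonical stream job. The map/reduce pair below is shaped for `run_stream_job`; the inline driver loop stands in for the framework so the sketch is self-contained, and the assumption that the reducer receives all values collected for a key is an illustration, not a statement of the mockr internals:

```python
def map_fn(chunk):
    # chunk is one line of the newline-delimited input string
    for word in chunk.split():
        yield word, 1

def reduce_fn(key, values):
    # sum the counts emitted for this word across all lines
    yield key, sum(values)

input_data = "the cat sat\nthe cat ran"

# Stand-in for run_stream_job(input_data, map_fn, reduce_fn):
# split on newlines, map each line, group by key, then reduce.
grouped = {}
for line in input_data.split("\n"):
    for key, value in map_fn(line):
        grouped.setdefault(key, []).append(value)
counts = dict(kv for key, values in grouped.items()
              for kv in reduce_fn(key, values))
print(counts)  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```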