Class Documentation
-
mockr.run_pandas_job(input_data, map_fn, reduce_fn, n_chunks=4)
run_pandas_job expects input_data to be a pandas DataFrame. The rows of the data frame are divided equally into chunks, and each chunk is sent to a separate map worker.
Notes:
- you may wish to pre-shuffle your input_data
Parameters:
- input_data (pandas.DataFrame) – pandas DataFrame to be processed.
- map_fn – a function with signature (chunk) that maps input_data to one or more (key, value) tuples, which are emitted via yield. chunk will be a pandas.DataFrame that is a row-wise section of input_data. The yielded key and value can be of any type; they will be passed to reduce_fn.
- reduce_fn – a function with signature (key, value) that reduces one or more key, value pairs into a single (key, result) tuple, which is emitted via yield.
- n_chunks (int) – the number of chunks to divide input_data into. The number of rows in input_data must be evenly divisible by n_chunks.
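The sketch below illustrates the shape of a map_fn and reduce_fn for a DataFrame job, using per-city sales totals as a made-up example. The helper split_rows is a hypothetical stand-in written for this illustration and is not mockr's implementation. The sketch also assumes that values are grouped by key before reduce_fn is called, and that reduce_fn receives those grouped values as a list.

```python
import pandas as pd


def map_fn(chunk):
    # chunk is a row-wise slice of the input DataFrame.
    # Emit one partial total per city found in this chunk.
    for city, total in chunk.groupby("city")["sales"].sum().items():
        yield city, int(total)


def reduce_fn(key, values):
    # Assumption: values holds every partial total emitted for this key.
    yield key, sum(values)


def split_rows(df, n_chunks):
    # Hypothetical helper (not part of mockr): equal row-wise chunks.
    if len(df) % n_chunks != 0:
        raise ValueError("row count must be divisible by n_chunks")
    size = len(df) // n_chunks
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]


df = pd.DataFrame(
    {"city": ["Oslo", "Rome", "Oslo", "Rome"], "sales": [1, 2, 3, 4]}
)

# Map phase: run map_fn on each chunk and group emitted values by key.
grouped = {}
for chunk in split_rows(df, n_chunks=2):
    for key, value in map_fn(chunk):
        grouped.setdefault(key, []).append(value)

# Reduce phase: collapse each key's values into a single result.
totals = {}
for key, values in grouped.items():
    for out_key, result in reduce_fn(key, values):
        totals[out_key] = result
```

With mockr installed, the same map_fn and reduce_fn would presumably be passed to mockr.run_pandas_job(df, map_fn, reduce_fn, n_chunks=2).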
-
mockr.run_sequence_job(input_data, map_fn, reduce_fn, n_chunks=None)
run_sequence_job expects input_data to be of type collections.abc.Sequence, e.g. a Python list. Sequence jobs provide two execution methods:
- the sequence is divided into chunks and each chunk is sent to a separate map worker
- each item in the list is individually sent to a dedicated map worker
Notes:
- you may wish to pre-shuffle your input_data
Parameters:
- input_data (collections.abc.Sequence) – sequence type holding data items, e.g. a Python list of str.
- map_fn – a map function with signature (chunk) that yields one or more (key, value) tuples. When n_chunks = None, chunk will be a single item of input_data. When n_chunks is an int, chunk will be a sub-sequence of input_data of length len(input_data)/n_chunks.
- reduce_fn – a reduce function with signature (key, value) that yields a single (key, result) tuple.
- n_chunks (int) – the number of chunks to divide input_data into. len(input_data) must be evenly divisible by n_chunks. See map_fn for more details and defined behaviour.
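To make the two execution methods concrete, the sketch below shows what chunk would look like in each mode. The helper split_sequence is a hypothetical function written for this illustration; it mirrors the chunking rules described above but is not mockr's code, and raising ValueError on an uneven split is an assumption.

```python
def split_sequence(input_data, n_chunks=None):
    # Hypothetical helper (not part of mockr) mirroring the documented rules.
    if n_chunks is None:
        # Per-item mode: each item goes to its own map worker.
        return list(input_data)
    if not input_data:
        return []
    if len(input_data) % n_chunks != 0:
        # Assumption: an uneven split is treated as an error.
        raise ValueError("len(input_data) must be divisible by n_chunks")
    size = len(input_data) // n_chunks
    # Chunked mode: each worker receives a sub-sequence of equal length.
    return [input_data[i:i + size] for i in range(0, len(input_data), size)]


words = ["a", "b", "c", "d", "e", "f"]

per_item = split_sequence(words)               # chunk is a single str
per_chunk = split_sequence(words, n_chunks=3)  # chunk is a list of 2 str
```

A map_fn written for per-item mode receives one element at a time, while one written for chunked mode has to iterate over the sub-sequence it is given, so the two are not interchangeable.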
-
mockr.run_stream_job(input_data, map_fn, reduce_fn)
run_stream_job expects input_data to be a string. Newline ("\n") characters delimit "chunks" of data, and each line/chunk is sent to a separate map worker.
Parameters:
- input_data (str) – newline-delimited string; each line is assigned to a map worker.
- map_fn – a function with signature (chunk) that yields one or more (key, value) tuples. chunk will be a str. The yielded key and value can be of any type; they will be passed to reduce_fn.
- reduce_fn – a reduce function with signature (key, value) that yields a single (key, result) tuple.
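As a rough illustration of the signatures described above, the sketch below counts words in a newline-delimited string. The function simulate_stream_job is a hypothetical local stand-in written for this example, not mockr's implementation. It assumes that values are grouped by key before reduce_fn is called and that reduce_fn receives the grouped values as a list.

```python
from collections import defaultdict


def map_fn(chunk):
    # chunk is one line of the input string; emit (word, 1) per word.
    for word in chunk.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    # Assumption: values holds every value emitted for this key.
    yield key, sum(values)


def simulate_stream_job(input_data, map_fn, reduce_fn):
    # Hypothetical stand-in for illustration only; not mockr's code.
    grouped = defaultdict(list)
    for line in input_data.split("\n"):  # one "map worker" per line
        for key, value in map_fn(line):
            grouped[key].append(value)
    results = []
    for key, values in grouped.items():
        results.extend(reduce_fn(key, values))
    return results


text = "the cat\nthe dog\na cat"
counts = dict(simulate_stream_job(text, map_fn, reduce_fn))
```

With mockr installed, the same two functions would presumably be passed to mockr.run_stream_job(text, map_fn, reduce_fn).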