4.1.1.1.1.5. lib.data_handling.mf_parser
Module that provides functionality to parse, filter, and process Minflux data.
4.1.1.1.1.5.1. Module Contents
4.1.1.1.1.5.1.1. Classes
MF_ParserFacade – Facade class for parsing Minflux files in a specified directory.
MinfluxParser – Parser class for Minflux data files.
MinfluxFilter – Class for filtering and processing Minflux data frames.
MinfluxProcessorFacade – Facade class for processing Minflux data using the MinfluxProcessor.
MinfluxProcessor – Class for processing Minflux data.
4.1.1.1.1.5.1.2. Functions
parse_single_file – Parse a single file using the specified parser.
filter_single_minflux_file – Filter a single Minflux file.
_load_processed_file – Load a processed file.
process_group – Process a group.
process_batch – Process a batch of Minflux data.
4.1.1.1.1.5.1.3. API
- lib.data_handling.mf_parser.parse_single_file(path, base_in=None, base_out=None, parser=None)
Parse a single file using the specified parser.
This function reads a file, transforms its contents using a parser, and saves the resulting DataFrame as a CSV file.
- Parameters:
path (str) – The path to the file to be parsed.
base_in (str, optional) – The base input directory (unused in the current implementation).
base_out (str, optional) – The base output directory where the parsed CSV file will be saved.
parser (Parser, optional) – The parser object used to read and transform the file.
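The read, transform, and save steps above can be sketched as follows. This is a minimal illustration under assumptions, not the module's code: `StubParser` and `parse_single_file_sketch` are hypothetical, the parser is assumed to expose the `_load_as_frame` and `_transform_frame` hooks documented for `MinfluxParser`, and the output CSV is assumed to be named after the input file's stem inside `base_out`.

```python
import os

import pandas as pd


class StubParser:
    """Hypothetical stand-in for MinfluxParser exposing the two documented hooks."""

    def _load_as_frame(self, path):
        # Stand-in loader: the real parser reads Otto, Philip or Thomas formats.
        return pd.read_csv(path)

    def _transform_frame(self, df):
        # Stand-in transform: keep only position and photon-count columns.
        return df[["x", "y", "photons"]]


def parse_single_file_sketch(path, base_in=None, base_out=None, parser=None):
    """Load `path` with `parser`, transform it and write the result to `base_out` as CSV."""
    # base_in is accepted but unused, mirroring the documented signature.
    df = parser._transform_frame(parser._load_as_frame(path))
    stem = os.path.splitext(os.path.basename(path))[0]
    out_path = os.path.join(base_out, stem + ".csv")
    df.to_csv(out_path, index=False)
    return out_path
```

Keeping the parser as an injected object means the same function works for every parser type; only the parser decides how a file is read and reshaped.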
- class lib.data_handling.mf_parser.MF_ParserFacade(type=None)
Facade class for parsing Minflux files in a specified directory.
This class provides a high-level interface for parsing Minflux files in a given directory using a MinfluxParser.
- Parameters:
type (str, optional) – The type of Minflux parser to be used.
Initialization
Initialize the Minflux Parser Facade.
- Parameters:
type (str, optional) – The type of Minflux parser to be used.
- parse_directory(input, output, max_files=3)
Parse Minflux files in the specified directory.
This method parses the Minflux files in the input directory with a MinfluxParser using multiple threads, then post-processes the resulting list of files, again using multiple threads.
- Parameters:
input (str) – The input directory containing Minflux files to be parsed.
output (str) – The output directory where parsed CSV files will be saved.
max_files (int, optional) – The maximum number of files to be processed.
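The multithreaded parsing step can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`. This is an illustration under assumptions: `parse_files_in_parallel` and `fake_parse` are hypothetical, `parse_one` stands for a per-file callable (for example `parse_single_file` with `base_out` and `parser` bound through `functools.partial`), and `max_workers=4` is an arbitrary choice rather than a documented setting.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def parse_files_in_parallel(paths, parse_one, max_files=3, max_workers=4):
    """Apply `parse_one` to at most `max_files` paths using a thread pool."""
    selected = list(paths)[:max_files]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `selected`.
        return list(pool.map(parse_one, selected))


def fake_parse(path, suffix):
    # Hypothetical stand-in for parse_single_file: return the output file name.
    return path.rsplit(".", 1)[0] + suffix


results = parse_files_in_parallel(
    ["a.txt", "b.txt", "c.txt", "d.txt"], partial(fake_parse, suffix=".csv")
)
```

Threads suit this step when most of the time is spent on file I/O; CPU-heavy transforms would gain more from a process pool.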
- class lib.data_handling.mf_parser.MinfluxParser(type=None)
Parser class for Minflux data files.
This class provides methods to find, load, transform, and save Minflux data files of different types.
- Parameters:
type (str, optional) – The type of Minflux parser to be used.
Initialization
Initialize the Minflux Parser.
- Parameters:
type (str, optional) – The type of Minflux parser to be used.
- Raises:
Exception – If the type of Minflux parser is not specified.
- _find_files(input_dir, max_files=10)
Find Minflux data files in the specified directory.
This method finds the available Minflux data files of the type specified at initialization, up to max_files.
- Parameters:
input_dir (str) – The input directory containing Minflux data files.
max_files (int, optional) – The maximum number of files to be processed, default is 10.
- Returns:
List of Minflux data files.
- Return type:
List[str]
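Type-based file discovery can be sketched with `glob`. The `EXTENSIONS` mapping is a hypothetical assumption inferred from the reader methods documented below (Otto: text, Philip: JSON, Thomas: YAML), and the recursive search and sorting are illustrative choices, not confirmed behavior of `_find_files`.

```python
import glob
import os

# Hypothetical mapping from parser type to file pattern, inferred from the
# documented readers (_read_otto: text, _read_philip: JSON, _read_thomas: YAML).
EXTENSIONS = {"otto": "*.txt", "philip": "*.json", "thomas": "*.yaml"}


def find_files(input_dir, parser_type, max_files=10):
    """Return up to `max_files` sorted paths under `input_dir` for the given type."""
    if parser_type not in EXTENSIONS:
        raise ValueError(f"Unknown Minflux parser type: {parser_type!r}")
    pattern = os.path.join(input_dir, "**", EXTENSIONS[parser_type])
    files = sorted(glob.glob(pattern, recursive=True))
    # Comparing before slicing also supports max_files=float("inf") (no limit).
    if max_files < len(files):
        files = files[: int(max_files)]
    return files
```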
- _load_as_frame(path)
Load Minflux data file as a DataFrame.
This method reads the content of a Minflux data file and returns it as a DataFrame.
- Parameters:
path (str) – The path to the Minflux data file.
- Returns:
DataFrame containing the Minflux data.
- Return type:
pandas.DataFrame
- _transform_frame(df)
Transform the Minflux data frame.
This method transforms the Minflux data frame into a data frame of positions and photon counts.
- Parameters:
df (pandas.DataFrame) – The original Minflux data frame.
- Returns:
Transformed Minflux data frame.
- Return type:
pandas.DataFrame
- _save_frame(df, path, base_in, base_out)
Save the Minflux data frame.
This method labels and saves the Minflux data frame as a pickle file.
- Parameters:
df (pandas.DataFrame) – The Minflux data frame to be saved.
path (str) – The path to the original Minflux data file.
base_in (str) – The base input directory.
base_out (str) – The base output directory.
- _transform_otto(df)
Transform the Minflux Otto data frame.
This method selects the last iteration, extracts position and photon count values, and creates a new DataFrame with merged values for both X and Y axes.
- Parameters:
df (pandas.DataFrame) – The original Minflux Otto data frame.
- Returns:
Transformed Minflux data frame.
- Return type:
pandas.DataFrame
- _transform_philip(df)
Transform the Minflux Philip data frame.
This method creates a new DataFrame with merged values for both X and Y axes.
- Parameters:
df (pandas.DataFrame) – The original Minflux Philip data frame.
- Returns:
Transformed Minflux data frame.
- Return type:
pandas.DataFrame
- _transform_thomas(df)
Transform the Minflux Thomas data frame.
This method creates a new DataFrame with merged values for both X and Y axes and additional parameters.
- Parameters:
df (pandas.DataFrame) – The original Minflux Thomas data frame.
- Returns:
Transformed Minflux data frame.
- Return type:
pandas.DataFrame
- _read_philip(file_path)
Read Minflux Philip data from a JSON file.
- Parameters:
file_path (str) – Path to the Minflux Philip JSON file.
- Returns:
Processed Minflux Philip data frame.
- Return type:
pandas.DataFrame
- _read_otto(file_path)
Read Minflux Otto data from a text file.
- Parameters:
file_path (str) – Path to the Minflux Otto text file.
- Returns:
Processed Minflux Otto data frame.
- Return type:
pandas.DataFrame
- _read_thomas(path)
Read Minflux Thomas data from a YAML file.
- Parameters:
path (str) – Path to the Minflux Thomas YAML file.
- Returns:
Processed Minflux Thomas data frame.
- Return type:
pandas.DataFrame
- lib.data_handling.mf_parser.filter_single_minflux_file(path, base_in=None, base_out=None, bin_size=None)
Filter a single Minflux file.
- Parameters:
path (str) – Path to the Minflux file.
base_in (str, optional) – Base input directory, default is None.
base_out (str, optional) – Base output directory, default is None.
bin_size (int, optional) – Size of bins for filtering, default is None.
- Returns:
DataFrame with mean Minflux data.
- Return type:
pandas.DataFrame
- class lib.data_handling.mf_parser.MinfluxFilterFacade
Initialization
- filter_directory(input, output, max_files=10)
Filter files in the input directory and save filtered results.
- Parameters:
input (str) – Input directory path.
output (str) – Output directory path.
max_files (int, optional) – Maximum number of files to process, default is 10.
- post_filtering(input, output)
Post-process filtered Minflux data and infer state means. State means are first inferred from traces in which bleaching steps clearly identify the states; these means are then used to label the remaining traces.
- Parameters:
input (str) – Input directory path.
output (str) – Output directory path.
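The labelling step of post_filtering, assigning traces to states using state means learned from traces with clear bleaching steps, can be sketched as nearest-mean assignment. This is an assumption about the rule, not the module's confirmed method; `label_by_state_means`, the `photons` and `state_id` column names, and the example state means are hypothetical.

```python
import numpy as np
import pandas as pd


def label_by_state_means(df, state_means, value_col="photons", label_col="state_id"):
    """Assign each row the index of the closest reference state mean."""
    means = np.asarray(state_means, dtype=float)
    values = df[value_col].to_numpy(dtype=float)
    # |value - mean| for every (row, state) pair; argmin picks the closest state.
    distances = np.abs(values[:, None] - means[None, :])
    out = df.copy()
    out[label_col] = distances.argmin(axis=1)
    return out


# Hypothetical state means, e.g. learned from traces with clear bleaching steps.
segments = pd.DataFrame({"photons": [98.0, 51.0, 2.0, 103.0]})
labelled = label_by_state_means(segments, state_means=[0.0, 50.0, 100.0])
```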
- class lib.data_handling.mf_parser.MinfluxFilter
Class for filtering and processing Minflux data frames.
Initialization
Initialize MinfluxFilter object.
- segment_frame(df)
Segment the input data frame.
- Parameters:
df (pandas.DataFrame) – Input data frame.
- Returns:
Tuple containing filtered and segmented data frames.
- Return type:
tuple(pandas.DataFrame, pandas.DataFrame)
- filter_frame(segmented_df, bin_size=None)
Filter the segmented data frame.
- Parameters:
segmented_df (pandas.DataFrame) – Segmented data frame.
bin_size (float, optional) – Size of bins for spatial binning, default is None.
- Returns:
Binned data frame.
- Return type:
pandas.DataFrame
- _state_id_pre_assignment(df)
Pre-assign state IDs based on mean values.
- Parameters:
df (pandas.DataFrame) – Input data frame.
- Returns:
Data frame with pre-assigned state IDs.
- Return type:
pandas.DataFrame
- bin_frame(df, bin_size=1)
Bin the input data frame.
- Parameters:
df (pandas.DataFrame) – Input data frame.
bin_size (float, optional) – Size of bins, default is 1.
- Returns:
Reduced and binned data frame.
- Return type:
pandas.DataFrame
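Spatial binning as described for bin_frame and filter_frame can be sketched by snapping positions to a grid of width `bin_size` and reducing each grid cell to its mean. The `x`, `y` and `photons` column names and the mean reduction are assumptions for illustration; `bin_positions` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def bin_positions(df, bin_size=1.0, x_col="x", y_col="y"):
    """Average all rows that fall into the same (x, y) bin of width `bin_size`."""
    binned = df.copy()
    # Integer bin index along each axis; floor keeps negative coordinates consistent.
    binned["x_bin"] = np.floor(binned[x_col] / bin_size).astype(int)
    binned["y_bin"] = np.floor(binned[y_col] / bin_size).astype(int)
    return binned.groupby(["x_bin", "y_bin"], as_index=False).mean(numeric_only=True)
```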
- remove_tuple_outliers(group, column_name='photons', threshold=1.5)
Remove outliers within each bin using the IQR method.
- Parameters:
group (pandas.DataFrame) – Grouped data frame.
column_name (str, optional) – Name of the column to filter, default is ‘photons’.
threshold (float, optional) – IQR threshold for outlier removal, default is 1.5.
- Returns:
Data frame with outliers removed.
- Return type:
pandas.DataFrame
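The IQR method keeps values inside the fences [Q1 - k·IQR, Q3 + k·IQR], where IQR = Q3 - Q1 and k is `threshold`. A minimal sketch applied per bin follows; `remove_iqr_outliers` mirrors the documented parameters but is not the module's implementation, and the `bin_id` column in the example data is hypothetical.

```python
import pandas as pd


def remove_iqr_outliers(group, column_name="photons", threshold=1.5):
    """Keep rows whose `column_name` lies within the group's IQR fences."""
    q1 = group[column_name].quantile(0.25)
    q3 = group[column_name].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - threshold * iqr
    upper = q3 + threshold * iqr
    # between() is inclusive on both ends, so values on a fence are kept.
    return group[group[column_name].between(lower, upper)]


# Hypothetical binned data: bin 0 contains one obvious photon-count outlier.
df = pd.DataFrame({
    "bin_id": [0, 0, 0, 0, 0, 1, 1, 1],
    "photons": [10, 11, 12, 13, 100, 5, 6, 7],
})
cleaned = pd.concat(remove_iqr_outliers(g) for _, g in df.groupby("bin_id"))
```

For bin 0, Q1 = 11 and Q3 = 13, so the fences are 8 and 16 and the value 100 is dropped; bin 1 keeps all rows.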
- _save_frame(filtered_df, segmented_df, path, base_in, base_out)
Save filtered and segmented data frames.
- Parameters:
filtered_df (pandas.DataFrame) – Filtered data frame.
segmented_df (pandas.DataFrame) – Segmented data frame.
path (str) – File path.
base_in (str) – Base input directory.
base_out (str) – Base output directory.
- _segment_frame(df)
Segment the input data frame.
- Parameters:
df (pandas.DataFrame) – Input data frame.
- Returns:
Segmented data frame.
- Return type:
pandas.DataFrame
- _find_bkps(df)
Find breakpoints in the input data frame.
- Parameters:
df (pandas.DataFrame) – Input data frame.
- Returns:
Data frame with segment IDs.
- Return type:
pandas.DataFrame
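Breakpoint detection that yields segment IDs can be illustrated with binary segmentation under a squared-error cost, a common change-point method. This is a swapped-in stand-in: the documentation does not state which algorithm _find_bkps uses, and `binary_segmentation`, `add_segment_ids`, `penalty`, `min_size` and the `segment_id` column are hypothetical.

```python
import numpy as np
import pandas as pd


def _sse(x):
    # Squared-error cost of modelling x by its mean.
    return float(((x - x.mean()) ** 2).sum()) if len(x) else 0.0


def binary_segmentation(signal, penalty, min_size=2):
    """Return sorted breakpoint indices (the first index of each new segment)."""
    signal = np.asarray(signal, dtype=float)
    bkps = []
    stack = [(0, len(signal))]
    while stack:
        start, end = stack.pop()
        total = _sse(signal[start:end])
        best_gain, best_split = 0.0, None
        for split in range(start + min_size, end - min_size + 1):
            gain = total - _sse(signal[start:split]) - _sse(signal[split:end])
            if gain > best_gain:
                best_gain, best_split = gain, split
        # Accept the split only if it lowers the cost by more than the penalty.
        if best_split is not None and best_gain > penalty:
            bkps.append(best_split)
            stack.extend([(start, best_split), (best_split, end)])
    return sorted(bkps)


def add_segment_ids(df, column="photons", penalty=10.0, min_size=2):
    """Label each row with the index of the segment it belongs to."""
    bkps = binary_segmentation(df[column].to_numpy(), penalty, min_size)
    out = df.copy()
    # side="right" counts breakpoints at or before each row index.
    out["segment_id"] = np.searchsorted(bkps, np.arange(len(df)), side="right")
    return out
```

The penalty controls sensitivity: larger values accept only splits that remove more squared error, so fewer breakpoints are reported for noisy traces.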
- _find_state_clusters(df)
Find state clusters in the input data frame.
- Parameters:
df (pandas.DataFrame) – Input data frame.
- Returns:
Clustered data frame.
- Return type:
pandas.DataFrame
- _map_clusters_on_states(df)
Map clusters on states in the input data frame.
- Parameters:
df (pandas.DataFrame) – Input data frame.
- Returns:
Data frame with mapped clusters.
- Return type:
pandas.DataFrame
- _check_cluster_number(df)
Check the number of clusters in the input data frame.
- Parameters:
df (pandas.DataFrame) – Input data frame.
- Returns:
Series with validity information.
- Return type:
pandas.Series
- _label_cluster(df)
Label clusters in the input data frame.
- Parameters:
df (pandas.DataFrame) – Input data frame.
- Returns:
Data frame with labeled clusters.
- Return type:
pandas.DataFrame
- class lib.data_handling.mf_parser.MinfluxProcessorFacade
Facade class for processing Minflux data using the MinfluxProcessor.
This class provides a high-level interface for processing multiple Minflux data files.
- Variables:
global_filter_results (pandas.DataFrame) – Global filter results data frame.
base (str) – Base input directory.
base_out (str) – Base output directory.
bootstrap_dicts (list) – List of bootstrap dictionaries.
Initialization
- process_minflux(input, output, max_files=2, key='', bootstrap_dicts=[])
Process multiple Minflux data files.
This method processes multiple Minflux data files in parallel, using the MinfluxProcessor class to process each individual file.
- Parameters:
input (str) – Input directory containing Minflux data files.
output (str) – Output directory for processed files.
max_files (int, optional) – Maximum number of files to process, default is 2.
key (str, optional) – Key used to select which files are processed, e.g. a particular batch; default is an empty string.
bootstrap_dicts (list, optional) – List of bootstrap dictionaries, default is an empty list.
- class lib.data_handling.mf_parser.MinfluxProcessor
Class for processing Minflux data.
This class provides methods for processing individual Minflux data files and performing local analysis.
- Variables:
bootstrap_dicts (list) – List of bootstrap dictionaries.
Initialization
- process_single_file(path, global_filter_results=None, base_in=None, base_out=None, bootstrap_dicts=[])
Process a single Minflux data file.
This method processes a single Minflux data file, performs global filtering based on provided results, and conducts local analysis using bootstrap dictionaries.
- Parameters:
path (str) – Path to the Minflux data file.
global_filter_results (pandas.DataFrame, optional) – Global filter results data frame.
base_in (str, optional) – Base input directory.
base_out (str, optional) – Base output directory.
bootstrap_dicts (list, optional) – List of bootstrap dictionaries.
- Returns:
Local analysis results data frame.
- Return type:
pandas.DataFrame
- lib.data_handling.mf_parser._load_processed_file(file)
Load a processed file.
This function loads a processed file and categorizes it as ‘global’ or ‘local’.
- Parameters:
file (str) – Path to the processed file.
- Returns:
Tuple containing the category (‘global’ or ‘local’) and the loaded DataFrame.
- Return type:
tuple
- lib.data_handling.mf_parser.process_group(group)
Process a group.
This function processes a group by calculating the Euclidean norm of the ‘d’ column within the group.
- Parameters:
group (pandas.DataFrame) – DataFrame group to be processed.
- Returns:
Processed DataFrame group.
- Return type:
pandas.DataFrame
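Read literally, the Euclidean norm of a group's 'd' column is the square root of the sum of the squared d values. A sketch under assumptions: storing the result on every row of the group under a `d_norm` column, the `trace_id` grouping column, and `process_group_sketch` itself are all hypothetical, since the documentation does not say where the norm is written.

```python
import numpy as np
import pandas as pd


def process_group_sketch(group):
    """Attach the Euclidean norm of the group's 'd' values to every row."""
    group = group.copy()
    group["d_norm"] = np.linalg.norm(group["d"].to_numpy(dtype=float))
    return group


df = pd.DataFrame({"trace_id": [1, 1, 2], "d": [3.0, 4.0, 5.0]})
result = pd.concat(process_group_sketch(g) for _, g in df.groupby("trace_id"))
```

For trace 1 the norm is sqrt(3² + 4²) = 5.0, and for trace 2 it is 5.0 as well.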
- class lib.data_handling.mf_parser.MinfluxPostProcessing
Initialization
- post_process_minflux(input_dir, output_dir, visualize=True)
Perform post-processing on Minflux data.
This method performs post-processing on Minflux data, including filtering, calibration, and visualization.
- Parameters:
input_dir (str) – Input directory containing processed Minflux data.
output_dir (str) – Output directory for post-processed data and visualizations.
visualize (bool, optional) – Flag to enable or disable visualization, default is True.
- _load_processed_files(processable_files)
Load processed files.
This method loads processed files in parallel and categorizes them as ‘global’ or ‘local’.
- Parameters:
processable_files (list) – List of processed file paths.
- Returns:
Tuple containing global and local results DataFrames.
- Return type:
tuple
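Loading files in parallel and splitting them into 'global' and 'local' results can be sketched as below, assuming a per-file loader that returns a (category, DataFrame) tuple as _load_processed_file is documented to do. The stand-in `load_one`, which picks the category from the file name and reads CSV, and `load_processed_files` are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def load_one(path):
    # Hypothetical rule: files whose name contains "global" hold global results.
    category = "global" if "global" in os.path.basename(path) else "local"
    return category, pd.read_csv(path)


def load_processed_files(paths, loader=load_one, max_workers=4):
    """Load files in parallel and concatenate them into (global_df, local_df)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        loaded = list(pool.map(loader, paths))
    frames = {"global": [], "local": []}
    for category, df in loaded:
        frames[category].append(df)

    def combine(dfs):
        # An empty category yields an empty DataFrame instead of a concat error.
        return pd.concat(dfs, ignore_index=True) if dfs else pd.DataFrame()

    return combine(frames["global"]), combine(frames["local"])
```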
- lib.data_handling.mf_parser.process_batch(base_in, parse=True, parser_type='otto', filter=True, postfilter=True, process=True, processing_key='', bootstrap_dicts={}, postprocess=True, max_files=np.inf, visualize=True)
Process a batch of Minflux data.
This function orchestrates the entire process of parsing, filtering, processing, and post-processing a batch of Minflux data.
- Parameters:
base_in (str) – Base input directory containing Minflux data.
parse (bool, optional) – Flag to enable or disable parsing, default is True.
parser_type (str, optional) – Type of Minflux parser to use, default is ‘otto’.
filter (bool, optional) – Flag to enable or disable filtering, default is True.
postfilter (bool, optional) – Flag to enable or disable post-filtering, default is True.
process (bool, optional) – Flag to enable or disable processing, default is True.
processing_key (str, optional) – Key used to select files during the processing stage, default is an empty string.
bootstrap_dicts (dict or list, optional) – Dictionary or list of dictionaries containing bootstrap parameters, default is an empty dictionary.
postprocess (bool, optional) – Flag to enable or disable post-processing, default is True.
max_files (int or float, optional) – Maximum number of files to process, default is np.inf (no limit).
visualize (bool, optional) – Flag to enable or disable visualization during post-processing, default is True.
- Returns:
Path to the output directory containing post-processed data.
- Return type:
str