4.1.1.1.1.4. lib.data_handling.data_analysis#

Module for analyzing an experiment. It provides functionality for pre-processing the data and for analyzing it afterwards.

4.1.1.1.1.4.1. Module Contents#

4.1.1.1.1.4.1.1. Classes#

EvaluationFacade

Wrapper to provide convenient access to data evaluation.

Study

Class to cluster the analysis of an ensemble of experiment objects.

AnalysisRoutines

Class to handle various analysis routines.

Fit

Class containing the fit routine that estimates the positions of molecules from an experimental record. Initialize a new fit routine for every new fit so that earlier results are cleared.

Estimators

Class that provides different estimators to evaluate the shape of the intensity minimum.

Results

MinfluxAnalysis

4.1.1.1.1.4.1.2. Functions#

study_single_file

Analyze a single file (experiments.yaml).

4.1.1.1.1.4.1.3. API#

lib.data_handling.data_analysis.study_single_file(full_file_path, methods=['QUAD'], agnostic=True, collect_artefacts=True)#

Analyze a single file (experiments.yaml).

This function analyzes an experiments.yaml file, applying different methods to study and collect data. The analysis results are then stored in a queue for later saving.

Parameters:
  • full_file_path (str) – The absolute path of the experiments.yaml file to be analyzed.

  • methods (list) – The list of method(s) to use for analysis. Default is ['QUAD'].

  • agnostic (bool) – Flag to determine if the analysis should be agnostic. Default is True.

  • collect_artefacts (bool) – Flag to determine if the collected artefacts should be saved. Default is True.

Returns:

None. This function adds the analyzed result as a dictionary to a queue for later saving.

Return type:

None

Raises:

Exception – If a study of a specific type cannot be performed, or if the artefacts from the study cannot be saved.

Example:

study_single_file('/path/to/experiments.yaml', methods=['QUAD'], agnostic=True, collect_artefacts=True)

Note

The function prints diagnostic information to the console during execution.

Warning

This function may raise exceptions if specific study types fail or if artefacts cannot be saved.

See also

Study: The class used to perform the analysis.

class lib.data_handling.data_analysis.EvaluationFacade(collect_artefacts=True, max_files=1)#

Wrapper to provide convenient access to data evaluation.

Initialization

Initialize the EvaluationFacade object.

Parameters:
  • collect_artefacts (bool) – Flag indicating whether to collect artifacts. Default is True.

  • max_files (int) – Maximum number of files to consider for evaluation. Default is 1.

evaluate_data(in_folder, methods=['QUAD'], agnostic=True)#

Evaluate data in a given folder.

Parameters:
  • in_folder (str) – The path to the input folder.

  • methods (list) – The list of evaluation methods. Default is ['QUAD'].

  • agnostic (bool) – Flag indicating whether the evaluation should be done agnostically. Default is True.

Returns:

None

Return type:

None

Raises:

Exception – If the evaluation of a specific file fails.

Example:

evaluation_facade = EvaluationFacade()
evaluation_facade.evaluate_data('/path/to/data', methods=['QUAD'], agnostic=True)

class lib.data_handling.data_analysis.Study(study_type='QUAD', collect_artefacts=True)#

Class to cluster the analysis of an ensemble of experiment objects.

Initialization

Initialize the Study object.

Parameters:
  • study_type (str) – The type of study to perform. Default is 'QUAD'.

  • collect_artefacts (bool) – Flag indicating whether to collect artifacts. Default is True.

perform(experiments, agnostic=True)#

Perform an evaluation of the conducted experiment.

Parameters:
  • experiments (list) – List of experiment objects to evaluate.

  • agnostic (bool) – Whether to discard all prior knowledge (agnostic) or to employ it. Only affects the MLE analysis.

Returns:

None

Return type:

None

Raises:

Exception – If the experiment object is invalid for non-agnostic analysis.

Example:

study = Study()
study.perform(experiments_list, agnostic=True)

_get_metrics(experiments)#

Calculates metrics for several experiments to evaluate the success of the experiment, mainly in terms of photon numbers.

Parameters:

experiments (list) – List of experiments.

Returns:

None. Modifies self.results.

Example:

study = Study()
study._get_metrics(experiments_list)

class lib.data_handling.data_analysis.AnalysisRoutines(agnostic=True, collect_artefacts=True)#

Class to handle various analysis routines.

Initialization

Initialize the AnalysisRoutines object.

Parameters:
  • agnostic (bool) – Flag indicating whether the analysis should be agnostic. Default is True.

  • collect_artefacts (bool) – Flag indicating whether to collect artifacts. Default is True.

MLE_analysis(experiments)#

Perform a Maximum Likelihood Estimation (MLE) on distance parameters.

This method estimates distance parameters using MLE based on the provided experiments. It calculates background estimates, kappa values, and performs fitting for different molecule counts.

Parameters:

experiments (list) – List of experiments for analysis.

Returns:

None. Modifies self.results and self.artefacts with analysis results.

Example:

routines = AnalysisRoutines()
routines.MLE_analysis(experiments_list)

POLY_analysis(experiments, method)#

Perform NALM analysis of a sorted list of experiments, estimating distance via center of mass shift after bleaching steps. Done via quadratic approximation near the minimum, extracting the position of the minimum. Combined with an estimate from the quadratic estimator.

Parameters:
  • experiments (list) – Sorted list of experiments for analysis.

  • method (str) – The analysis method, one of ['MIN-POLY', 'MIN-QUAD', 'MAX-QUAD'].

Returns:

None. Modifies self.results and self.artefacts with analysis results.

Example:

routines = AnalysisRoutines()
routines.POLY_analysis(experiments_list, method='MIN-POLY')

HARMONIC_analysis(experiments, method)#

Perform harmonic analysis on a list of experiments.

Parameters:
  • experiments (list) – List of experiments for harmonic analysis.

  • method (str) – The harmonic analysis method, one of ['CORR', 'FOURIER', 'HARMONIC'].

Returns:

None. Modifies self.results and self.artefacts with analysis results.

Example:

routines = AnalysisRoutines()
routines.HARMONIC_analysis(experiments_list, method='CORR')

KAPPA_analysis(experiments, method, fixed_curvature=False)#

Perform NALM analysis of a sorted list of experiments, estimating distance via center of mass shift after bleaching steps. Done via quadratic approximation near the minimum, extracting the position of the minimum. Combined with an estimate from the quadratic estimator.

Parameters:
  • experiments (list) – Sorted list of experiments for analysis.

  • method (str) – The analysis method.

  • fixed_curvature (bool) – Flag indicating whether to use a fixed curvature value. Default is False.

Returns:

None. Modifies self.results with analysis results.

Example:

routines = AnalysisRoutines()
routines.KAPPA_analysis(experiments_list, method='your_method', fixed_curvature=False)

WINDOW_analysis(experiments, estimator='harmonic')#

Use a sliding window of the data to perform an analysis on a list of experiments for different spatial areas around the minimum, i.e. extract photons from different regions of a full line scan.

Parameters:
  • experiments (list) – List of experiments for window analysis.

  • estimator (str) – The estimator to use for the window analysis. Default is 'harmonic'.

Returns:

None. Modifies self.results and self.artefacts with analysis results.

Example:

routines = AnalysisRoutines()
routines.WINDOW_analysis(experiments_list)

_get_fixed_curvature(exp, fit_dict)#

Obtain the average curvature of the minimum in the experiment.

Parameters:
  • exp – Experiment object for curvature estimation.

  • fit_dict (dict) – Fit dictionary containing the parameters for the fit.

Returns:

Array of curvatures, one for each axis.

_get_background_estimate(experiments)#

Estimate background from the 0M experiment and set the corresponding parameter in the parameter dictionary.

Parameters:

experiments (list) – List of experiments for background estimation.

Returns:

Numpy array representing the estimated background.

_get_kappa_estimate(experiments, fit_dict, n=1, kap0=None)#

Estimate the quality of the minimum from the 1M experiment and set the corresponding parameter in the parameter dictionary.

Parameters:
  • experiments (list) – List of experiments for kappa estimation.

  • fit_dict (dict) – Fit dictionary containing the parameters for the fit.

  • n (int) – Number of molecules in the experiment. Default is 1.

  • kap0 (numpy.ndarray) – Initial value for kappa. Default is None.

Returns:

Numpy array representing the estimated kappa.

class lib.data_handling.data_analysis.Fit(agnostic=True, collect_artefacts=True)#

Class containing the fit routine that estimates the positions of molecules from an experimental record. Initialize a new fit routine for every new fit so that earlier results are cleared.

Initialization

Initialize the Fit class.

Parameters:
  • agnostic (bool) – Flag indicating whether the fit should be agnostic.

  • collect_artefacts (bool) – Flag indicating whether to collect artifacts during fit.

do_fit(exp, fit_dict, check_residuals=True)#

Method to fit experiments either line-wise or globally. Results are accessible via self.results.

Parameters:
  • exp (Experiment) – Experimental data.

  • fit_dict (dict) – Dictionary containing fit parameters.

  • check_residuals (bool) – Flag indicating whether to check residuals after fitting.

Returns:

Tuple containing solution dictionary and artifacts.

Return type:

tuple

_do_global_fit(experiment, param_dict={}, estimator=None)#

Fit one experiment globally (not line-wise). Suitable for 1M and 2M experiments.

Parameters:
  • experiment (Experiment) – Experimental data.

  • param_dict (dict) – Dictionary containing fit parameters.

  • estimator (str) – Estimation method.

Raises:

Exception – If no estimator is provided.

Returns:

Tuple containing solution dictionary and fit array.

Return type:

tuple

_do_local_fit(experiment, param_dict={}, estimator='quadratic', max_lines=np.inf)#

Find the position(s) of the molecule(s) in an experiment. Performs a line-wise fit, i.e. each line of the record is fitted separately with MLE, quadratic, and FE.

Parameters:
  • experiment (Experiment) – The experiment to be fitted.

  • param_dict (dict) – Dictionary of parameter dictionaries for each line. {'0': {'FWHM': 1., …}, '1': …}

  • estimator (str) – Estimator to be applied. Default is 'quadratic'.

  • max_lines (int) – Maximum number of lines to fit (default is infinity).

Returns:

Dictionary of fit results for each line.

Return type:

dict

_line_to_nan_mapping(dict, key, line, start, end, lines, block_size)#

Remove redundant information depending on axis of line.

Substitutes value of local fit by nan if it is not a fit of the corresponding axis.

Parameters:
  • dict (dict) – The dictionary containing fit information.

  • key (str) – The key corresponding to the axis.

  • line (int) – The line number.

  • start (int) – The start index.

  • end (int) – The end index.

  • lines (int) – The total number of lines.

  • block_size (int) – The block size.

Returns:

The modified dictionary.

Return type:

dict

check_residuals(exp, fit_arr, estimator, scope)#

Evaluate the residuals of the fit with respect to the original full model, with optional visualization.

The estimate might have been obtained from a different model, e.g., quadratic or Fourier estimate.

Parameters:
  • exp (Experiment) – The experimental data.

  • fit_arr (np.ndarray) – The array containing fitted values.

  • estimator (str) – The estimation method used.

  • scope (str) – The scope of the fit ('global' or 'local').

Returns:

Chi-squared value for each axis, normalized by the number of pixels, i.e. the average chi-squared per pixel of that axis.

Return type:

list
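
The per-pixel normalization described above can be sketched as follows. This is only an illustration: it assumes Pearson's chi-squared statistic for count data, and the helper name `chi2_per_pixel` is hypothetical. The statistic actually used by `check_residuals` is not stated in this reference.

```python
import numpy as np

def chi2_per_pixel(data, fit, eps=1e-12):
    """Average Pearson chi-squared per pixel for one axis (illustrative only)."""
    data = np.asarray(data, dtype=float)
    fit = np.asarray(fit, dtype=float)
    # Squared deviation scaled by the expected counts; eps guards against division by zero.
    chi2 = np.sum((data - fit) ** 2 / np.maximum(fit, eps))
    return chi2 / data.size

# One value per axis, collected in a list as in the documented return type.
x_axis = chi2_per_pixel([10, 4, 1, 5, 9], [9.0, 4.0, 1.0, 4.0, 9.0])
y_axis = chi2_per_pixel([8, 3, 2, 3, 8], [8.0, 3.0, 2.0, 3.0, 8.0])
chi2_values = [x_axis, y_axis]
```

Values close to 1 would indicate residuals compatible with Poisson noise under this assumed statistic.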

get_MLE(exp, estimator, param_dict={}, show=False)#

Perform Maximum Likelihood Estimation (MLE) to obtain fit results for the given experiment.

Parameters:
  • exp (Experiment) – The experimental data.

  • estimator (str) – The estimator to be used for MLE.

  • param_dict (dict) – Dictionary of additional parameters for the estimation (default is an empty dictionary).

  • show (bool) – Flag indicating whether to display the fitting results (default is False).

Returns:

Dictionary containing the MLE fit results and the masked model.

Return type:

dict

get_taylor_estimate(exp, estimator, param_dict={}, show=False)#

Obtain fit results using Taylor series expansion-based estimation.

Parameters:
  • exp (Experiment) – The experimental data.

  • estimator (str) – The estimator to be used for Taylor series expansion.

  • param_dict (dict) – Dictionary of additional parameters for the estimation (default is an empty dictionary).

  • show (bool) – Flag indicating whether to display the fitting results (default is False).

Returns:

Dictionary containing the fit results and the masked model.

Return type:

dict

get_harmonic_estimate(exp, estimator, param_dict={}, show=False)#

Obtain fit results using harmonic estimation.

Parameters:
  • exp (Experiment) – The experimental data.

  • estimator (str) – The estimator to be used for harmonic estimation ('fourier', 'correlate', 'harmonic').

  • param_dict (dict) – Dictionary of additional parameters for the estimation (default is an empty dictionary).

  • show (bool) – Flag indicating whether to display the fitting results (default is False).

Returns:

Dictionary containing the fit results and the masked model.

Return type:

dict

_dict_to_line_mapping(line, lines, start, end, block_size, line_dict, glob_dict, key)#

Map values in each axis to the corresponding x or y line.

Parameters:
  • line (int) – Current line number.

  • lines (int) – Maximum number of lines.

  • start (int) – Size of the starting block.

  • end (int) – Size of the ending block.

  • block_size (int) – Block size.

  • line_dict (dict) – Current parameter dictionary for the fit in the respective axis/line.

  • glob_dict (dict) – Dictionary with two-dimensional key.

  • key (str) – Key for which values are assigned to the current axis/line.

Returns:

Updated dictionary with assigned values for the current axis/line.

Return type:

dict

class lib.data_handling.data_analysis.Estimators#

Class that provides different estimators to evaluate the shape of the intensity minimum.

Initialization

get_taylorE(xdata, ydata, estimator, param_dict={})#

Estimate parameters using the Taylor expansion of the full harmonic model.

Parameters:
  • xdata (array-like) – Independent variable data.

  • ydata (array-like) – Dependent variable data.

  • estimator (str) – The type of estimator ('min-poly', 'min-quad', 'max-quad').

  • param_dict (dict) – Dictionary of additional parameters for the estimation (default is an empty dictionary).

Returns:

Tuple containing the estimated parameters, success flag, and the fitted model.

Return type:

tuple
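
The idea behind a Taylor-based estimate can be illustrated with a short sketch: close to its minimum, a harmonic signal is approximately quadratic, so a parabola fitted to the data yields the position of the minimum. The function name `quadratic_minimum` and the synthetic signal are assumptions for illustration and do not reproduce the internals of `get_taylorE`.

```python
import numpy as np

def quadratic_minimum(xdata, ydata):
    """Fit y = a*x**2 + b*x + c and return the vertex position and curvature."""
    a, b, c = np.polyfit(xdata, ydata, deg=2)
    if a <= 0:
        raise ValueError("fitted parabola has no minimum")
    # Vertex of the parabola and its second derivative.
    return -b / (2.0 * a), 2.0 * a

# Synthetic harmonic signal with its minimum at x0 = 0.1, sampled close to it.
x0 = 0.1
x = np.linspace(-0.3, 0.5, 41)
y = 2.0 + 50.0 * (1.0 - np.cos(x - x0))
position, curvature = quadratic_minimum(x, y)
```

Because the quadratic approximation neglects higher-order terms, the fitted curvature comes out slightly below the true value of 50 in this example, while the position is recovered well for a sampling window that is symmetric around the minimum.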

get_MLE(xdata, ydata, estimator, param_dict={})#

Maximum Likelihood Estimation (MLE) estimator for distance estimate.

Parameters:
  • xdata (array-like) – Independent variable data.

  • ydata (array-like) – Dependent variable data.

  • estimator (str) – The type of estimator ('min-poly', 'min-quad', 'max-quad', 'harmonic').

  • param_dict (dict) – Dictionary of additional parameters for the estimation (default is an empty dictionary).

Returns:

Tuple containing the estimated parameters and a success flag.

Return type:

tuple

get_CE(counts, show=False)#

Correlative estimator to determine phase shift and amplitude of harmonic signal.

This estimator is independent of the wavelength but assumes that it processes exactly one full period of the signal. By correlating the data with a harmonic signal over that period, it extracts the phase shift and amplitude that fit the signal best.

Parameters:
  • counts (array-like) – Array containing the signal data.

  • show (bool) – Flag to display plots of the original data, pure cosine, and residuals (default is False).

Returns:

Tuple containing the phase shift and amplitude.

Return type:

tuple
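
A minimal sketch of such a correlation, assuming the samples are evenly spaced over exactly one period and that the signal has the form offset + A·cos(φ − φ₀). The function name `correlate_phase_amplitude` and the sign convention of the phase are assumptions, not the implementation of `get_CE`.

```python
import numpy as np

def correlate_phase_amplitude(counts):
    """Correlate one full period of data with cosine and sine references."""
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    # Sample phases covering one period, endpoint excluded.
    phi = 2.0 * np.pi * np.arange(k) / k
    c = 2.0 / k * np.sum(counts * np.cos(phi))
    s = 2.0 / k * np.sum(counts * np.sin(phi))
    return np.arctan2(s, c), np.hypot(c, s)

# Synthetic signal: offset 5, amplitude 3, phase shift 0.7 rad.
phi = 2.0 * np.pi * np.arange(64) / 64
signal = 5.0 + 3.0 * np.cos(phi - 0.7)
phase, amplitude = correlate_phase_amplitude(signal)
```

The constant offset drops out because the cosine and sine references sum to zero over a full period, which is why the full-period assumption matters.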

get_HE(xdata, ydata, param_dict={}, show=False)#

Get the harmonic estimator, i.e. a simple sinusoidal fit of a phase scan.

Parameters:
  • xdata (array-like) – Array containing the phase data.

  • ydata (array-like) – Array containing the photon counts.

  • param_dict (dict) – Dictionary of additional parameters for the estimator (default is an empty dictionary).

  • show (bool) – Flag to display plots of the original data, pure cosine, and residuals (default is False).

Returns:

Tuple containing the solution vector and a success flag.

Return type:

tuple

get_FE(counts, L, K, show=False)#

Get 1D Fourier estimator (only for one full phase scan line!).

Call only on 1M and 2M experiments!

Parameters:
  • counts (array-like) – Array containing photon counts.

  • L (float) – Length of the signal.

  • K (int) – Number of samples in the signal.

  • show (bool) – Flag to display plots of the full FFT spectrum, cleaned signal, and residuals (default is False).

Returns:

Tuple containing the phase and amplitude of the selected frequency.

Return type:

tuple
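
As a rough illustration of a Fourier-based estimate, the sketch below takes the FFT of one scan line, selects the strongest non-constant frequency, and reads off its phase and amplitude. The function name `fourier_phase_amplitude`, the selection rule, and the extra returned frequency are assumptions. `L` and `K` are used as described above (signal length and number of samples).

```python
import numpy as np

def fourier_phase_amplitude(counts, L, K):
    """Phase, amplitude and frequency of the dominant harmonic in one scan line."""
    counts = np.asarray(counts, dtype=float)
    spectrum = np.fft.rfft(counts)
    freqs = np.fft.rfftfreq(K, d=L / K)
    # Skip the constant (DC) bin and pick the strongest remaining frequency.
    idx = 1 + np.argmax(np.abs(spectrum[1:]))
    amplitude = 2.0 * np.abs(spectrum[idx]) / K
    phase = -np.angle(spectrum[idx])
    return phase, amplitude, freqs[idx]

# Synthetic line: one full period over length L = 1 with K = 64 samples.
K, L = 64, 1.0
n = np.arange(K)
line = 20.0 + 8.0 * np.cos(2.0 * np.pi * n / K - 1.1)
fe_phase, fe_amplitude, fe_freq = fourier_phase_amplitude(line, L, K)
```

With this convention a signal A·cos(2πn/K − φ₀) yields the phase φ₀, which is only exact when the line covers a whole number of periods.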

min_hexa_model(x, params, **kwargs)#

Model based on a sixth-order polynomial.

Parameters:
  • x (array_like) – Input values.

  • params (tuple) – Model parameters (m0, m1, b).

  • kwargs (dict) – Additional keyword arguments.

Returns:

Model values.

Return type:

array_like

min_poly_model(x, params, **kwargs)#

Model based on a fourth-order polynomial.

Parameters:
  • x (array_like) – Input values.

  • params (tuple) – Model parameters (m0, m1, b).

  • kwargs (dict) – Additional keyword arguments.

Returns:

Model values.

Return type:

array_like

min_quad_model(x, params, **kwargs)#

Model based on a quadratic polynomial.

Parameters:
  • x (array_like) – Input values.

  • params (tuple) – Model parameters (m0, m1, b).

  • kwargs (dict) – Additional keyword arguments.

Returns:

Model values.

Return type:

array_like

max_quad_model(x, params, **kwargs)#

Model based on a quadratic polynomial to fit a maximum.

Parameters:
  • x (array_like) – Input values.

  • params (tuple) – Model parameters (m0, m1, b).

  • kwargs (dict) – Additional keyword arguments.

Returns:

Model values.

Return type:

array_like

harmonic_model(x, params, **kwargs)#

Harmonic model.

Parameters:
  • x (array_like) – Input values.

  • params (tuple) – Model parameters (m0, m1, b).

  • kwargs (dict) – Additional keyword arguments.

Returns:

Model values.

Return type:

array_like

lsqs_objective(x, y, model, params, **kwargs)#

Least squares objective function.

Parameters:
  • x (array_like) – Input values.

  • y (array_like) – Target values.

  • model (callable) – Model function.

  • params (array_like) – Model parameters.

  • kwargs (dict) – Additional keyword arguments.

Returns:

Objective value.

Return type:

float
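
A least-squares objective of this shape can be sketched as the sum of squared residuals between the data and the model. The quadratic model and its interpretation of (m0, m1, b) as position, curvature, and offset are hypothetical and chosen only to make the example runnable.

```python
import numpy as np

def quad_model(x, params):
    """Hypothetical model: curvature m1 around position m0 plus offset b."""
    m0, m1, b = params
    return m1 * (np.asarray(x, dtype=float) - m0) ** 2 + b

def lsqs_objective_sketch(x, y, model, params):
    """Sum of squared residuals between the targets and the model values."""
    residuals = np.asarray(y, dtype=float) - model(x, params)
    return float(np.sum(residuals ** 2))

x = np.linspace(-1.0, 1.0, 5)
y = quad_model(x, (0.2, 3.0, 1.0))
exact = lsqs_objective_sketch(x, y, quad_model, (0.2, 3.0, 1.0))
shifted = lsqs_objective_sketch(x, y, quad_model, (0.2, 3.0, 2.0))
```

Here `exact` is zero because the data were generated from the same parameters, while raising the offset by one adds a unit residual at each of the five points.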

loglike_objective(x, y, model, params, **kwargs)#

Log-likelihood objective function.

Parameters:
  • x (array_like) – Input values.

  • y (array_like) – Target values.

  • model (callable) – Model function.

  • params (array_like) – Model parameters.

  • kwargs (dict) – Additional keyword arguments.

Returns:

Objective value.

Return type:

float
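
For photon-count data, a common choice of log-likelihood objective is the Poisson negative log-likelihood. The sketch below assumes that choice and a hypothetical constant-rate model. Whether `loglike_objective` uses this exact form or sign convention is not stated in this reference.

```python
import numpy as np

def flat_model(x, params):
    """Hypothetical model: constant expected count rate."""
    return np.full(np.shape(x), float(params[0]))

def poisson_nll(x, y, model, params, eps=1e-12):
    """Poisson negative log-likelihood, up to a parameter-independent constant."""
    mu = np.maximum(model(x, params), eps)
    y = np.asarray(y, dtype=float)
    # The term sum(log(y!)) is dropped because it does not depend on params.
    return float(np.sum(mu - y * np.log(mu)))

x = np.arange(6)
y = np.array([3, 5, 4, 6, 2, 4])
rates = np.linspace(2.0, 6.0, 401)
best_rate = rates[np.argmin([poisson_nll(x, y, flat_model, (r,)) for r in rates])]
```

For a constant rate the minimum lies at the sample mean of the counts, which the scan over `rates` reproduces.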

objective_jac(objective, params)#

Compute the Jacobian of an objective function.

Parameters:
  • objective (callable) – Objective function.

  • params (array_like) – Model parameters.

Returns:

Jacobian matrix.

Return type:

array_like
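
One generic way to obtain such a Jacobian is by central finite differences. The sketch below is an assumption for illustration (the name `numerical_gradient` and the step size are hypothetical), and the module may compute derivatives differently.

```python
import numpy as np

def numerical_gradient(objective, params, step=1e-6):
    """Central-difference gradient of a scalar objective with respect to params."""
    params = np.asarray(params, dtype=float)
    grad = np.zeros_like(params)
    for i in range(params.size):
        shift = np.zeros_like(params)
        shift[i] = step
        grad[i] = (objective(params + shift) - objective(params - shift)) / (2.0 * step)
    return grad

def objective(p):
    # Simple test objective with known gradient [2*(p0 - 1), 6*(p1 + 2)].
    return (p[0] - 1.0) ** 2 + 3.0 * (p[1] + 2.0) ** 2

gradient = numerical_gradient(objective, [0.0, 0.0])
```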

objective_hess(x, y, objective, params, **kwargs)#

Compute the Hessian matrix of an objective function.

Parameters:
  • x (array_like) – Input values.

  • y (array_like) – Target values.

  • objective (callable) – Objective function.

  • params (array_like) – Model parameters.

  • kwargs (dict) – Additional keyword arguments.

Returns:

Hessian matrix.

Return type:

array_like
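
A Hessian can likewise be approximated by finite differences. In this sketch the data `x`, `y` and any keyword arguments are assumed to be bound into `objective` beforehand (for example with a closure), which differs from the signature documented above. The name `numerical_hessian` is hypothetical.

```python
import numpy as np

def numerical_hessian(objective, params, step=1e-4):
    """Central-difference Hessian of a scalar objective with respect to params."""
    params = np.asarray(params, dtype=float)
    n = params.size
    steps = step * np.eye(n)  # row i is a step along parameter i
    hess = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            hess[i, j] = (
                objective(params + steps[i] + steps[j])
                - objective(params + steps[i] - steps[j])
                - objective(params - steps[i] + steps[j])
                + objective(params - steps[i] - steps[j])
            ) / (4.0 * step ** 2)
    return hess

def objective(p):
    # Known Hessian: [[2, 1], [1, 4]].
    return p[0] ** 2 + p[0] * p[1] + 2.0 * p[1] ** 2

hessian = numerical_hessian(objective, [1.0, -0.5])
```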

get_initial_guess(x, y, estimator, params0=None, constr=None, **kwargs)#

Retrieve an initial guess for model parameters.

Parameters:
  • x (array_like) – Input values.

  • y (array_like) – Target values.

  • estimator (str) – Estimation method.

  • params0 (array_like, optional) – Initial guess for parameters.

  • constr (LinearConstraint, optional) – Constraints on parameters.

  • kwargs (dict) – Additional keyword arguments.

Returns:

Initial guess for parameters.

Return type:

array_like

_estimate_initial_values(x, y)#

Estimate initial values for model parameters.

Parameters:
  • x (array_like) – Input values.

  • y (array_like) – Target values.

Returns:

Initial values for parameters.

Return type:

array_like

_find_ext(estimator, p, w=None)#

Find the extremum in 1D via Kernel Density Estimation (KDE) and differential evolution.

Parameters:
  • estimator (str) – Estimator to decide whether to find a maximum or minimum.

  • p (array_like) – Array of positions.

  • w (array_like, optional) – Photons used as weights for the KDE.

Returns:

Position of the maximum density.

Return type:

array_like
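
The following NumPy-only sketch illustrates locating a density extremum from weighted positions. It deliberately swaps differential evolution for a dense grid search and uses a hand-written Gaussian KDE, so it is not the method used by `_find_ext`. The function name, the `bandwidth` argument, and the rule-of-thumb default are assumptions.

```python
import numpy as np

def kde_extremum(p, w=None, find_max=True, bandwidth=None, grid_points=2001):
    """Position of the maximum (or minimum) of a weighted Gaussian KDE on a grid."""
    p = np.asarray(p, dtype=float)
    w = np.ones_like(p) if w is None else np.asarray(w, dtype=float)
    w = w / w.sum()
    if bandwidth is None:
        # Rule-of-thumb bandwidth; an assumption, not the library's choice.
        bandwidth = 1.06 * p.std() * p.size ** (-0.2)
    grid = np.linspace(p.min(), p.max(), grid_points)
    z = (grid[:, None] - p[None, :]) / bandwidth
    density = (w[None, :] * np.exp(-0.5 * z ** 2)).sum(axis=1)
    density /= bandwidth * np.sqrt(2.0 * np.pi)
    idx = np.argmax(density) if find_max else np.argmin(density)
    return grid[idx]

positions = [0.0, 0.28, 0.30, 0.32, 0.9]
photons = [1, 5, 10, 5, 1]
peak = kde_extremum(positions, w=photons, bandwidth=0.05)
```

A grid search is simple and deterministic but limited to the grid resolution, whereas a global optimizer can refine the position continuously.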

class lib.data_handling.data_analysis.Results#

Initialization

append(obj)#

Merge two objects by appending their attributes.

Attributes have to be lists!

Parameters:

obj (object) – The object to be appended.

Returns:

None
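
The merging behaviour described above can be illustrated with a small stand-in class. `ResultsSketch` and its attribute names are hypothetical, and the sketch assumes that "appending" means concatenating the lists attribute by attribute.

```python
class ResultsSketch:
    """Illustrative container whose attributes are all lists."""

    def __init__(self, **attributes):
        for name, values in attributes.items():
            setattr(self, name, list(values))

    def append(self, obj):
        """Concatenate every list attribute of obj onto the matching attribute of self."""
        for name, values in vars(obj).items():
            if not isinstance(values, list):
                raise TypeError(f"attribute {name!r} must be a list")
            setattr(self, name, getattr(self, name, []) + values)

first = ResultsSketch(position=[1.0, 1.2], photons=[150, 170])
second = ResultsSketch(position=[0.9], photons=[140])
first.append(second)
```

After the call, `first` holds the entries of both objects while `second` is left unchanged.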

class lib.data_handling.data_analysis.MinfluxAnalysis#

Initialization

fit_chunk(input_df, estimator='min-quad', plot=False, output=None, **kwargs)#

Fit a chunk of data using the specified estimator.

Parameters:
  • input_df (pd.DataFrame) – Input DataFrame containing 'photons', 'pos', 'weights', and 'time' columns.

  • estimator (str) – Estimation method (default is 'min-quad').

  • plot (bool) – Flag indicating whether to plot the fit results (default is False).

  • output (str or None) – Output file or path for the plot (default is None).

  • kwargs (dict) – Additional keyword arguments to be passed to the estimator.

Returns:

Series containing fit results.

Return type:

pd.Series

_plot_fit(df, fit_vals, output=None)#

Plot the fit and residuals of the given DataFrame.

Parameters:
  • df (pd.DataFrame) – DataFrame containing 'photons', 'pos', 'weights', 'fit', and 'residuals' columns.

  • fit_vals (np.ndarray) – Fit values obtained from the fitting procedure.

  • output (str or None) – Output file or path for saving the plot.

assign_chunk_id(df, mode='tuple', chunk_size=50, max_chunks=10, overlap=0.0, bin_size=50, **kwargs)#

Assign chunk IDs to the given DataFrame based on the specified mode and chunking parameters.

Parameters:
  • df (pd.DataFrame) – Input DataFrame containing relevant data.

  • mode (str, optional) – Chunking mode ('tuple' or 'photons').

  • chunk_size (int, optional) – Size of each chunk.

  • max_chunks (int, optional) – Maximum number of chunks.

  • overlap (float, optional) – Overlap percentage between chunks.

  • bin_size (int, optional) – Bin size for assigning bin IDs to chunks.

  • kwargs – Additional keyword arguments.

Returns:

DataFrame with assigned chunk IDs.

Return type:

pd.DataFrame
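
How `chunk_size`, `max_chunks`, and `overlap` might interact can be sketched with plain index windows. This is one plausible reading only: it treats `overlap` as a fraction between 0 and 1, counts rows rather than photons, and returns (start, stop) windows instead of a DataFrame. The function name `chunk_windows` is hypothetical.

```python
def chunk_windows(n_rows, chunk_size=50, max_chunks=10, overlap=0.0):
    """Return (start, stop) row windows of fixed size with fractional overlap."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between the starts of consecutive chunks.
    stride = max(1, int(round(chunk_size * (1.0 - overlap))))
    windows = []
    start = 0
    while start + chunk_size <= n_rows and len(windows) < max_chunks:
        windows.append((start, start + chunk_size))
        start += stride
    return windows

half_overlap = chunk_windows(200, chunk_size=50, max_chunks=10, overlap=0.5)
no_overlap = chunk_windows(200, chunk_size=50, max_chunks=10, overlap=0.0)
```

With 50 % overlap, each row away from the edges falls into two windows, so a single chunk-ID column would need duplicated rows or a list of IDs per row.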

grid_data(df, num_points=100)#

Interpolate the given DataFrame onto a grid and return a new DataFrame with the specified number of points.

Parameters:
  • df (pd.DataFrame) – Input DataFrame containing ‘pos’ and ‘photons’ columns.

  • num_points (int, optional) – Number of points for the new grid.

Returns:

Gridded DataFrame with interpolated ‘photons’ values.

Return type:

pd.DataFrame
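
The gridding step can be sketched with linear interpolation onto evenly spaced positions. The sketch works on plain arrays instead of a DataFrame, and both the function name `grid_counts` and the choice of linear interpolation via `np.interp` are assumptions.

```python
import numpy as np

def grid_counts(pos, photons, num_points=100):
    """Linearly interpolate photon counts onto an evenly spaced position grid."""
    pos = np.asarray(pos, dtype=float)
    photons = np.asarray(photons, dtype=float)
    order = np.argsort(pos)  # np.interp requires increasing sample positions
    grid = np.linspace(pos.min(), pos.max(), num_points)
    return grid, np.interp(grid, pos[order], photons[order])

grid, gridded = grid_counts([0.0, 2.0, 1.0], [10.0, 30.0, 20.0], num_points=5)
```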