magmap.io.df_io module¶

Stats calculations and text output for MagellanMapper.

magmap.io.df_io.add_cols_df(df, cols)[source]¶

Add columns to a data frame.

Parameters:

df (pd.DataFrame) – Data frame.
cols (Dict[str, Any]) – Dictionary of {column: default_value} to add to df.

Returns:

Data frame with columns added.

Return type:

pd.DataFrame
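
As a rough illustration of the {column: default_value} convention, the following sketch adds columns with plain pandas assignment. The helper name add_cols_sketch and the sample column names are hypothetical, and whether add_cols_df() overwrites columns that already exist is not stated above.

```python
import pandas as pd


def add_cols_sketch(df, cols):
    """Hypothetical stand-in: add each column in ``cols`` with its default."""
    for col, default in cols.items():
        # A scalar default is broadcast to every row.
        df[col] = default
    return df


df = pd.DataFrame({"Sample": ["a", "b"]})
df = add_cols_sketch(df, {"Condition": "control", "Volume": 0.0})
```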

magmap.io.df_io.append_cols(dfs, labels, fn_col=None, extra_cols=None, data_cols=None)[source]¶

Append columns from a group of data frames, optionally filtering to keep only columns matching criteria.

Appends columns based on simple concatenation. Typically used when each data frame contains identical samples and ordering. All columns will be kept from the first data frame.

Parameters:

dfs (List[pd.DataFrame]) – Sequence of data frames.
labels – Sequence of strings corresponding to data frames in dfs, where each string will be prepended to all column names from the given data frame.
fn_col (func) – Function by which to filter columns; defaults to None to keep all columns. Takes precedence over data_cols.
extra_cols (List[str]) – List of additional columns to keep from the first data frame after filtering by fn_col; defaults to None.
data_cols (List[str]) – List of columns to keep from each data frame; defaults to None to keep all columns.

Returns:

The combined data frame.

Return type:

pd.DataFrame

See also

join_dfs(): Join data frames by column based on a specified ID column.

magmap.io.df_io.coefvar_df(df, id_cols, metric_cols, size_col=None)[source]¶

Generate coefficient of variation for each metric within each group.

Parameters:

df – Pandas data frame.
id_cols – List of column names by which to group.
metric_cols – Sequence of metric column names.
size_col (default: None) – Name of size column, typically used for weighting; defaults to None.

Returns:

New data frame with metric_cols replaced by coefficient of variation and size_col replaced by mean.
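
The coefficient of variation is the standard deviation divided by the mean. A minimal unweighted sketch in plain pandas is shown here; it assumes the sample standard deviation (pandas' default), does not reproduce any weighting by size_col, and uses hypothetical column names such as Volume_cv.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["r1", "r1", "r2", "r2"],
    "Volume": [2.0, 4.0, 10.0, 10.0],
})

# Mean and sample standard deviation of the metric within each group.
stats = df.groupby("Region")["Volume"].agg(["mean", "std"])

# Coefficient of variation = std / mean, one row per group.
cv = (stats["std"] / stats["mean"]).rename("Volume_cv").reset_index()
```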

magmap.io.df_io.combine_cols(df, combos)[source]¶

Combine columns within a single data frame with the aggregation function specified in each combination.

Parameters:

df – Pandas data frame.
combos – Tuple of combination column name and a nested tuple of the columns to combine as Enums.

Returns:

Data frame with the combinations each as a new column.

magmap.io.df_io.cond_to_cols_df(df, id_cols, cond_col, cond_base, metric_cols, sep='_')[source]¶

Transpose metric columns from rows within each condition group to separate sets of columns.

Parameters:

df – Pandas data frame.
id_cols – Sequence of columns to serve as index/indices.
cond_col – Name of the condition column.
cond_base – Name of first condition in output data frame; if None, defaults to first condition found.
metric_cols – Sequence of metric columns to normalize.
sep (str) – Separator for metric and condition in new column names.

Returns:

pd.DataFrame: New data frame with metric_cols expanded to have separate columns for each condition in cond_col.
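
The reshaping described here can be approximated with a pandas pivot. This is a sketch under assumptions, not the function's implementation: the sample data is made up, and the new column names are assumed to follow a metric, separator, condition pattern based on the sep parameter.

```python
import pandas as pd

df = pd.DataFrame({
    "Sample": ["a", "a", "b", "b"],
    "Condition": ["ctrl", "test", "ctrl", "test"],
    "Volume": [1.0, 2.0, 3.0, 4.0],
})

sep = "_"
# Spread the metric into one column per condition, indexed by the ID column.
wide = df.pivot(index="Sample", columns="Condition", values="Volume")
# Assumed naming pattern: <metric><sep><condition>.
wide.columns = [f"Volume{sep}{cond}" for cond in wide.columns]
wide = wide.reset_index()
```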

magmap.io.df_io.data_frames_to_csv(data_frames, path=None, sort_cols=None, show=None, index=False)[source]¶

Combine and export multiple data frames to CSV file.

Parameters:

data_frames (Union[DataFrame, Sequence[DataFrame]]) – List of data frames to concatenate, or a single DataFrame.
path (Optional[str], default: None) – Output path; defaults to None, in which case the data frame will not be saved.
sort_cols (Union[str, List[str], None], default: None) – Column(s) by which to sort; defaults to None for no sorting.
show (Union[bool, str, None], default: None) – True or “ “ to print the data frame with a space-separated table, or can provide an alternate separator. Defaults to None to not print the data frame.
index (bool, default: False) – True to include the index; defaults to False.

Returns:

The combined data frame.

magmap.io.df_io.df_add(df0, df1, axis=1, fill_value=0)[source]¶

Wrapper function to add two Pandas data frames in a functional manner.

Parameters:

df0 (pd.DataFrame) – First data frame.
df1 (pd.DataFrame) – Second data frame.
axis (int) – Axis; defaults to 1.
fill_value (int) – Value with which to fill NaNs; defaults to 0.

Returns:

The sum from adding df0 and df1 with pd.DataFrame.add().
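
To show what the axis and fill_value arguments mean in practice, this sketch calls pd.DataFrame.add() directly with the same defaults as the signature above. The data is illustrative only.

```python
import pandas as pd

df0 = pd.DataFrame({"a": [1.0, float("nan")], "b": [3.0, 4.0]})
df1 = pd.DataFrame({"a": [10.0, 20.0], "b": [30.0, 40.0]})

# A NaN on one side is treated as fill_value before adding; a position
# that is NaN in both frames would remain NaN.
total = df0.add(df1, axis=1, fill_value=0)
```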

magmap.io.df_io.df_div(df0, df1, axis=1)[source]¶

Wrapper function to divide two Pandas data frames in a functional manner.

Parameters:

df0 (pd.DataFrame) – First data frame.
df1 (pd.DataFrame) – Second data frame.
axis (int) – Axis; defaults to 1.

Returns:

The quotient from dividing df0 by df1 with pd.DataFrame.div().

magmap.io.df_io.df_subtract(df0, df1, axis=1, fill_value=0)[source]¶

Wrapper function to subtract two Pandas data frames in a functional manner.

Parameters:

df0 (pd.DataFrame) – First data frame.
df1 (pd.DataFrame) – Second data frame.
axis (int) – Axis; defaults to 1.
fill_value (int) – Value with which to fill NaNs; defaults to 0.

Returns:

The difference from subtracting df1 from df0 with pd.DataFrame.subtract().

magmap.io.df_io.dict_to_data_frame(to_import, path=None, sort_cols=None, show=None, records_cols=None)[source]¶

Import dictionary to data frame with additional options.

Supports conversion of Enum column names to their values. Also, allows import of data in record format, given as a list rather than as a dictionary. Additional options are supported through data_frames_to_csv().

Parameters:

to_import (Union[Dict, List[Sequence]]) – Dictionary to import. May also be a list of sequences to import as records if records_cols is given. If column names are enums, they will be converted to their values.
path (Optional[str], default: None) – Output path to export data frame to CSV file; defaults to None for no export.
sort_cols (Union[str, List[str], None], default: None) – Column as a string or list of columns by which to sort; defaults to None for no sorting.
show (Union[bool, str, None], default: None) – True or “ “ to print the data frame with a space-separated table, or can provide an alternate separator. Defaults to None to not print the data frame.
records_cols (Union[list, tuple, None], default: None) – Import from records, where to_import is a list of rows rather than a dictionary, using this sequence of record column names instead of dictionary keys; defaults to None.

Return type:

DataFrame

Returns:

The imported data frame.
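
The two input forms described above, a dictionary with Enum keys and a list of records, can be sketched with plain pandas as follows. The Cols enum and the sample values are hypothetical, and the conversion shown is only an approximation of what dict_to_data_frame() may do internally.

```python
from enum import Enum

import pandas as pd


class Cols(Enum):
    """Hypothetical column names given as an Enum."""
    SAMPLE = "Sample"
    VOLUME = "Volume"


# Dictionary form: Enum keys are converted to their values.
data = {Cols.SAMPLE: ["a", "b"], Cols.VOLUME: [1.0, 2.0]}
df_dict = pd.DataFrame({key.value: val for key, val in data.items()})

# Records form: a list of rows plus a separate sequence of column names.
records = [("a", 1.0), ("b", 2.0)]
df_records = pd.DataFrame.from_records(records, columns=["Sample", "Volume"])
```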

magmap.io.df_io.exps_by_regions(path, filter_zeros=True, sample_delim='-')[source]¶

Transform a volumes-by-regions data frame to have experiment-condition combinations as columns and regions as rows.

Multiple measurements for each experiment-condition combination, such as measurements from separate sides of each sample, will be summed. A separate data frame will be generated for each measurement.

Parameters:

path – Path to data frame generated from regions_to_pandas() or an aggregate of these data frames.
filter_zeros (default: True) – True to remove rows that contain only zeros.
sample_delim (default: '-') – Split samples column by this delimiter, taking only the first split element. Defaults to “-”; if None, will not split the samples.

Returns:

Dictionary of transformed dataframes with measurements as keys.

magmap.io.df_io.filter_dfs_on_vals(dfs, cols=None, row_matches=None)[source]¶

Filter data frames for rows matching a value for a given column and concatenate the filtered data frames.

Parameters:

dfs (List[pd.DataFrame]) – Sequence of data frames to filter.
cols (List[str]) – Sequence of columns to keep; defaults to None to keep all columns.
row_matches (List[Tuple]) – Sequence of (col, val) criteria corresponding to dfs, where only the rows with matching values to val for the given col will be kept. Defaults to None to keep all rows.

Returns:

Tuple of the concatenated filtered data frames and a list of the filtered data frames.

Return type:

Tuple[pd.DataFrame, List[pd.DataFrame]]
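
The pairing of each data frame with its own (col, val) criterion can be sketched as below. The data and column names are hypothetical, and details such as index handling during concatenation are assumptions.

```python
import pandas as pd

df_a = pd.DataFrame({
    "Sample": ["a", "b"], "Condition": ["ctrl", "test"], "Volume": [1.0, 2.0]})
df_b = pd.DataFrame({
    "Sample": ["c", "d"], "Condition": ["ctrl", "test"], "Volume": [3.0, 4.0]})

dfs = [df_a, df_b]
# One (col, val) criterion per data frame, in the same order as dfs.
row_matches = [("Condition", "ctrl"), ("Condition", "test")]

# Keep only the rows of each frame whose column matches its paired value.
filtered = [d.loc[d[col] == val] for d, (col, val) in zip(dfs, row_matches)]
combined = pd.concat(filtered, ignore_index=True)
```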

magmap.io.df_io.func_to_paired_cols(df, col1, col2, fn, name)[source]¶

Perform a function such as an arithmetic operation on a pair of columns.

Parameters:

df (pd.DataFrame) – Data frame, which will be modified in-place.
col1 (str) – Name of first column.
col2 (str) – Name of second column.
fn (func) – Function that takes the columns from col1 and col2 as separate arguments.
name (str) – Name of new column in df to insert the results from fn.

magmap.io.df_io.join_dfs(dfs, id_col, drop_dups=False, how=None)[source]¶

Join data frames by an ID column.

Parameters:

dfs (Sequence[DataFrame]) – Sequence of data frames to join.
id_col (Union[str, List[str]]) – Index column.
drop_dups (bool, default: False) – True to drop duplicates of id_col; defaults to False.
how (Optional[str], default: None) – How to join the data frames; if None (default), uses “left”.

Return type:

DataFrame

Returns:

Data frame after serially joining data frames.
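
Serial joining on an ID column might look like the following sketch, which folds pd.DataFrame.merge() over the sequence with the default "left" strategy. Whether join_dfs() uses merge or an index-based join is not stated above, and the data is made up.

```python
from functools import reduce

import pandas as pd

dfs = [
    pd.DataFrame({"Region": [1, 2, 3], "Volume": [10.0, 20.0, 30.0]}),
    pd.DataFrame({"Region": [1, 2], "Density": [0.1, 0.2]}),
    pd.DataFrame({"Region": [2, 3], "Nuclei": [200, 300]}),
]

# Join each data frame onto the running result, keeping all rows
# from the first (left-most) data frame.
joined = reduce(
    lambda left, right: left.merge(right, on="Region", how="left"), dfs)
```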

magmap.io.df_io.main()[source]¶

Process stats based on command-line mode.

magmap.io.df_io.melt_cols(df, id_cols, cols_to_melt, var_name=None)[source]¶

Melt down a given set of columns to rows.

Parameters:

df – Pandas data frame.
id_cols – List of column names to treat as IDs.
cols_to_melt – List of column names to pivot into separate rows.
var_name (default: None) – Name of column with the melted column names; defaults to None to use the default name.

Returns:

Data frame with columns melted into rows.
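
The parameters here map closely onto pd.DataFrame.melt(), so a short pandas sketch can show the expected shape of the result. The column names and the var_name value "Metric" are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Sample": ["a", "b"],
    "Volume": [1.0, 2.0],
    "Density": [0.5, 0.25],
})

# Each melted column contributes one row per sample; the original column
# name goes into the var_name column and its value into "value".
melted = df.melt(
    id_vars=["Sample"], value_vars=["Volume", "Density"], var_name="Metric")
```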

magmap.io.df_io.merge_csvs(in_paths, out_path=None)[source]¶

Combine and export multiple CSV files to a single CSV file.

Parameters:

in_paths (list[str]) – List of paths to CSV files to import as data frames and concatenate.
out_path (str) – Output path; defaults to None.

Returns:

Merged data frame.

Return type:

pandas.DataFrame

magmap.io.df_io.merge_excels(paths, out_path, names=None)[source]¶

Merge Excel files into separate sheets of a single Excel output file.

Parameters:

paths (List[str]) – Sequence of paths to Excel files to load.
out_path (str) – Path to output file.
names (List[str]) – Sequence of sheet names corresponding to paths. If None, the filenames without extensions in paths will be used.

magmap.io.df_io.normalize_df(df, id_cols, cond_col, cond_base, metric_cols, extra_cols, df_base=None, fn=<function df_div>)[source]¶

Normalize columns from various conditions to the corresponding values in another condition.

Infinite values will be converted to NaNs.

Parameters:

df – Pandas data frame.
id_cols – Sequence of columns to serve as index/indices.
cond_col – Name of the condition column.
cond_base – Name of the condition to which all other conditions will be normalized. Ignored if df_base is given.
metric_cols – Sequence of metric columns to normalize.
extra_cols – Sequence of additional columns to include in the output data frame.
df_base (default: None) – Data frame to which values will be normalized. If given, cond_base will be ignored; defaults to None.
fn (default: df_div) – Function by which to normalize along axis 0; defaults to df_div().

Returns:

New data frame with columns from id_cols, cond_col, metric_cols, and extra_cols. Values with condition equal to cond_base should by definition be 1 or NaN, while all other conditions should be normalized to the original cond_base values.
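
The normalization idea, dividing each condition by the baseline condition for the same ID and turning infinite results into NaN, can be sketched with plain pandas. This is an illustration under assumptions rather than the function's implementation; the data and column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Sample": ["a", "a", "b", "b"],
    "Condition": ["ctrl", "test", "ctrl", "test"],
    "Volume": [2.0, 4.0, 0.0, 5.0],
})

# Baseline value per sample, taken from the base condition.
base = df[df["Condition"] == "ctrl"].set_index("Sample")["Volume"]

out = df.copy()
# Divide each row by the baseline value of its own sample.
out["Volume"] = df["Volume"] / df["Sample"].map(base)
# Division by a zero baseline gives inf (or NaN for 0/0); convert inf to NaN.
out["Volume"] = out["Volume"].replace([np.inf, -np.inf], np.nan)
```

In this example the baseline rows of sample "a" normalize to 1, while sample "b" has a zero baseline and ends up as NaN for both conditions.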

magmap.io.df_io.pivot_with_conditions(df, index, columns, values, aggfunc='first')[source]¶

Pivot a data frame to columns with sub-columns for different conditions.

For example, a table of metric values for different regions within each sample under different conditions will be reorganized to region columns that are each split into condition sub-columns.

Parameters:

df (pandas.DataFrame) – Data frame to pivot.
index (Union[str, list[str]]) – Column name or list of names specifying the index for the output table.
columns (Union[str, list[str]]) – Name or list of names of columns whose values are pivoted into separate columns.
values (str) – Name of column whose values are moved into the new columns specified by columns.
aggfunc (func) – Aggregation function for duplicates; defaults to “first” to take the first value.

Returns:

The pivoted data frame and list of pivoted columns.

Return type:

pandas.DataFrame, list[str]
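
The region and condition example above can be approximated with pd.DataFrame.pivot_table(), which produces hierarchical columns. How pivot_with_conditions() names or flattens its output columns is not specified here, so the sketch stops at the MultiIndex result; the data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Sample": ["s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2"],
    "Region": ["r1", "r1", "r2", "r2"] * 2,
    "Condition": ["ctrl", "test"] * 4,
    "Volume": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Region columns, each split into condition sub-columns; "first" keeps the
# first value if an index/column combination is duplicated.
pivoted = df.pivot_table(
    index="Sample", columns=["Region", "Condition"], values="Volume",
    aggfunc="first")
```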

magmap.io.df_io.print_data_frame(df, sep=' ', index=False, header=True, show=True, **kwargs)[source]¶

Print formatted data frame.

Parameters:

df (DataFrame) – Data frame to print.
sep (str, default: ' ') – Separator for columns. True or “ “ to print the data frame with a space-separated table, or can provide an alternate separator. Defaults to “ “.
index (bool, default: False) – True to show index; defaults to False.
header (bool, default: True) – True to show header; defaults to True.
show (bool, default: True) – True to print the formatted data frame; defaults to True.
**kwargs – Additional arguments to pandas.DataFrame.to_string() or pandas.DataFrame.to_csv().

Return type:

str

Returns:

The formatted data frame.

magmap.io.df_io.replace_vals(df, vals_from, vals_to, cols=None)[source]¶

Replace values in a data frame for the given columns.

Parameters:

df (pd.DataFrame) – Pandas data frame.
vals_from (Any) – Value or sequence of values to be replaced.
vals_to (Any) – Corresponding value or sequence of values to vals_from with which to replace.
cols (Union[str, List[str]]) – Column name or sequence of names to replace values; defaults to None to replace values in all columns.

Returns:

Data frame with values replaced.

Return type:

pd.DataFrame
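
Replacing paired sequences of values in selected columns can be sketched with pd.DataFrame.replace(). The column names and values are hypothetical, and the sketch assumes cols is given as a list.

```python
import pandas as pd

df = pd.DataFrame({
    "Condition": ["wt", "ko"],
    "Genotype": ["wt", "ko"],
})

cols = ["Condition"]
# Each value in the first list is replaced by the value at the same
# position in the second list, only within the selected columns.
df[cols] = df[cols].replace(["wt", "ko"], ["control", "test"])
```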

magmap.io.df_io.weight_mean(vals, weights)[source]¶

Calculate the weighted arithmetic mean.

Parameters:

vals (List[float]) – Sequence of values, which can include NaNs.
weights (List[float]) – Sequence of weights.

Returns:

The weighted arithmetic mean of vals.
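
For reference, the weighted arithmetic mean is sum(w * v) / sum(w). The sketch below uses only the standard library and assumes that NaN values are dropped together with their weights, which is one plausible reading of "can include NaNs" rather than confirmed behavior. The helper name is hypothetical.

```python
import math


def weighted_mean_sketch(vals, weights):
    """Weighted mean, skipping NaN values and their weights (assumption)."""
    pairs = [(v, w) for v, w in zip(vals, weights) if not math.isnan(v)]
    total = sum(w for _, w in pairs)
    if total == 0:
        # No usable values or weights.
        return math.nan
    return sum(v * w for v, w in pairs) / total


mean = weighted_mean_sketch([1.0, 3.0, float("nan")], [1.0, 3.0, 5.0])
```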

magmap.io.df_io.weight_std(vals, weights)[source]¶

Calculate the weighted standard deviation.

Parameters:

vals (List[float]) – Sequence of values, which can include NaNs.
weights (List[float]) – Sequence of weights.

Returns:

The weighted standard deviation of vals.
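
One common definition of the weighted standard deviation is the square root of sum(w * (v - m)^2) / sum(w), where m is the weighted mean. The sketch below implements that population-style form with the standard library; whether weight_std() uses this form or a bias-corrected one is an assumption, as is the dropping of NaN values. The helper name is hypothetical.

```python
import math


def weighted_std_sketch(vals, weights):
    """Population-style weighted standard deviation, skipping NaNs."""
    pairs = [(v, w) for v, w in zip(vals, weights) if not math.isnan(v)]
    total = sum(w for _, w in pairs)
    if total == 0:
        return math.nan
    mean = sum(v * w for v, w in pairs) / total
    # Weighted average of squared deviations from the weighted mean.
    var = sum(w * (v - mean) ** 2 for v, w in pairs) / total
    return math.sqrt(var)


std = weighted_std_sketch([2.0, 4.0], [1.0, 1.0])
```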

magmap.io.df_io.zscore_df(df, group_col, metric_cols, extra_cols, replace_metrics=False)[source]¶

Generate z-scores for each metric within each group.

Parameters:

df – Pandas data frame.
group_col – Name of column specifying groups.
metric_cols – Sequence of metric column names.
extra_cols – Sequence of additional column names to include in the output data frame.
replace_metrics (default: False) – True to replace metric_cols with z-scores rather than adding new columns; defaults to False.

Returns:

New data frame with columns from extra_cols and z-scores in columns corresponding to metric_cols.
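
A z-score is (value - group mean) / group standard deviation. The following pandas sketch computes it per group, assuming the sample standard deviation (pandas' default) and a hypothetical Volume_z output column; the function's actual column naming is not specified above.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["r1", "r1", "r1", "r2", "r2", "r2"],
    "Sample": ["a", "b", "c", "a", "b", "c"],
    "Volume": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

grouped = df.groupby("Region")["Volume"]
# transform() returns group statistics aligned to the original rows.
z = (df["Volume"] - grouped.transform("mean")) / grouped.transform("std")

# Keep the extra columns and add the z-scores as a new column.
df_z = df[["Region", "Sample"]].assign(Volume_z=z)
```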