
Constraints

class Constraints(debug=False, logging=True, logger=None)

A class for building customised data-cleaning constraints. It is designed to work with the CleanData class.

Parameters

debug: boolean, default False. Whether to print debug output to the console.

logging: boolean, default True. Whether to generate a logfile.

logger: object, default None. If None and logging is True, logs are written to the file constraints_logfile.txt. Alternatively, you can supply the CleanData.logger object so that logging continues in the same log.
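The parameter descriptions imply a simple precedence for deciding where logs go. The sketch below models that behaviour with Python's standard `logging` module. It is a hypothetical helper (`make_constraints_logger`, the logger name `"constraints"`), not the library's code, and the precedence when `logging=False` but a `logger` is also supplied is an assumption. The flag is renamed `logging_enabled` here so it does not shadow the `logging` module.

```python
import logging
from typing import Optional


def make_constraints_logger(
    logging_enabled: bool = True,
    logger: Optional[logging.Logger] = None,
    logfile: str = "constraints_logfile.txt",
) -> Optional[logging.Logger]:
    """Hypothetical helper modelling the documented logger fallback."""
    if not logging_enabled:
        # Assumption: logging=False disables logging even if a logger is supplied.
        return None
    if logger is not None:
        # Reuse a supplied logger (e.g. CleanData.logger) for continuous logging.
        return logger
    # logger=None and logging=True: write records to constraints_logfile.txt.
    file_logger = logging.getLogger("constraints")
    file_logger.setLevel(logging.INFO)
    if not file_logger.handlers:
        file_logger.addHandler(logging.FileHandler(logfile))
    return file_logger
```

Passing an existing logger returns it unchanged, which is what allows a Constraints object to write into the same log as its CleanData object.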

Notes

The Constraints class contains methods that are commonly used as building blocks for more complicated constraints, and these methods support automatic logging. One way to use them to modify a variable, say variableOne, is to write a wrapper function for that variable and call it from a script such as script.py:

import pprint

# Assumes Constraints has been imported and cd is an existing CleanData object.

def con_variableOne(df, con):
    # apply one or more Constraints methods to the dataframe
    df = con.method(df, [...])

    return df, con

df = cd.clean_df
con = Constraints(debug=True, logging=True, logger=cd.logger)
df, con = con_variableOne(df, con)

# Check details of the constrained dataset
pprint.pprint(con.log)

# Output the log to file
con.output_log_to_file()

# Update CleanData with the new dataframe
cd.update_data(new_df=df, filename_suffix=cd.suffix_constraints)

Examples

Please refer to the following page for detailed examples:

Example | Description
CleanData 3 | Demonstrates the use of customised constraints

Attributes

Attribute | Description
debug (boolean) | Whether to print debug output
logging (boolean) | Whether to generate a logfile
logger (obj) | Logger used for logging
log (dict) | Dictionary that records the operations applied to each variable
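The `log` attribute is documented only as a dictionary that records what was done to each variable. As a minimal sketch, assuming it maps a variable name to an ordered list of action descriptions, recording entries and writing them out (the job of `output_log_to_file()`) could look like this. The `ConstraintLog` class, its `record()` method, and the JSON format and default filename are all hypothetical.

```python
import json
from collections import defaultdict
from typing import DefaultDict, List


class ConstraintLog:
    """Hypothetical stand-in for Constraints.log and output_log_to_file()."""

    def __init__(self) -> None:
        # Assumed shape: variable name -> ordered list of actions applied to it.
        self.log: DefaultDict[str, List[str]] = defaultdict(list)

    def record(self, variable: str, action: str) -> None:
        """Append one action to the history of `variable`."""
        self.log[variable].append(action)

    def output_log_to_file(self, path: str = "constraints_log.json") -> None:
        """Write the collected log as JSON (assumed format and filename)."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.log, f, indent=2)
```

Keying by variable keeps the history of each column together, which matches the wrapper-per-variable pattern described in the Notes.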

Methods

Method | Description
output_log_to_file() | Outputs the information collected in Constraints.log to the log file.
multiparent_conditions(df, var_array, dict_conditions_values) | Replaces values in a dataframe based on multiple conditions evaluated across multiple columns.
evaluate_df_column(df, column_names, [dict_conditions_values, func, output_column_name]) | Evaluates the given column(s) of a dataframe against the given conditions and values, and writes the evaluated values to a new column. Optionally, a function can be passed in to evaluate the column(s).
convertBlankstoValue(df, [var_array, value]) | Converts missing values in the specified dataframe column(s) to a given value.
compare_columns_A_B(df, A, B) | Compares columns A and B of a dataframe and sets A to B's value wherever A is greater than B.
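To make the simpler method descriptions concrete, here is a pandas sketch of the behaviour described for `convertBlankstoValue`, `compare_columns_A_B`, and the `func` option of `evaluate_df_column`. These are hypothetical re-implementations (`convert_blanks_to_value`, `compare_columns_a_b`, `evaluate_with_func`) written for illustration, not the library's code. What counts as "blank", the default `value`, returning a modified copy rather than editing in place, and calling `func` row-wise as `func(*row)` are all assumptions; the `dict_conditions_values` option is not modelled.

```python
from typing import Any, Callable, Iterable, List, Union

import pandas as pd


def convert_blanks_to_value(
    df: pd.DataFrame, var_array: Iterable[str], value: Any = 0
) -> pd.DataFrame:
    """Hypothetical sketch: replace missing or empty entries in var_array columns."""
    out = df.copy()
    for col in var_array:
        # Assumption: NaN/None and empty or whitespace-only strings count as blank.
        blank = out[col].isna() | out[col].astype(str).str.strip().eq("")
        out.loc[blank, col] = value
    return out


def compare_columns_a_b(df: pd.DataFrame, A: str, B: str) -> pd.DataFrame:
    """Hypothetical sketch: wherever A > B, overwrite A with B's value."""
    out = df.copy()
    mask = out[A] > out[B]  # rows with NaN in A or B compare False and are left as-is
    out.loc[mask, A] = out.loc[mask, B]
    return out


def evaluate_with_func(
    df: pd.DataFrame,
    column_names: Union[str, List[str]],
    func: Callable[..., Any],
    output_column_name: str,
) -> pd.DataFrame:
    """Hypothetical sketch of the `func` option only: apply func row-wise."""
    out = df.copy()
    cols = [column_names] if isinstance(column_names, str) else list(column_names)
    # Each row's values for `cols` are passed to func as positional arguments.
    out[output_column_name] = out[cols].apply(lambda row: func(*row), axis=1)
    return out
```

For example, with A = [5, 2, 9] and B = [4, 3, 9], `compare_columns_a_b(df, "A", "B")` returns a copy in which A is [4, 2, 9]; only the first row, where A exceeded B, changes.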


Copyright © 2023 BiomedDAR. Distributed under an MIT license.