Skip to main content Link Menu Expand (external link) Document Search Copy Copied

Definitions (List of Global Variables)

Attribute Description Type
PREFIX_PATH Defines the root directory path Directory
RAW_PATH Define folder name for all raw data files Directory
TRAIN_PATH Define folder name for all cleaned data files. If not specified, default is "trainData" Directory
READ_NA Option for loading CSVs. If False, data entries that can be found in default nanList will be converted to NaN. If True, above entries will be preserved as they are. If not specified, default is False Data loading
RAWXLSX Defines filename containing the raw data. E.g. “xx.xlsx”, “xx.csv” Data loading
RAWXLSX_SHEETNAME if RAWXLSX is an excel file, assign the sheetname from which to load the data. If not specified, will read the first sheet. E.g. “Sheet1” Data loading
RAWDICTXLSX Defines filename containing the data dictionary. E.g. “xx.xlsx”, “xx.csv” Data loading
RAWDICTXLSX_SHEETNAME if RAWDICTXLSX is an excel file, assign the sheetname from which to load the dictionary. If not specified, will read the first sheet. E.g. “Sheet1” Data loading
LOGGING Option to output logfile. If True, logfile will be built. If not specified, default is True. Log
LOG_FILENAME Defines filename of logfile. If not defined, default is logfile.txt. Log
CREATE_UNIQUE_INDEX Option to create unique row index from existing columns. If True, new index will be created. If not specified, default is False. Indexing
UNIQUE_INDEX_COMPOSITION_LIST List of column names to create new index from. If not specified, default is []. E.g.: ["subject_id", "visit"] Indexing
UNIQUE_INDEX_DELIMITER Delimiter to separate values from composition list. If not specified, default is _ Indexing
LONG_VAR_MARKER Defines the variable name that indicates which longitudinal group that row belongs to. If not specified, default is None. Longitudinal data
DICT_VAR_VARNAME Column in data dictionary containing variable names in input data. If not specified, set as “NAME”. Data Dictionary settings
DICT_VAR_VARCATEGORY Column in data dictionary setting the category of the variable name. If not specified, set as “CATEGORY Data Dictionary settings
DICT_VAR_TYPE Column in data dictionary setting the type of variable (string, numeric, date). If not specified, set as “TYPE Data Dictionary settings
DICT_VAR_CODINGS Column in data dictionary setting the codings of variable (dateformat, categories). If not specified, set as “CODINGS Data Dictionary settings
VAR_NAME_STRIPEMPTYSPACES Boolean option. If True, empty spaces will be stripped from variable names in input data, and from variables names listed in data dictionary. If not specified, default is False. Data cleaning settings
OUTPUT_TYPE_DATA The output file type for the clean data files. Available options: ‘csv’, ‘xlsx’. If not specified, default is ‘csv Data cleaning settings
OUTPUT_TYPE_DICT The output file type fot the amended dictionary. Available options: ‘csv’, ‘xlsx’. If not specified, default is ‘xlsx Data cleaning settings
SUFFIX_DROPPED_DUPLICATED_ROWS The filename suffix to use for intermediate outputs of cleaned data. If not specified, default is DD. Drop Duplicates settings
OUTPUT_DROPPED_DUPLICATED_ROWS_FILENAME The output filename to store the duplicated rows which have been dropped. Default is rowsRemoved.xlsx Drop Duplicates settings
SUFFIX_CONSTRAINTS The filename suffix to use for intermediate outputs of cleaned data. If not specified, default is CON. Constraints settings
SUFFIX_STANDARDISE_TEXT The filename suffix to use for intermediate outputs of cleaned data. If not specified, default is ST. Standardise Text settings
OPTIONS_STANDARDISE_TEXT_CASE_TYPE The default case type to convert strings into: “uppercase”, “lowercase”, “capitalise”. If not specified, default is uppercase. Standardise Text settings
OPTIONS_STANDARDISE_TEXT_EXCLUDE_LIST The variables to exclude from the conversion. For example: ["Gender", "Work"]. Standardise Text settings
OPTIONS_STANDARDISE_TEXT_CASE_TYPE_DICT The dictionary to customise case_type for specific variables, overwriting default. For example: {"Race1": "capitalise"}. Standardise Text settings
SUFFIX_STANDARDISE_DATE The filename suffix to use for intermediate outputs of cleaned data. If not specified, default is DATE. Date standardisation settings
OPTIONS_STANDARDISE_DATE_FORMAT The standard date format to use for all dates (if not specified, default is yyyy-mm-dd). Follows format used in ms-excel, see ref. Example: ddd, dd mmmm yy. Date standardisation settings
OPTIONS_FAILEDDATE_CONVERSIONS_FILENAME The filename for storing list of failed date conversions (only csv). Default is failed_date_conversions.csv. Date standardisation settings
SUFFIX_CONVERT_ASCII The filename suffix to use for intermediate outputs of cleaned data. If not specified, default is ASCII. ASCII Conversion settings
OPTIONS_CONVERT_ASCII_EXCLUSION_LIST List of characters to exclude from conversion. Eg. ['€','$','Ò']. ASCII Conversion settings

Copyright © 2023 BiomedDAR. Distributed by an MIT license.