DatasetManager

class imagepypelines.core.ml.DatasetManager(k_folds=10, extensions=None, recursive=False, shuffle_seed=None)

Bases: object

object that manages your dataset and automatically organizes it into training and testing chunks. This manager supports k-fold cross-validation.
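A rough sketch of the chunking idea behind k-fold cross-validation: the data is split into `k` contiguous chunks, and each chunk later takes a turn as the test set while the remaining chunks form the training set. `make_folds` is a hypothetical helper, not part of imagepypelines, and spreading leftover items over the first chunks is only an assumption about how uneven splits might be handled.

```python
def make_folds(items, k):
    """Split `items` into k contiguous chunks whose sizes differ by at most one.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    base, extra = divmod(len(items), k)
    chunks, start = [], 0
    for i in range(k):
        # the first `extra` chunks absorb one leftover item each
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


print([len(c) for c in make_folds(list(range(11)), k=4)])  # [3, 3, 3, 2]
```

With the default of 10 folds, each of the 10 chunks would serve as the held-out test set exactly once over a full cycle.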

Parameters:
  • k_folds (int) – the number of folds to rotate the dataset through. Default is 10
  • extensions (list,tuple,None) – the file extensions used to filter filenames when loading with load_from_directories. Default is None
  • recursive (bool) – whether to search directories recursively when loading with load_from_directories. Default is False
  • shuffle_seed (None) – seed used to shuffle the data. Default is None
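A seed matters here because data and labels must be shuffled with the same permutation, and a fixed seed makes that permutation reproducible. The sketch below assumes the manager shuffles (datum, label) pairs before chunking; `shuffle_pairs` is hypothetical, and the library may use a different random number generator (for example NumPy's) rather than Python's `random` module.

```python
import random


def shuffle_pairs(data, labels, seed=None):
    """Shuffle data and labels with one shared permutation (hypothetical helper)."""
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    order = list(range(len(data)))
    # a dedicated Random instance leaves the global random state untouched
    random.Random(seed).shuffle(order)
    return [data[i] for i in order], [labels[i] for i in order]


a = shuffle_pairs(["x", "y", "z"], [0, 1, 2], seed=42)
b = shuffle_pairs(["x", "y", "z"], [0, 1, 2], seed=42)
print(a == b)  # True: the same seed gives the same order
```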
k_fold

the number of folds to rotate the dataset through

Type:int
extensions

the file extensions used to filter filenames when loading with load_from_directories

Type:list,tuple,None
recursive

whether to search directories recursively when loading with load_from_directories

Type:bool
shuffle_seed

seed used to shuffle the data

Type:None
fold_index

the fold this manager is currently on

Type:int
class_names

dictionary containing the class names; the key is the integer label and the value is the class name

Type:dict
data_chunks

deque containing all the chunks for the data

Type:deque
label_chunks

deque containing all the chunks for the labels

Type:deque
printer

printer for this class, registered to ‘DatasetManager’

Type:ip.Printer
remaining_folds

number of remaining folds

Type:int

Attributes Summary

remaining_folds number of remaining folds

Methods Summary

get_all() get all the data for this fold
get_class_names(labels) retrieve the class names corresponding to the given labels
get_test() get the testing set for this fold
get_train() get the training set for this fold
load_from_arrays(*arrays, **class_names) load a list of class arrays and apply labels
load_from_directories(*directories) load a list of class directories and apply labels
rotate() rotate the data chunks so the next dataset fold is available

Attributes Documentation

remaining_folds

number of remaining folds

Methods Documentation

get_all()

get all the data for this fold

Parameters:None
Returns:
  • data (list) – list of all data
  • labels (list) – list of all labels
get_class_names(labels)

retrieves names of the classes based on the labels
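Given the class_names attribute (integer label mapped to class name), the lookup amounts to indexing that dictionary once per label. The sketch below assumes names are returned as a list in the same order as the input labels; `lookup_class_names` is a hypothetical stand-in that takes the dictionary explicitly, whereas the documented method takes only `labels`.

```python
def lookup_class_names(labels, class_names):
    """Map integer labels to class names via a {label: name} dict (hypothetical)."""
    # a KeyError here means a label has no registered class name
    return [class_names[label] for label in labels]


class_names = {0: "cat", 1: "dog"}
print(lookup_class_names([1, 0, 1], class_names))  # ['dog', 'cat', 'dog']
```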

get_test()

get the testing set for this fold

Parameters:None
Returns:
  • test_data (list) – list of testing filenames
  • test_labels (list) – list of testing labels
get_train()

get the training set for this fold

Parameters:None
Returns:
  • train_data (list) – list of training filenames
  • train_labels (list) – list of training labels
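To make the train/test split concrete, the sketch below derives both sets from parallel chunk deques like data_chunks and label_chunks. The documentation does not say which chunk is held out, so treating the last chunk as the test set is an assumption; `split_fold` is a hypothetical helper.

```python
from collections import deque
from itertools import chain


def split_fold(data_chunks, label_chunks):
    """Return (train_data, train_labels, test_data, test_labels) for one fold.

    Hypothetical helper: assumes the last chunk is the test set and the
    remaining chunks, concatenated in order, form the training set.
    """
    *train_d, test_d = data_chunks
    *train_l, test_l = label_chunks
    return (list(chain.from_iterable(train_d)), list(chain.from_iterable(train_l)),
            list(test_d), list(test_l))


data_chunks = deque([["a.png", "b.png"], ["c.png", "d.png"], ["e.png"]])
label_chunks = deque([[0, 0], [1, 1], [0]])
train_x, train_y, test_x, test_y = split_fold(data_chunks, label_chunks)
print(test_x, test_y)  # ['e.png'] [0]
```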
load_from_arrays(*arrays, **class_names)

load a list of class arrays and apply labels

Parameters:
  • *arrays – unpacked list of data arrays; each array must belong to a different class so it can be labeled properly
  • class_names (list,tuple) – keyword-only argument specifying the name of each class
Returns:self
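Since each array holds a single class, labeling can follow argument position. The sketch assumes the i-th array receives integer label i and that, when no names are given, the label's string form is used as the class name; `label_arrays` is hypothetical and returns plain lists rather than `self`.

```python
def label_arrays(*arrays, class_names=None):
    """Give every item of the i-th array the integer label i (hypothetical helper)."""
    if class_names is None:
        # assumption: fall back to the label's string form as the class name
        class_names = [str(i) for i in range(len(arrays))]
    if len(class_names) != len(arrays):
        raise ValueError("need exactly one class name per array")
    data, labels = [], []
    for label, array in enumerate(arrays):
        data.extend(array)
        labels.extend([label] * len(array))
    return data, labels, dict(enumerate(class_names))


data, labels, names = label_arrays([1.0, 2.0], [3.0], class_names=("low", "high"))
print(labels, names)  # [0, 0, 1] {0: 'low', 1: 'high'}
```

The returned dictionary has the same shape as the class_names attribute: integer label to class name.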

load_from_directories(*directories)

load a list of class directories and apply labels

Parameters:*directories – unpacked list of data directories; each directory must belong to a different class so it can be labeled properly
Returns:self
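Loading from directories implies walking each class directory and filtering by the extensions and recursive settings. The sketch below uses `pathlib`, assuming label i goes to the i-th directory, that extensions match case-insensitively with or without a leading dot, and that recursive=True descends into subdirectories. `collect_files` is hypothetical and only borrows the extensions/recursive names from the constructor.

```python
from pathlib import Path


def collect_files(*directories, extensions=None, recursive=False):
    """Gather files per directory and label them by directory position (hypothetical)."""
    # normalize extensions so ".PNG", "png" and ".png" all compare equal
    exts = None if extensions is None else {e.lower().lstrip(".") for e in extensions}
    filenames, labels = [], []
    for label, directory in enumerate(directories):
        pattern = "**/*" if recursive else "*"
        for path in sorted(Path(directory).glob(pattern)):
            if not path.is_file():
                continue
            if exts is not None and path.suffix.lower().lstrip(".") not in exts:
                continue
            filenames.append(str(path))
            labels.append(label)
    return filenames, labels
```

For example, `collect_files("cats", "dogs", extensions=[".png"])` would label every PNG directly inside `cats` as 0 and every PNG directly inside `dogs` as 1.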
rotate()

rotate the data chunks so the next dataset fold is available
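Because data_chunks and label_chunks are deques, advancing a fold can be a single `deque.rotate` call on each, applied with the same step so every chunk stays paired with its labels. The sketch below is a minimal model under that assumption; `FoldRotator` is hypothetical, and both the rotation direction and wrapping fold_index back to 0 after a full cycle are assumptions.

```python
from collections import deque


class FoldRotator:
    """Keep data and label chunks aligned while cycling through folds (hypothetical)."""

    def __init__(self, data_chunks, label_chunks):
        if len(data_chunks) != len(label_chunks):
            raise ValueError("data and label chunk counts must match")
        self.data_chunks = deque(data_chunks)
        self.label_chunks = deque(label_chunks)
        self.fold_index = 0

    def rotate(self):
        # rotating both deques by the same step keeps each chunk paired with its labels
        self.data_chunks.rotate(1)
        self.label_chunks.rotate(1)
        # assumption: the fold index wraps around after a full cycle
        self.fold_index = (self.fold_index + 1) % len(self.data_chunks)
        return self


r = FoldRotator([["a"], ["b"], ["c"]], [[0], [1], [2]])
r.rotate()
print(list(r.data_chunks))  # [['c'], ['a'], ['b']]
```

After as many rotations as there are chunks, the deques return to their original order, so every chunk has occupied each position once.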