DictReader

class fuzzyfields.DictReader(iterable: Iterable, fields: Dict[str, fuzzyfields.fuzzyfield.FuzzyField] = None, *, errors: Union[str, Callable[[Exception], Any]] = None, name_map: Dict[str, str] = None)

Generic iterable that takes an iterable of dicts as input, e.g. csv.DictReader, and for every input line yields a line that is filtered, validated and processed according to the input parameters.

Parameters
  • iterable – an iterable object, e.g. csv.DictReader, that yields dicts of {field : value}.

  • fields – dict of instance-specific FuzzyField objects. Do not use this parameter for fields that are known at the time of writing the code, which is the most common use case. Instead, create a subclass of DictReader and override the DictReader.fields class attribute.

  • errors

    One of:

    'raise' (default)

    raise a ValidationError on the first line that fails validation

    'critical', 'error', 'warning', 'info', 'debug'

    log the error with the matching functions in logging and continue

    callable(ValidationError)

    invoke a custom callable and continue (unless it itself raises an Exception)

    In case errors != 'raise' and a FuzzyField raises an exception,

    • if the field is required, the entire line is discarded

    • otherwise, the field is replaced with its default value

    As an alternative to passing this parameter, you may create a subclass of DictReader and override the DictReader.errors class attribute.

  • name_map (dict) –

    optional dict of {from name: to name} renames, where each pair performs a key replacement.

    As an alternative to passing this parameter, you may create a subclass of DictReader and override the DictReader.name_map class attribute.
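The three kinds of value accepted by errors can be pictured with a small standalone sketch. It is not the library's implementation: handle_error, LOG_LEVELS and the logger name are hypothetical names introduced only for illustration, and the sketch uses nothing but the standard logging module.

```python
import logging
from typing import Any, Callable, Union

# Hypothetical logger name, used only by this sketch
logger = logging.getLogger("dictreader_sketch")

# Level names that map to the functions of the same name in logging
LOG_LEVELS = ("critical", "error", "warning", "info", "debug")


def handle_error(
    exc: Exception, errors: Union[str, Callable[[Exception], Any]]
) -> None:
    """Apply an errors policy to an exception raised while parsing a line."""
    if callable(errors):
        # Custom callable: invoke it and continue, unless it raises itself
        errors(exc)
    elif errors == "raise":
        # Default policy: stop at the first failure
        raise exc
    elif errors in LOG_LEVELS:
        # Log with the matching logging function and continue
        getattr(logger, errors)("%s", exc)
    else:
        raise ValueError(f"unknown errors policy: {errors!r}")


# A list's append method works as a simple error collector
collected = []
handle_error(ValueError("bad value"), collected.append)
handle_error(ValueError("bad value"), "warning")
```

Passing a collector such as collected.append is a convenient way to gather every failure for a report at the end of the run instead of stopping at the first one.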

__init__(iterable: Iterable, fields: Dict[str, fuzzyfields.fuzzyfield.FuzzyField] = None, *, errors: Union[str, Callable[[Exception], Any]] = None, name_map: Dict[str, str] = None)

Build new object

classmethod __init_subclass__()

Executed whenever a subclass of the current class is defined. Set FuzzyField.name and enrich the docstring of the subclass with the documentation of the fields.
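The general mechanism can be sketched with plain Python classes. Field and Reader below are hypothetical stand-ins for FuzzyField and DictReader, and the docstring layout is an assumption; only the use of the standard __init_subclass__ hook reflects what is described above.

```python
class Field:
    """Hypothetical stand-in for a FuzzyField: a name and a description."""

    def __init__(self, description: str):
        self.name = None  # filled in when a subclass of Reader is defined
        self.description = description


class Reader:
    """Hypothetical stand-in for DictReader."""

    fields = {}

    def __init_subclass__(cls, **kwargs):
        # Python calls this hook once for every subclass, at definition time
        super().__init_subclass__(**kwargs)
        lines = [cls.__doc__ or "", "", "Fields:"]
        for name, field in cls.fields.items():
            field.name = name  # each field learns the key it is stored under
            lines.append(f"  {name}: {field.description}")
        cls.__doc__ = "\n".join(lines)


class PeopleReader(Reader):
    """Reads people records."""

    fields = {"name": Field("full name"), "age": Field("age in years")}
```

Because the hook runs at class definition time, the enriched docstring is already in place when a documentation tool imports the module.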

__iter__()

Draw dicts from the underlying iterable and yield dicts of {field name : parsed value}.

errors = 'raise'

Class-level error handling policy. Can be overridden with an instance-specific value through the matching __init__ parameter.

fields = {}

Class-level map of {field name: FuzzyField}. Overriding this dict is the preferred way to add fields, as they will dynamically build Sphinx documentation. You may add instance-specific fields with the matching __init__ parameter. Override with an OrderedDict if you need the fields to be parsed in order (this is generally only necessary when one field defines the domain of another).

property line_num

Return line number of underlying file.

Raises

AttributeError – if the underlying iterator is not a csv.reader(), csv.DictReader, or another duck-type compatible class
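A minimal sketch of this duck-typed behaviour, assuming the property simply forwards to the wrapped object. Wrapper is a hypothetical stand-in rather than the library's class; csv.DictReader.line_num is the standard library attribute.

```python
import csv
import io


class Wrapper:
    """Hypothetical stand-in that forwards line_num to the wrapped iterable."""

    def __init__(self, iterable):
        self.iterable = iterable

    @property
    def line_num(self):
        # An AttributeError propagates if the iterable has no line_num
        return self.iterable.line_num


# csv.DictReader exposes line_num, so forwarding works
reader = csv.DictReader(io.StringIO("a,b\n1,2\n3,4\n"))
wrapped = Wrapper(reader)
first = next(reader)
line_after_first = wrapped.line_num  # header line plus one data line

# A plain iterator over dicts has no line_num attribute
plain = Wrapper(iter([{"a": 1}]))
try:
    plain.line_num
    has_line_num = True
except AttributeError:
    has_line_num = False
```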

name_map = {}

Class-level map of field renames. The keys in this dict must be a subset of the keys in the fields dict. You can add to this dict in an instance-specific way by setting the matching __init__ parameter.
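A key replacement of this kind can be sketched in one line of plain Python. rename_keys is a hypothetical helper, not part of the library, and the sketch does not show at which stage of processing the library applies the renames.

```python
from typing import Any, Dict


def rename_keys(row: Dict[str, Any], name_map: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of row with keys renamed according to name_map."""
    # Keys that do not appear in name_map are kept unchanged
    return {name_map.get(key, key): value for key, value in row.items()}


renamed = rename_keys({"Cust. Name": "Ann", "age": "41"}, {"Cust. Name": "name"})
```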

postprocess_row(row: Dict[str, Any]) → Dict[str, Any]

Give child classes an opportunity to post-process every row after it’s been parsed by the FuzzyFields. This allows handling special cases and performing cross-field validation.

Parameters

row – The row as composed by the fields, after name mapping

Returns

Modified row, or None if the row should be skipped
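As an illustration of cross-field validation, the sketch below is written as a standalone function with the same signature; in practice the body would go into the method of a DictReader subclass. The field names start, end and duration_days are hypothetical.

```python
from datetime import date
from typing import Any, Dict, Optional


def postprocess_row(row: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Skip rows whose end precedes their start; add a derived field."""
    if row["end"] < row["start"]:
        # Returning None tells the reader to skip this row
        return None
    # Return a modified copy carrying a value derived from two fields
    return {**row, "duration_days": (row["end"] - row["start"]).days}


kept = postprocess_row({"start": date(2020, 1, 1), "end": date(2020, 1, 31)})
skipped = postprocess_row({"start": date(2020, 2, 1), "end": date(2020, 1, 1)})
```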

preprocess_row(row: Any) → Dict[str, Any]

Give child classes an opportunity to pre-process every row before feeding it to the FuzzyFields. This allows handling special cases.

You must use this method to manipulate the row if the underlying iterator does not natively yield dicts, e.g. a csv.reader() object.

Parameters

row – The row as read by self.iterable, with its original names, before name mapping

Returns

Modified row, or None if the row should be skipped
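For the csv.reader() case, a sketch of what such a method might do is shown below as a standalone function; in practice the body would go into the method of a DictReader subclass. The HEADER column names and the rule for skipping comment lines are assumptions made for the example.

```python
import csv
import io
from typing import Any, Dict, List, Optional

# Hypothetical column names for a file without a header line
HEADER = ["name", "age"]


def preprocess_row(row: List[str]) -> Optional[Dict[str, Any]]:
    """Turn a list produced by csv.reader() into a dict, skipping noise."""
    if not row or row[0].startswith("#"):
        # Blank lines and comment lines are skipped by returning None
        return None
    return dict(zip(HEADER, row))


source = io.StringIO("# comment\nAnn,41\n\nBob,29\n")
rows = [preprocess_row(r) for r in csv.reader(source)]
```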

record_num = None

Current record (counting from 0), or -1 if the iteration hasn’t started yet.