
DictOfSeries (soon renamed to SoS?)

Features

  • nearly as fast as pd.DataFrame
  • every column has its own index
  • uses much less memory than a misaligned pd.DataFrame
  • behaves much like pd.DataFrame
  • additional aligning locator (.aloc[])
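
The memory point can be illustrated with plain pandas. The sketch below does not use the DictOfSeries API. It only compares a plain dict of Series (one index per column) with the pd.DataFrame built from the same data, whose shared index is the union of both indexes and whose gaps are padded with NaN. The column names and sizes are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two columns whose indexes do not overlap at all (illustrative data).
a = pd.Series(np.arange(1000, dtype="float64"), index=np.arange(0, 1000))
b = pd.Series(np.arange(1000, dtype="float64"), index=np.arange(1000, 2000))

# One index per column: nothing is padded.
per_column = {"a": a, "b": b}
bytes_per_column = sum(s.memory_usage(index=True) for s in per_column.values())

# One shared index: every column is stretched to the union of both
# indexes and the gaps are filled with NaN.
df = pd.DataFrame(per_column)
bytes_dataframe = int(df.memory_usage(index=True).sum())

print("rows in DataFrame:", len(df))
print("NaN cells in column a:", int(df["a"].isna().sum()))
print("dict of Series:", bytes_per_column, "bytes")
print("DataFrame:", bytes_dataframe, "bytes")
```

The exact byte counts depend on the platform and the pandas version, so none are quoted here. The point is only that the padded DataFrame has to store the NaN cells as well.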

Indexing

  • di[], di.loc[], di.iloc[], di.at[] and di.iat[] should behave exactly like their counterparts from pd.DataFrame. Most indexers are passed directly to the underlying column series or row series.

  • On selecting operations, a Dios simply drops the rows that were not selected, instead of filling them with NaNs as pd.DataFrame does.

  • On writing operations, analogous to selecting, only the selected rows are changed; unselected rows keep their values.

  • dios[BoolDiosLike] - like pd.DataFrame, a Dios accepts a boolean multi-indexer (a boolean pd.DataFrame or a boolean Dios). Columns and rows of the multi-indexer are aligned with the Dios. This means that only matching columns are selected/written, and the same applies to rows. Nevertheless, columns that are empty after applying the indexer are preserved, so the resulting Dios always has the same column dimension as the initial Dios. (This is exactly how pd.DataFrame handles a multi-indexer, except that pd.DataFrame fills mismatched columns with NaNs.)

  • Additionally, there is a di.aloc[..] indexing method. Unlike with iloc and loc, indexers and values are fully aligned if possible. This method also handles missing values gracefully. In contrast to di[BoolDiosLike], empty columns are not preserved on selecting. Briefly:

    Graceful handling of non-alignable indexers:

    • lists (including non-boolean Series, of which only ser.values is used)
      • as column indexer: only matching columns are used
      • as row indexer: only matching rows are used, in the series of every column
    • single labels on columns or rows: used if they match

    Alignable indexers are:

    • boolean Series (a missing index label is treated like an existing False value)
      • as column indexer: the index should contain column names. If the corresponding value is True and the column exists, the column is selected/written.
      • as row indexer: the indexer is applied to all (selected) columns. For every column, the index of the boolean Series is aligned with the index of the underlying series. If the corresponding value is True, the row is selected/written.
    • boolean Dios: works like dios[BoolDiosLike] (see above), but does not preserve empty columns on selecting.
    • pd.DataFrame: like boolean Dios

    Alignable values are:

    • Series: aligned with every column
    • Dios: fully aligned on columns and rows
    • pd.DataFrame: like Dios
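
The selection rules above can be imitated with plain pandas. The following is only a sketch on a plain dict of Series, not the DictOfSeries implementation. `select_aligned` is a hypothetical helper that mimics how a boolean row indexer is described for .aloc[]: the mask is aligned with each column's index, missing labels count as False, and unselected rows are dropped.

```python
import pandas as pd

# Columns with different indexes, kept in a plain dict (not a real Dios).
cols = {
    "a": pd.Series([1, 2, 3], index=[0, 1, 2]),
    "b": pd.Series([10, 20], index=[1, 5]),
}

# pd.DataFrame keeps its shape and fills unselected cells with NaN.
df = pd.DataFrame(cols)
df_selected = df[df > 2]

# Selecting per column simply drops the unselected rows.
dropped = {name: ser[ser > 2] for name, ser in cols.items()}

# A boolean row indexer that knows only some of the labels.
mask = pd.Series([True, False, True], index=[1, 2, 5])


def select_aligned(ser, mask):
    """Hypothetical helper: align the mask with the series index,
    treat missing labels as False, then drop unselected rows."""
    aligned_mask = mask.reindex(ser.index, fill_value=False)
    return ser[aligned_mask]


aligned_cols = {name: select_aligned(ser, mask) for name, ser in cols.items()}

print(df_selected)
for name in cols:
    print(name, dropped[name].to_dict(), aligned_cols[name].to_dict())
```

Here label 0 of column a is not in the mask and is therefore treated as False, while label 5 of the mask is ignored for column a because that row does not exist there.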

Properties

  • columns
  • dtype
  • itype (see section Itype)
  • empty

Methods and implied features

These work mostly like the analogous methods of pd.DataFrame.

  • copy()
  • copy_empty()
  • all()
  • any()
  • squeeze()
  • to_df()
  • apply()
  • astype()
  • memory_usage()
  • in
  • is
  • len(Dios)

Operators and Comparators

  • arithmetical: + - * ** // / % and abs()
  • boolean: & ^ | ~
  • comparators: == != > >= < <=

Itype

DictOfSeries holds multiple series, each of which may have a different index length and index type. Differing index lengths are handled with some aligning magic, or the operation simply fails if aligning makes no sense (e.g. assigning the very same list to series of different lengths). The bigger problem is the type of the index. If one series has an alphabetical index and another a numeric index, selecting along columns can fail in every scenario. To keep track of the index types and to prohibit inserting an index of a non-fitting type, we introduce an itype. It can be set when a Dios is created and changed later. When the itype changes, the indexes of all series in the Dios are cast to a new fitting type, if possible. Different cast mechanisms are available.
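
The idea of tracking and casting index types can be sketched with plain pandas. This is not the itype API of DictOfSeries. `index_kind` and the kind names it returns are hypothetical and only show what "keeping track of the index type" and "casting all indexes to a fitting type" could look like.

```python
import pandas as pd


def index_kind(index):
    """Hypothetical classifier for an index, not the dios itype."""
    if isinstance(index, pd.DatetimeIndex):
        return "datetime"
    if pd.api.types.is_numeric_dtype(index):
        return "numeric"
    return "other"


# Two columns with the same kind of labels, but different index types.
cols = {
    "a": pd.Series([1.0, 2.0], index=pd.to_datetime(["2020-01-01", "2020-01-02"])),
    "b": pd.Series([3.0, 4.0], index=["2020-01-01", "2020-01-03"]),  # plain strings
}
kinds = {name: index_kind(ser.index) for name, ser in cols.items()}

# Cast every index to one common type, here datetime.
casted = {}
for name, ser in cols.items():
    ser = ser.copy()
    ser.index = pd.to_datetime(ser.index)  # raises if the labels are not date-like
    casted[name] = ser
casted_kinds = {name: index_kind(ser.index) for name, ser in casted.items()}

print(kinds)
print(casted_kinds)
```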

If an itype prohibits certain index types but a series with such an index type is inserted, either an implicit cast is done, with or without a warning, or an error is raised. The warning/error policy can be adjusted via global options.
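
A cast-or-fail policy of this kind could look roughly like the sketch below. `coerce_index` and its `policy` values are hypothetical and are not the global options of DictOfSeries. The sketch assumes that the required index type is a DatetimeIndex.

```python
import warnings

import pandas as pd


def coerce_index(ser, policy="warn"):
    """Hypothetical helper: make sure `ser` has a DatetimeIndex.

    policy: "ignore" (cast silently), "warn" (cast and warn) or "raise".
    """
    if isinstance(ser.index, pd.DatetimeIndex):
        return ser
    if policy == "raise":
        raise TypeError(f"index of type {type(ser.index).__name__} is not allowed")
    if policy == "warn":
        warnings.warn("index was implicitly cast to DatetimeIndex", stacklevel=2)
    out = ser.copy()
    out.index = pd.to_datetime(out.index)  # raises if the labels are not date-like
    return out


ser = pd.Series([1.0, 2.0], index=["2020-01-01", "2020-01-02"])

# Record warnings so the effect of the "warn" policy is visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    casted = coerce_index(ser, policy="warn")

print(type(casted.index).__name__)
print([str(w.message) for w in caught])
```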

Have fun :)