DictOfSeries (soon renamed to SoS?)
Features
- roughly as fast as pd.DataFrame
- every column has its own index
- uses much less memory than a pd.DataFrame with misaligned columns
- behaves much like pd.DataFrame
- additional align locator (.aloc[])
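The memory point can be illustrated with plain pandas. The sketch below does not use the dios API at all; a plain dict of series stands in for a DictOfSeries, and the variable names are made up for the example.

```python
import pandas as pd

# Two series whose indexes do not overlap.
a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10.0, 20.0], index=[5, 6])

# A DataFrame forces both columns onto the union of the indexes
# and pads every gap with NaN.
df = pd.DataFrame({"a": a, "b": b})
n_padded = int(df.isna().sum().sum())
print(df.shape, n_padded)

# A plain dict (standing in for a DictOfSeries) keeps each index as it is,
# so only the five original values are stored.
columns = {"a": a, "b": b}
lengths = {name: len(s) for name, s in columns.items()}
print(lengths)
```

Here the DataFrame holds ten cells for five real values; the padding grows with every column whose index differs from the others.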
Indexing
- di[], di.loc[], di.iloc[], di.at[] and di.iat[] should behave exactly like their counterparts from pd.DataFrame. Most indexers are passed directly to the underlying column series or row series.
- On selecting operations, a Dios simply drops rows that were not selected, instead of filling them with nan's as pd.DataFrame does.
- On writing operations, analogous to selecting, only the selected rows are changed; unselected rows keep their values.
- dios[BoolDiosLike]: like pd.DataFrame, a Dios accepts a boolean multi-indexer (a boolean pd.DataFrame or a boolean Dios). Columns and rows of the multi-indexer are aligned with the Dios, so only matching columns are selected/written, and the same applies to rows. Columns that are empty after applying the indexer are nevertheless preserved, so the resulting Dios always has the same column dimension as the initial Dios. (This is exactly how pd.DataFrame handles a multi-indexer, except that pd.DataFrame fills mismatching columns with nan's.)
- Additionally, there is a di.aloc[..] indexing method. Unlike iloc and loc, indexers and values are fully aligned if possible. This method also handles missing values gracefully. In contrast to di[BoolDiosLike], empty columns are not preserved on selecting. Briefly:
Graceful handling of non-alignable indexers:
- lists (including non-boolean Series, of which only ser.values are used)
  - as column indexer: only matching columns are used
  - as row indexer: only matching rows are used in the series of every column
- single labels on columns or rows: used if they match
Alignable indexers are:
- boolean series (a missing index label is treated like an existing False value)
  - as column indexer: the index should contain column names. If the corresponding value is True and the column exists, the column is selected/written.
  - as row indexer: the indexer is applied to all (selected) columns. On every column, the index of the boolean series is aligned with the index of the underlying series. If the corresponding value is True, the row is selected/written.
- boolean Dios: works like dios[BoolDiosLike] (see above), but does not preserve empty columns on selecting.
- pd.DataFrame: like boolean Dios
Alignable values are:
- series: align with every column
- Dios: full align on columns and rows
- pd.DataFrame: like Dios
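The row-indexer rule for boolean series (align per column, treat missing labels as False, optionally drop empty columns) can be sketched with plain pandas. This is only an illustration of the described behavior under those assumptions, not the dios implementation; `select_rows` and its `keep_empty` flag are hypothetical names.

```python
import pandas as pd

def select_rows(columns, mask, keep_empty=False):
    """Select rows from every series in `columns` with a boolean series.

    Hypothetical helper, not part of dios. Index labels that are
    missing from `mask` are treated like an existing False value.
    """
    out = {}
    for name, s in columns.items():
        # Align the mask with this column's own index.
        aligned = mask.reindex(s.index, fill_value=False)
        picked = s[aligned.to_numpy(dtype=bool)]
        # Drop columns that end up empty unless asked to keep them.
        if keep_empty or not picked.empty:
            out[name] = picked
    return out

columns = {
    "a": pd.Series([1, 2, 3], index=[0, 1, 2]),
    "b": pd.Series([10, 20], index=[5, 6]),
}
mask = pd.Series([True, False, True], index=[0, 1, 6])

result = select_rows(columns, mask)
print({name: s.to_dict() for name, s in result.items()})
```

With `mask = pd.Series([True], index=[0])`, column "b" would come out empty: it is dropped by default (as described for aloc) and kept with `keep_empty=True` (as described for dios[BoolDiosLike]).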
Properties
- columns
- dtype
- itype (see section Itype)
- empty
Methods and implied features
These mostly work like the analogous methods of pd.DataFrame.
- copy()
- copy_empty()
- all()
- any()
- squeeze()
- to_df()
- apply()
- astype()
- memory_usage()
- in
- is
- len(Dios)
Operators and Comparators
- arithmetical: + - * ** // / % and abs()
- boolean: & ^ | ~
- comparators: == != > >= < <=
Itype
A DictOfSeries holds multiple series, and each series can have a different index length
and index type. Differing index lengths are handled with some aligning magic, or the operation simply fails if
aligning makes no sense (e.g. assigning the very same list to series of different lengths).
The bigger problem is the type of the index. If one series has an alphabetical index and another
a numeric index, selecting along columns can fail in every scenario. To keep track of the
index types, and to prohibit inserting an index of an unfitting type,
we introduce the itype. It can be set when a Dios is created and changed later during usage.
When the itype changes, the indexes of all series in the Dios are cast to a new fitting type,
if possible. Different cast mechanisms are available.
If an itype prohibits certain index types and a series with such an index is inserted, an implicit cast is done, with or without a warning, or an error is raised. The warning/error policy can be adjusted via global options.
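The insert-time check can be pictured roughly as follows. This is a minimal sketch in plain pandas that assumes a "numeric index only" rule; `insert_checked` and its `policy` argument are hypothetical and do not reflect the actual dios itype classes or option names.

```python
import warnings
import pandas as pd

def insert_checked(columns, name, series, policy="warn"):
    """Insert `series` into `columns`, requiring a numeric index.

    Hypothetical helper, not the dios API. `policy` is "ignore",
    "warn" or "raise" and controls what happens when the index
    would have to be cast.
    """
    if not pd.api.types.is_numeric_dtype(series.index.dtype):
        if policy == "raise":
            raise TypeError(f"index of {name!r} is not numeric")
        if policy == "warn":
            warnings.warn(f"casting index of {name!r} to numeric")
        # Implicit cast; raises ValueError if a label is not a number.
        new_index = pd.Index(pd.to_numeric(series.index.to_numpy()))
        series = series.copy()
        series.index = new_index
    columns[name] = series
    return columns

columns = {}
insert_checked(columns, "a", pd.Series([1.5, 2.5], index=[0, 1]))
insert_checked(columns, "b", pd.Series([7, 8], index=["10", "20"]), policy="ignore")
print(columns["b"].index.tolist())
```

The string labels "10" and "20" are cast to numbers on insertion, while a label such as "x" would fail the cast, and `policy="raise"` rejects any non-numeric index outright.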
Have fun :)