
DictOfSeries (may soon be renamed)

DictOfSeries is a pandas.Series of pandas.Series objects that aims to behave as similarly as possible to a pandas.DataFrame.

Nomenclature

  • series/ser: instance of pandas.Series
  • dios: instance of dios.DictOfSeries
  • df: instance of pandas.DataFrame
  • dios-like: a dios or a df
  • alignable object: a dios, df or a series

Features

  • every column has its own index
  • uses much less memory than a misaligned pandas.DataFrame
  • behaves much like a pandas.DataFrame
  • additional align locator (.aloc[])

Pandas-like indexing

[], .loc[], .iloc[], .at[] and .iat[] should behave exactly like their counterparts in pandas.DataFrame. They accept as indexers:

  • lists, array-like objects and in general all iterables
  • boolean lists and iterables
  • slices
  • scalars and any hashable object

Most indexers are passed directly to the underlying column series or row series, depending on the position of the indexer and the complexity of the operation. For .loc, .iloc, .at and .iat the first position is the row indexer, the second the column indexer. The second can be omitted and defaults to slice(None). Examples:

  • di.loc[[1,2,3], ['a']] : select labels 1,2,3 from column a
  • di.iloc[[1,2,3], [0,3]] : select positions 1,2,3 from the columns 0 and 3
  • di.loc[:, 'a':'c'] : select all rows from columns a to c
  • di.at[4,'c'] : select the element with label 4 in column c
  • di.loc[:] -> di.loc[:,:] : select everything.

Scalar indexing always returns a pandas Series if the other indexer is a non-scalar. If both indexers are scalars, the element itself is returned. In all other cases a dios is returned. For more pandas-like indexing magic and the differences between the indexers, see the pandas documentation.
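The column-wise pass-down and the return-type rules above can be sketched on a plain dict of Series. The helper `loc_select` is hypothetical and only illustrates the idea; it is not part of dios, and the dict stands in for a dios.

```python
import pandas as pd


def loc_select(data, rows, cols=slice(None)):
    """Hypothetical column-wise .loc on a dict of Series (stand-in for a dios)."""
    # Resolve the column indexer (scalar, list or label slice) against the column labels.
    col_index = pd.Series(list(data), index=list(data))
    selected = col_index.loc[cols]
    if not isinstance(selected, pd.Series):
        # Scalar column: a Series if `rows` is non-scalar, the element if both are scalars.
        return data[selected].loc[rows]
    if pd.api.types.is_scalar(rows):
        # Scalar row, several columns: one value per column, as a Series indexed by column.
        return pd.Series({c: data[c].loc[rows] for c in selected})
    # Both non-scalar: the row indexer is applied to each column's own index.
    return {c: data[c].loc[rows] for c in selected}


data = {
    "a": pd.Series([10, 11, 12, 13], index=[1, 2, 3, 4]),
    "b": pd.Series([20, 21], index=[3, 4]),
    "c": pd.Series([30, 31, 32], index=[2, 3, 4]),
    "d": pd.Series([40], index=[4]),
}
r1 = loc_select(data, [1, 2, 3], ["a"])            # dict with column "a" only
r2 = loc_select(data, slice(None), slice("a", "c"))  # columns a, b and c
r3 = loc_select(data, 4, "c")                      # both scalar: the element
r4 = loc_select(data, [3, 4], "b")                 # scalar column: a Series
r5 = loc_select(data, 4, ["a", "d"])               # scalar row: a Series by column
```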

Note:

In contrast to pandas.DataFrame, .loc[:] and .loc[:, :] always behave identically. The same applies to .iloc and .aloc. For example, for two pandas.DataFrames df1 and df2 with different columns, df1.loc[:, :] = df2 aligns the columns, but df1.loc[:] = df2 does not.

Whether this is the desired behavior or a bug, I couldn't verify so far. -- Bert Palm

2D-indexer

dios[boolean dios-like] (as single key) - dios accepts boolean 2D-indexers (a boolean pandas.DataFrame or a boolean dios).

Columns and rows from the indexer align with the dios. This means that only matching columns are selected, and within these columns only those rows are selected where i) the indices match and ii) the value in the boolean indexer is True. A missing index is treated exactly like a present index with a False value.

Values from unselected rows and columns are dropped, but empty columns are still preserved, so the resulting dios always has the same column dimension as the initial dios.

Note: This is exactly the same behavior as pandas.DataFrame's handling of 2D-indexers, except that pandas.DataFrame fills numpy.nan at unselected locations and therefore also fills whole missing columns with numpy.nan.
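The selection rules above can be sketched column by column. The helper `bool_select` is hypothetical (not the dios code) and works on a dict of Series; for contrast, pandas' own 2D boolean indexing is applied to an equivalent full-shape mask and keeps every cell, putting NaN where the mask is False.

```python
import pandas as pd


def bool_select(data, mask):
    """Hypothetical 2D boolean selection on a dict of Series (stand-in for a dios)."""
    result = {}
    for col, ser in data.items():
        if col in mask:
            # Align the mask to this column's index; labels missing from the mask count as False.
            m = mask[col].reindex(ser.index, fill_value=False).astype(bool)
            result[col] = ser[m]
        else:
            # Column not present in the mask: keep it, but empty.
            result[col] = ser.iloc[0:0]
    return result


data = {
    "a": pd.Series([1, 2, 3], index=[0, 1, 2]),
    "b": pd.Series([4, 5, 6], index=[0, 1, 2]),
}
# Label 1 is False, label 2 is missing (same as False), label 99 is ignored.
mask = {"a": pd.Series([True, False, True], index=[0, 1, 99])}
selected = bool_select(data, mask)

# pandas: the same selection as a full-shape mask keeps all cells and fills NaN.
df = pd.DataFrame(data)
df_mask = pd.DataFrame({"a": [True, False, False], "b": [False, False, False]})
masked = df[df_mask]
```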

Setting values

Setting values with [], .loc[], .iloc[], .at[] and .iat[] works like in pandas. With .at/.iat only single items can be set; for the other indexers the right-hand side values can be:

  • scalars: these are broadcasted to the selected positions
  • lists: the length of the list must match the number of indexed columns. The items can be anything that can be applied to a series with the respective indexing method (loc, iloc, []).
  • dios: the number of its columns must match the number of indexed columns. Columns do not align, they are just iterated; rows do align. Rows that are present on the right but not on the left are ignored. Rows that are present on the left (bear in mind: these rows were explicitly chosen for writing!) but not on the right are filled with NaNs, like in pandas.
  • pandas.Series: the column indexer must be a scalar(!). The series is passed down and set with loc, iloc or [] by the pandas Series, where it may align, depending on the method.

Examples:

  • dios.loc[2:5, 'a'] = [1,2,3] is the same as a=dios['a']; a.loc[2:5]=[1,2,3]; dios['a']=a
  • dios.loc[2:5, :] = 99 : set 99 on rows 2 to 5 in all columns
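The per-column equivalents of these assignments can be reproduced with plain pandas Series. This is a sketch under the assumption that a dios behaves like the loop over its columns shown here; the dict `data` is only a stand-in for a dios. Float data is used so that the NaN fill does not trigger an integer upcast.

```python
import pandas as pd

s = pd.Series([0.0, 0.0, 0.0, 0.0, 0.0], index=[1, 2, 4, 5, 7])

# Single column, list on the right: labels 2, 4, 5 fall into the inclusive label slice 2:5.
a = s.copy()
a.loc[2:5] = [1.0, 2.0, 3.0]

# Series on the right: rows align by label. Label 4 is missing on the right and
# becomes NaN; label 100 exists only on the right and is ignored.
b = s.copy()
rhs = pd.Series([10.0, 20.0, 99.0], index=[2, 5, 100])
b.loc[2:5] = rhs

# Scalar broadcast to all columns, each column using its own index.
data = {"x": s.copy(), "y": pd.Series([1.0, 1.0], index=[3, 6])}
for ser in data.values():
    ser.loc[2:5] = 99.0
```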

The special indexer .aloc

In addition to the pandas-like indexers, we have a .aloc[..] (align locator) indexing method. Unlike with .iloc and .loc, indexers fully align if possible, and 1D array-likes can be broadcast to multiple columns at once. This method also handles missing indexer items gracefully. It is used like .loc, so either a single indexer (.aloc[indexer]) or a tuple of row indexer and column indexer (.aloc[row-indexer, column-indexer]) can be given. It can also handle boolean and non-boolean 2D-indexers.

For more information and examples see the aloc usage and the cookbook.
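A rough approximation of the broadcasting and missing-label handling described for .aloc, written with plain pandas. The helper `aloc_rows` is hypothetical and covers only a 1D list of row labels; it is not the dios implementation. Plain Series.loc is shown for contrast, since it raises a KeyError when a list indexer contains missing labels.

```python
import pandas as pd


def aloc_rows(data, labels):
    """Hypothetical aligning row selection on a dict of Series (stand-in for a dios)."""
    # The same 1D list of labels is broadcast to every column; labels that a
    # column lacks are skipped instead of raising.
    return {col: ser[ser.index.isin(labels)] for col, ser in data.items()}


data = {
    "a": pd.Series([1, 2, 3], index=[0, 1, 2]),
    "b": pd.Series([4, 5, 6], index=[1, 2, 3]),
}
picked = aloc_rows(data, [0, 1, 5])

# Plain .loc refuses list indexers that contain missing labels.
try:
    data["b"].loc[[0, 1, 5]]
    strict_failed = False
except KeyError:
    strict_failed = True
```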

Properties

See also the Properties documentation

Note:

Properties that are also implemented in pandas.DataFrame mostly work analogously in dios.DictOfSeries.

  • columns
  • indexes
  • lengths
  • values
  • dtypes
  • itype
  • empty
  • size
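Several of these properties are per-column by nature. The sketch below shows one plausible way `columns`, `lengths`, `indexes` and `dtypes` map onto a plain dict of Series; the variable names mirror the listed properties, but the exact return types of the real dios properties are an assumption here.

```python
import pandas as pd

data = {
    "a": pd.Series([1, 2, 3], index=[0, 1, 2]),
    "b": pd.Series([0.5], index=[7]),
}

# Hypothetical per-column analogues of the listed properties (not the dios implementation).
columns = pd.Index(list(data))
lengths = pd.Series({c: len(s) for c, s in data.items()})          # length of each column
indexes = {c: s.index for c, s in data.items()}                     # each column's own index
dtypes = pd.Series({c: s.dtype for c, s in data.items()}, dtype=object)
```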

Methods and implied features

See also the Methods documentation

Note:

Methods that are also implemented in pandas.DataFrame mostly work analogously in dios.DictOfSeries.

  • copy()
  • copy_empty()
  • all()
  • any()
  • squeeze()
  • to_df()
  • to_string()
  • apply()
  • astype()
  • isin()
  • isna()
  • notna()
  • dropna()
  • memory_usage()
  • index_of()
  • in
  • is
  • len(Dios)
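Because every column has its own index, a method such as dropna() can act on each column independently instead of on whole rows. The sketch below contrasts pandas' row-wise DataFrame.dropna() with a column-wise variant on a dict of Series; that dios.DictOfSeries.dropna() behaves like the column-wise variant is an assumption, not something this section states.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

# pandas: a row is dropped if it holds a NaN in any column.
rowwise = df.dropna()

# Column-wise variant: each column drops only its own NaNs and keeps its own index.
colwise = {c: df[c].dropna() for c in df.columns}
```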

Operators and Comparators

  • arithmetical: + - * ** // / % and abs()
  • boolean: & ^ | ~
  • comparators: == != > >= < <=

Itype

DictOfSeries holds multiple series, and each series can have a different index length and index type. Differing index lengths are either resolved by alignment or simply fail if aligning makes no sense (e.g. assigning the very same list to series of different lengths; see .aloc).

A bigger challenge is the type of the index. If one series has an alphabetical index and another one a numeric index, selecting along columns can fail in all kinds of scenarios. To keep track of the index types, and to prohibit inserting an unfitting index type, we introduce the itype. It can be set on creation of a dios and also changed during usage. When the itype changes, all indexes of all series in the dios are cast to a new fitting type, if possible. Different cast mechanisms are available.

If an itype prohibits certain index types and a series with an unfitting index type is inserted, either an implicit type cast is done (with or without a warning) or an error is raised. The warning/error policy can be adjusted via global options.
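The cast-or-raise policy can be sketched for a single index type. Everything here is hypothetical: the function `enforce_float_itype` and the policy names `"never"`, `"warn"` and `"silent"` are illustrative only and are not the dios itype classes or its global option names. A float itype accepts float indexes, casts integer indexes (optionally warning) and rejects everything else.

```python
import warnings

import pandas as pd


def enforce_float_itype(ser, policy="warn"):
    """Hypothetical: make `ser` fit a float-index itype.

    policy "never" raises on integer indexes, "warn" casts with a warning,
    "silent" casts without one. Non-numeric indexes always raise.
    """
    idx = ser.index
    if pd.api.types.is_float_dtype(idx.dtype):
        return ser  # already fits
    if pd.api.types.is_integer_dtype(idx.dtype):
        if policy == "never":
            raise TypeError("integer index not allowed for a float itype")
        if policy == "warn":
            warnings.warn("casting integer index to float", UserWarning)
        return ser.set_axis(idx.astype("float64"))  # returns a new Series
    raise TypeError(f"index of dtype {idx.dtype} does not fit a float itype")


s_int = pd.Series([1.0, 2.0], index=[0, 1])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    s_float = enforce_float_itype(s_int, policy="warn")

try:
    enforce_float_itype(pd.Series([1.0], index=["x"]))
    rejected = False
except TypeError:
    rejected = True
```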

Have fun :)