Commit c048fba3 authored by Bert Palm
manually merged readme review from docu

parent ae1003c4
DictOfSeries (soon renamed to SoS?)
===================================
Is a pd.Series of pd.Series object which aims to behave as much as possible similar to pd.DataFrame.
Is a pd.Series of pd.Series object which aims to behave as similar as possible to the pandas DataFrame.
Nomenclature
......@@ -17,8 +17,8 @@ Nomenclature
Features
--------
* every *column* has its own index
* use very less memory then a disalignd pd.Dataframe
* act quite like pd.DataFrame
* uses much less memory than a misaligned pd.DataFrame
* behaves quite like a pd.DataFrame
* additional align locator (`.aloc[]`)
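
The memory claim above can be illustrated with plain pandas. This is only a sketch: it does not use the dios API, and all names are made up for illustration. Two series with disjoint indexes are forced onto a shared index when combined into a DataFrame, so half of the cells are `NaN` padding, while keeping the series separate stores only the real values.

```python
import numpy as np
import pandas as pd

# Two columns whose indexes do not overlap at all (illustrative data).
a = pd.Series(np.arange(1000.0), index=pd.RangeIndex(0, 1000))
b = pd.Series(np.arange(1000.0), index=pd.RangeIndex(1000, 2000))

# A DataFrame aligns both columns on the union of the indexes ...
df = pd.DataFrame({"a": a, "b": b})
# ... while a plain mapping of series keeps every column on its own index.
series_dict = {"a": a, "b": b}

cells_in_frame = df.size                       # rows * columns, padding included
nan_cells = int(df.isna().sum().sum())         # cells that exist only as padding
cells_in_dict = sum(len(s) for s in series_dict.values())

print(cells_in_frame, nan_cells, cells_in_dict)
```

The more the indexes differ, the larger the share of padding in the DataFrame.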
......@@ -27,27 +27,27 @@ Indexing
**pandas-like indexing**
`dios[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` - should behave exactly like
their counter-parts from pd.Dataframe. They can take as indexer
- lists, array-like, in general iterables
`[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` - should behave exactly like
their counterparts from pd.DataFrame. As indexers they accept
- lists, array-like objects and in general all iterables
- boolean lists and iterables
- slices
- scalars or any hashable obj
- scalars and any hashable object
Most indexers are directly passed to the underlying columns-series or row-series depending
on position of the indexer and the complexity of the operation. For `.loc`, `.iloc`, `.at`
on the position of the indexer and the complexity of the operation. For `.loc`, `.iloc`, `.at`
and `.iat` the first position is the *row indexer*, the second the *column indexer*. The second
can be omitted and will default to `slice(None)`. Examples:
- `di.loc[[1,2,3], ['a']]` : select labels 1,2,3 from column a
- `di.iloc[[1,2,3], [0,3]]` : select positions 1,2,3 from columns at position 0 and 3
- `di.loc[:, 'a':'c']` : select all from columns a to c
- `di.at[4,'c']` : select item at label 4 in columns c
- `di.iloc[[1,2,3], [0,3]]` : select positions 1,2,3 from the columns at positions 0 and 3
- `di.loc[:, 'a':'c']` : select all rows from columns a to c
- `di.at[4,'c']` : select the element with label 4 in column c
- `di.loc[:]` -> `di.loc[:,:]` : select everything
Scalar indexing always return a Series if the other indexer is a non-scalar. If both indexer
are scalars the stored item itself is returned. In all other cases a dios is returned.
Scalar indexing always returns a pandas Series if the other indexer is a non-scalar. If both indexers
are scalars, the element itself is returned. In all other cases a dios is returned.
For more pandas-like indexing magic and the differences between the indexers,
see the pandas documentation.
see the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html).
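
Since these indexers are meant to mirror pd.DataFrame, their semantics can be sketched on a plain DataFrame. The sample frame below is an assumption for illustration, and whether dios matches pandas in every detail is not verified here.

```python
import pandas as pd

# Illustrative frame with row labels 1..5 and columns a..d.
df = pd.DataFrame(
    {
        "a": [10, 11, 12, 13, 14],
        "b": [20, 21, 22, 23, 24],
        "c": [30, 31, 32, 33, 34],
        "d": [40, 41, 42, 43, 44],
    },
    index=[1, 2, 3, 4, 5],
)

rows_from_a = df.loc[[1, 2, 3], ["a"]]    # list for both indexers: 2D result
by_position = df.iloc[[1, 2, 3], [0, 3]]  # positions, not labels
a_to_c = df.loc[:, "a":"c"]               # label slices include the end label
single_item = df.at[4, "c"]               # both indexers scalar: the element
one_column = df.loc[[1, 2, 3], "a"]       # scalar column indexer: a Series

print(list(a_to_c.columns), single_item, type(one_column).__name__)
```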
**2D-indexer**
......@@ -66,20 +66,19 @@ fill np.nans at missing locations and therefore also fill-up, whole missing colu
**setting values**
Setting values with `di[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` work like in pandas.
With `.at`/`.iat` only single items can be set,
for the others, the values can be:
- *scalars*: these are broadcast to the selected positions
- *nested lists*: the outer list length must match the number of selected columns, the inner lists lengths must
match the number of selected rows in the corresponding column.
- *dios*: the length of the indexer-dios columns must match number of selected columns - columns does *not* align,
Setting values with `[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` works like in pandas.
With `.at`/`.iat` only single items can be set, for the others the
right hand side values can be:
- *scalars*: these are broadcast to the selected positions
- *nested lists*: the length of the outer list must match the number of indexed columns, the lengths of the inner lists must match the number of selected rows.
- *dios*: the number of its columns must match the number of indexed columns - columns do *not* align,
they are just iterated.
Rows do align. Rows that are present on the right but not on the left are ignored.
Rows that are present on the left (bear in mind: these rows were explicitly chosen for write!), but not present
on the right, are filled with `NaN`s, like in pandas.
- *normal lists* : column indexer must be a scalar(!), the list is passed down to `loc`, `iloc` or `s[]` of the series.
- *pd.Series*: column indexer must be a scalar(!), the series is passed down to `loc`, `iloc` or `s[]`
of the series, where it maybe align, depending on the method.
- *normal lists*: the column key must be a scalar(!), the list is passed down and set with `loc`, `iloc` or `[]` by the pandas Series.
- *pd.Series*: the column indexer must be a scalar(!), the series is passed down and set with `loc`, `iloc` or `[]`
by the pandas Series, where it may be aligned, depending on the method.
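
The row alignment described above is the behaviour pandas itself shows when a Series is assigned through `.loc`. The sketch below uses plain pandas only, with made-up data. It shows the pandas side of the comparison and makes no statement about dios internals.

```python
import numpy as np
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0, 4.0], index=[1, 2, 3, 4])
right = pd.Series([20.0, 30.0, 90.0], index=[2, 3, 9])

# Rows 1, 2 and 3 are explicitly chosen for writing.
left.loc[[1, 2, 3]] = right

# Label 1 is selected on the left but missing on the right: it becomes NaN.
# Labels 2 and 3 exist on both sides: they take the values from the right.
# Label 9 exists only on the right: it is ignored.
# Label 4 was not selected: it keeps its old value.
print(left)
```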
Examples:
......@@ -90,7 +89,7 @@ Examples:
In addition to the pandas-like indexers we have a `.aloc[..]` (align locator) indexing method.
Unlike with `.iloc` and `.loc`, indexers fully align if possible and 1D-array-likes can be broadcast
to multiple columns at once. Also this method handle missing indexer-items gratefully.
to multiple columns at once. This method also handles missing indexer-items gracefully.
It is used like `.loc`, so a single indexer (`.aloc[indexer]`) or a tuple of row-indexer and
column-indexer (`.aloc[row-indexer, column-indexer]`) can be given.
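
The idea of handling missing indexer items gracefully can be sketched with plain pandas. The helper `select_present` below is hypothetical: it is not the `.aloc` implementation and not part of dios. It only shows a selection that keeps whatever labels each column actually has instead of raising on the missing ones.

```python
import pandas as pd

def select_present(columns, labels):
    """Hypothetical helper: per column, keep only the requested labels that exist."""
    wanted = pd.Index(labels)
    return {name: s[s.index.isin(wanted)] for name, s in columns.items()}

# Illustrative columns, each with its own index.
columns = {
    "a": pd.Series([1.0, 2.0, 3.0], index=[1, 2, 3]),
    "b": pd.Series([4.0, 5.0, 6.0], index=[3, 4, 5]),
}

# Label 9 exists nowhere and label 2 only in column a; neither raises an error.
selected = select_present(columns, [2, 3, 9])
print({name: list(s.index) for name, s in selected.items()})
```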
......@@ -261,20 +260,20 @@ Properties
Methods and implied features
-------
Work mostly like analogous methods from pd.DataFrame.
- copy()
- copy_empty()
- all()
- any()
- squeeze()
- to_df()
- to_string()
- apply()
- astype()
- isna()
- notna()
- dropna()
- memory_usage()
- index_of()
- `copy()`
- `copy_empty()`
- `all()`
- `any()`
- `squeeze()`
- `to_df()`
- `to_string()`
- `apply()`
- `astype()`
- `isna()`
- `notna()`
- `dropna()`
- `memory_usage()`
- `index_of()`
- `in`
- `is`
- `len(Dios)`
......@@ -288,18 +287,19 @@ Operators and Comparators
Itype
-----
DictOfSeries holds multiple series, where possibly every series can have a different index length
and index type. Different index length, is solved with some aligning magic, or simply fail, if
aligning makes no sense (eg. assigning the very same list to series of different length (see `.aloc`).
The bigger problem is the type of the index. If one series has a alphabetical index, an other
an numeric index, selecting along columns, can just fail in every scenario. To keep track of the
DictOfSeries holds multiple series, and each series can have a different index length
and index type. Differing index lengths are either resolved by some aligning magic, or the operation simply fails if
aligning makes no sense (e.g. assigning the very same list to series of different lengths, see `.aloc`).
A bigger challenge is the type of the index. If one series has an alphabetical index, and another one
a numeric index, selecting along columns can fail in every scenario. To keep track of the
types of index or to prohibit the inserting of a *not fitting* index type,
we introduce a `itype`. This can be set on creation of a Dios and also changed during usage.
we introduce the `itype`. This can be set on creation of a Dios and also changed during usage.
On change of the itype, all indexes of all series in the dios are cast to a new fitting type,
if possible. Different cast-mechanisms are available.
If a itype prohibit some certain types of indexes, but a series with a non-fitting index-type is inserted,
a implicit cast is done, with or without a warning, or an error is raised. The warning/error policy
If an itype prohibits certain types of indexes and a series with a non-fitting index type is inserted,
an implicit type cast is done (with or without a warning) or an error is raised. The warning/error policy
can be adjusted via global options.
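
Casting all indexes to one fitting type can be sketched with plain pandas. The function `cast_indexes_to_numeric` is hypothetical and does not reproduce the dios cast mechanisms or its `itype` objects. It only shows a cast that succeeds where possible and raises an error otherwise.

```python
import pandas as pd

def cast_indexes_to_numeric(columns):
    """Hypothetical helper: cast every index to a numeric type or raise ValueError."""
    casted = {}
    for name, s in columns.items():
        # pd.to_numeric raises ValueError if a label cannot be parsed as a number.
        casted[name] = s.set_axis(pd.to_numeric(s.index))
    return casted

# One column with string labels, one with integer labels (illustrative data).
columns = {
    "a": pd.Series([1.0, 2.0], index=["10", "20"]),
    "b": pd.Series([3.0, 4.0], index=[10, 30]),
}
casted = cast_indexes_to_numeric(columns)
print({name: s.index.dtype for name, s in casted.items()})

# A genuinely alphabetical index cannot be cast and raises an error.
try:
    cast_indexes_to_numeric({"x": pd.Series([1.0], index=["abc"])})
    cast_failed = False
except ValueError:
    cast_failed = True
```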
Have fun :)
......