Commit b367bf71 authored by Bert Palm 🎇

docdocdoc, with nicy linkies

parent 8b2ba427
......@@ -22,10 +22,9 @@ Features
* additional align locator (`.aloc[]`)
Indexing
--------
**pandas-like indexing**
Pandas-like indexing
--------------------
`[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` should behave exactly like
their counterparts from pd.DataFrame. They can take as indexer
......@@ -69,185 +68,34 @@ fill np.nans at missing locations and therefore also fill-up, whole missing colu
Setting values with `[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` works like in pandas.
With `.at`/`.iat` only single items can be set; for the others, the
right-hand side values can be:
- *scalars*: these are broadcast to the selected positions
- *nested lists*: the length of the outer list must match the number of indexed columns, the lengths of the inner lists must match the number of selected rows.
- *dios*: the number of columns must match the number of indexed columns - columns do *not* align,
they are just iterated.
Rows do align. Rows that are present on the right but not on the left are ignored.
Rows that are present on the left (bear in mind: these rows were explicitly chosen for write!), but not present
on the right, are filled with `NaN`s, like in pandas.
- *normal lists* : the column key must be a scalar(!), the list is passed down, and set with `loc`, `iloc` or `[]` by pandas Series.
- *pd.Series*: the column indexer must be a scalar(!), the series is passed down, and set with `loc`, `iloc` or `[]`
by pandas Series, where it may align, depending on the method.
Examples:
- *scalars*: these are broadcast to the selected positions
- *nested lists*: the length of the outer list must match the number of indexed columns,
the lengths of the inner lists must match the number of selected rows.
- *dios*: the number of columns must match the number of indexed columns - columns do *not* align,
they are just iterated.
Rows do align. Rows that are present on the right but not on the left are ignored.
Rows that are present on the left (bear in mind: these rows were explicitly chosen for write!), but not present
on the right, are filled with `NaN`s, like in pandas.
- *normal lists* : the column key must be a scalar(!), the list is passed down, and set with `loc`, `iloc` or `[]` by pandas Series.
- *pd.Series*: the column indexer must be a scalar(!), the series is passed down, and set with `loc`, `iloc` or `[]`
by pandas Series, where it may align, depending on the method.
**Examples:**
- `dios.loc[2:5, 'a'] = [1,2,3]` is the same as `a=dios['a']; a.loc[2:5]=[1,2,3]; dios['a']=a`
- `dios.loc[2:5, :] = 99` : set 99 on rows 2 to 5 on all columns
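The row alignment described above for a dios or Series on the right-hand side can be illustrated with plain pandas. This is only a sketch of the stated semantics (right-only rows are ignored, left-only rows become `NaN`), not the dios implementation; the names `left`, `right` and `selected` are made up for the illustration.

```python
import pandas as pd

# Left-hand side: one column with rows 0..3 (floats, so NaN fits without upcasting).
left = pd.Series([1.0, 2.0, 3.0, 4.0], index=[0, 1, 2, 3])
# Right-hand side: shares rows 2 and 3 with `left`; row 9 exists only on the right.
right = pd.Series([30.0, 40.0, 90.0], index=[2, 3, 9])

# Rows that were explicitly chosen for writing on the left.
selected = left.loc[1:3].index

# Align the right-hand side to the selected rows:
# row 9 is dropped, row 1 has no partner on the right and becomes NaN.
aligned = right.reindex(selected)

result = left.copy()
result.loc[selected] = aligned.to_numpy()
print(result)
```

Row 0 was not selected and keeps its value, row 1 was selected but has no counterpart and is filled with `NaN`, and row 9 never appears on the left.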
**the special indexer `.aloc`**
The special indexer `.aloc`
-----------------------------
In addition to the pandas-like indexers, there is an `.aloc[..]` (align locator) indexing method.
Unlike with `.iloc` and `.loc`, indexers are fully aligned if possible, and 1D-array-likes can be broadcast
to multiple columns at once. This method also handles missing indexer-items gracefully.
It is used like `.loc`, so a single indexer (`.aloc[indexer]`) or a tuple of row-indexer and
column-indexer (`.aloc[row-indexer, column-indexer]`) can be given.
Unlike the other indexer methods, it is not possible to get a single item returned; the return type
is either a pandas Series, iff the column-indexer is a single key (eg. `'a'`) or a dios, iff not.
A 2D-indexer (like dios or df) can only be passed as a single key, like `.aloc[2D-indexer]`, or
with an ellipsis as column indexer, like `.aloc[2D-indexer, ...]`. The behavior may differ between these
methods, as explained later below.
If a normal (non-2D) row indexer is given, but no column indexer, the latter defaults to `:` aka.
`slice(None)`, so `.aloc[row-indexer]` becomes `.aloc[row-indexer, :]`, which means that all columns are used.
In general, a normal row-indexer is applied to every column that was chosen by the column indexer, but for
each column separately.
Example:
```
>> d
a | b | c | d |
===== | ===== | ===== | ===== |
0 66 | 2 77 | 0 88 | 1 99 |
1 66 | 3 77 | 1 88 | 2 99 |
>> d.aloc[[1,2], ['a', 'b', 'd']]
a | b | d |
===== | ===== | ===== |
1 66 | 2 77 | 1 99 |
| | 2 99 |
```
The `.aloc`-specific indexers are listed below. Any indexer that is not listed (slice, boolean lists, ...)
is treated as if it were passed to `.loc` (in fact, it is passed to `.loc` under the hood).
*special **Column** indexer* are :
- *list / array-like* (or any iterable object): Only labels that are present in the columns are used, others are
ignored. A dios is returned.
- *pd.Series* : `.values` are taken from series and handled like a *list*. A dios is returned.
- *scalar* (or any hashable obj) : Select a single column, if label is present, otherwise nothing. [1]
*special **Row** indexer* are :
- *list / array-like* (or any iterable object): Only rows whose indices are present in the index of the column are
used, others are ignored. A dios is returned.
- *scalar* (or any hashable obj) : Select a single row from a column, if the value is present in the index of
the column, otherwise nothing is selected. [1]
- *pd.Series* : align the index of the given Series with the column, which means only common indices are used. The
actual values of the series are ignored(!).
- *boolean pd.Series* : like *pd.Series* but only True values are evaluated.
False values are equivalent to missing indices. To treat a boolean series as a *normal* indexer series, as described
above, one can use `.aloc(usebool=False)[boolean pd.Series]`.
*special **2D**-indexer* are :
- `.aloc[boolean dios-like]` : works the same as `di[boolean dios-like]` (see there).
In brief: full align, select items where the index is present and the value is True.
- `.aloc[dios-like, ...]` (with Ellipsis) : Align in columns and rows, ignore its values. Per common column,
the common indices are selected. The ellipsis forces `aloc` to ignore the values, so a boolean dios can be
treated as a non-boolean. Alternatively `.aloc(usebool=False)[boolean dios-like]` could be used.[2]
- `.aloc[nested list-like]` : The inner lists are used as `aloc`-*list*-row-indexer (see there) on all columns.
One list for one column, which implies that the outer list has the same length as the number of columns.
*special handling of 1D-**values***
Values that are list- or array-like, which includes pd.Series, are set on all selected columns. pd.Series align
like `s1.loc[:] = s2` does.
Examples:
```
>>> d
a | b |
======== | ===== |
0 0.0 | 1 50 |
1 70.0 | 2 60 |
2 140.0 | 3 70 |
>>> d.aloc[[1,2]]
a | b |
======== | ===== |
1 70.0 | 1 50 |
2 140.0 | 2 60 |
>>> d.aloc[d>60]
a | b |
======== | ===== |
1 70.0 | 3 70 |
2 140.0 | |
>>> d2 = d.copy()
>>> d2.aloc[d>60] = 10
>>> d2
a | b |
======= | ===== |
0 0.0 | 1 50 |
1 10.0 | 2 60 |
2 10.0 | 3 10 |
>>> d.aloc[[2,12,0,'foo'], ['a', 'x', 99, None, 99]]
a |
======== |
0 0.0 |
2 140.0 |
>>> s=pd.Series(index=[1,11,111,1111])
>>> s
1 NaN
11 NaN
111 NaN
1111 NaN
dtype: float64
>>> d.aloc[s]
a | b |
======= | ===== |
1 70.0 | 1 50 |
>>> d.aloc['foobar']
Empty DictOfSeries
Columns: ['a', 'b']
>>> d.aloc[d,...] # (equal to use) d.aloc(usebool=False)[d]
a | b |
======== | ===== |
0 0.0 | 1 50 |
1 70.0 | 2 60 |
2 140.0 | 3 70 |
>>> d.aloc[d]
Traceback (most recent call last):
File ...bad..stuff...
ValueError: Must pass dios-like key with boolean values only if passed as single indexer
>>> b = d.astype(bool)
>>> b['b'] = False
>>> b
a | b |
======== | ======== |
0 False | 1 False |
1 True | 2 False |
2 True | 3 False |
>>> d.aloc[b] # (equal to use) d[b]
a | b |
======== | ======= |
1 70.0 | no data |
2 140.0 | |
column-indexer (`.aloc[row-indexer, column-indexer]`) can be given. It can also handle boolean and *non-boolean*
2D-indexers.
```
For more information and examples see the [aloc usage](/docs/aloc_usage.md) and the [cookbook](docs/cookbook.md).
Properties
----------
......
......@@ -2,10 +2,114 @@
=========
Purpose
- select gracefully, so rows or columns that were given as indexer, but don't exist, don't raise an error
--------
- select gracefully, so rows or columns that were given as indexer, but don't exist, do not raise an error
- align series/dios-indexer
- setting multiple columns at once with a list-like value
Overview
--------
`aloc` is *called* like `loc`, with a single key that acts as row indexer (`aloc[rowkey]`) or with a tuple of
row indexer and column indexer (`aloc[rowkey, columnkey]`). A 2D-indexer (like dios or df) can also be given, but
only as a single key, like `.aloc[2D-indexer]`, or with the special column key `...`,
the ellipsis (`.aloc[2D-indexer, ...]`). The ellipsis may change how the 2D-indexer is
interpreted; this is explained in detail [later](#the-power-of-2d-indexer).
If a normal (non-2D) row indexer is given, but no column indexer, the latter defaults to `:` aka.
`slice(None)`, so `.aloc[row-indexer]` becomes `.aloc[row-indexer, :]`, which means that all columns are used.
In general, a normal row-indexer is applied to every column that was chosen by the column indexer, but for
each column separately.
A first example may give a rough idea:
```
>> d
a | b | c | d |
===== | ===== | ===== | ===== |
0 66 | 2 77 | 0 88 | 1 99 |
1 66 | 3 77 | 1 88 | 2 99 |
>> d.aloc[[1,2], ['a', 'b', 'd']]
a | b | d |
===== | ===== | ===== |
1 66 | 2 77 | 1 99 |
| | 2 99 |
```
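The graceful, per-column selection shown above can be approximated with a dict of pandas Series. This is a sketch under the assumption that "graceful" means dropping unknown column keys and unknown row labels silently; `graceful_select` is a hypothetical helper, not part of dios.

```python
import pandas as pd

# The example dios from above, modelled as a plain dict of Series.
d = {
    "a": pd.Series([66, 66], index=[0, 1]),
    "b": pd.Series([77, 77], index=[2, 3]),
    "c": pd.Series([88, 88], index=[0, 1]),
    "d": pd.Series([99, 99], index=[1, 2]),
}


def graceful_select(columns, row_keys, col_keys):
    """Hypothetical helper: keep only existing columns and, per column, only existing rows."""
    out = {}
    for key in col_keys:
        if key not in columns:
            continue  # unknown column labels are ignored
        s = columns[key]
        # unknown row labels are ignored, separately for every column
        out[key] = s.loc[s.index.intersection(row_keys)]
    return out


picked = graceful_select(d, [1, 2], ["a", "b", "d"])
for name, s in picked.items():
    print(name, s.to_dict())
```

Each column keeps only the requested rows it actually has, which is why the resulting columns can differ in length.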
The return type
----------------
Unlike the other two indexer methods `loc` and `iloc`, it is not possible to get a single item returned;
the return type is either a pandas.Series, iff the column-indexer is a single key (eg. `'a'`) or a dios, iff not.
The row-indexer does not play any role in the return type choice.
*Note for the curious: This is because a scalar (`.aloc[key]`) is translated to `.loc[key:key]` under the hood.*
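The effect of translating a scalar key to a slice can be seen with a single pandas Series. The snippet below is a pandas-only illustration and assumes a sorted index, which is what makes slicing with a missing label possible; `col` is a made-up column.

```python
import pandas as pd

col = pd.Series([0, 7, 14, 21, 28], index=[0, 1, 2, 3, 4])

item = col.loc[1]          # plain label lookup: returns a scalar
single = col.loc[1:1]      # label slice: returns a Series with one row
missing = col.loc[99:99]   # label slice on a sorted index: empty Series, no KeyError

print(type(single).__name__, len(single))
print(type(missing).__name__, len(missing))
```

The slice form always yields a Series, which is consistent with `.aloc` never returning a single item.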
Indexer types
-------------
The `.aloc`-specific indexers are listed below. Any indexer that is not listed below (slice, boolean lists, ...)
but is known to work with `.loc` is treated as if it were passed to `.loc`, which is what actually happens under the hood.
Some indexers are linked to later sections, where a more detailed explanation and examples are given.
*special [Column indexer](#select-columns-gracefully) are :*
- *list / array-like* (or any iterable object): Only labels that are present in the columns are used, others are
ignored.
- *pd.Series* : `.values` are taken from series and handled like a *list*.
- *scalar* (or any hashable obj) : Select a single column, if label is present, otherwise nothing.
*special [Row indexer](#selecting-rows-a-smart-way) are :*
- *list / array-like* (or any iterable object): Only rows whose indices are present in the index of the column are
used, others are ignored. A dios is returned.
- *scalar* (or any hashable obj) : Select a single row from a column, if the value is present in the index of
the column, otherwise nothing is selected. [1]
- *pd.Series* : align the index of the given Series with the column, which means only common indices are used. The
actual values of the series are ignored(!).
- *boolean pd.Series* : like *pd.Series* but only True values are evaluated.
False values are equivalent to missing indices. To treat a boolean series as a *normal* indexer series, as described
above, one can use `.aloc(usebool=False)[boolean pd.Series]`.
*special [2D-indexer](#the-power-of-2d-indexer) are :*
- `.aloc[boolean dios-like]` : works the same as `di[boolean dios-like]` (see there).
In brief: full align, select items where the index is present and the value is True.
- `.aloc[dios-like, ...]` (with Ellipsis) : Align in columns and rows, ignore its values. Per common column,
the common indices are selected. The ellipsis forces `aloc` to ignore the values, so a boolean dios can be
treated as a non-boolean. Alternatively `.aloc(usebool=False)[boolean dios-like]` could be used.[2]
- `.aloc[nested list-like]` : The inner lists are used as `aloc`-*list*-row-indexer (see there) on all columns.
One list for one column, which implies that the outer list has the same length as the number of columns.
*special handling of 1D-**values***
Values that are list- or array-like, which includes pd.Series, are set on all selected columns. pd.Series align
like `s1.loc[:] = s2` does. See also the [cookbook](/docs/cookbook.md#broadcast-array-likes-to-multiple-columns).
*Indexer Table*
| example | type | on | handling |
| ------ | ------ | ------ |------ |
|**column indexer**|
| `.aloc[any, 'a']` | scalar | columns | graceful |
| `.aloc[any, ['a','c']]` | list-like | columns | graceful |
| `.aloc[any, [True,False]]` | bool list-like | columns | take `True`'s , length must match (!) |
| `.aloc[any, s]` | pandas.Series | columns | like list, only values |
| `.aloc[any, bs]` | bool pandas.Series | columns | like bool-list |
| `.aloc[any, 'b':'z']` | slice | columns | filter |
|**row indexer**|
| `.aloc[7, any]` | scalar | rows | translate to `.loc[key:key]` |
| `.aloc[[1,2,24], any]` | list-like | rows | handle graceful |
| `.aloc[[True,False], any]` | bool list-like | rows | take `True`'s, length must match the length of all (selected) columns (!) |
| `.aloc[s, any]` | pandas.Series | rows | like `.loc[s.index]` |
| `.aloc[bs, any]` | bool pandas.Series | rows | align + just take `True`'s, [1] |
|**2D indexer**|
| `.aloc[[[s],[1,2,3]], any]` | nested list-like | both | one row-indexer per column, outer length must match nr of columns(!) |
| `.aloc[di]` | dios-like | both | full align |
| `.aloc[di, ...]` | dios-like | both | full align, ellipsis has no effect |
| `.aloc[di>5]` | bool dios-like | both | full align + take `True`'s [1] |
| `.aloc[di>5, ...]` | (bool) dios-like | both | full align, disable bool evaluation |
[1] evaluate `usebool`-keyword
Example dios
---------
......@@ -51,7 +155,7 @@ Just like selecting *single columns gracefully*, but with a array-like indexer.
A dios is returned, with a subset of the existing columns.
If no key is present, an empty dios is returned.
If the key is a pandas Series, its *values* are used for indexing, especially the Series's index is ignored.
If the key is a pandas.Series, its *values* are used for indexing, especially the Series's index is ignored.
To select all columns simply use `.aloc[:,:]` or even simpler `.aloc[:]`, just like one would do with `loc` or `iloc`.
......@@ -83,21 +187,145 @@ d.aloc[:, s]
Selecting Rows a smart way
--------------------------
Overview:
For scalar and array-like indexer with label values, the keys are handled gracefully, just like with
array-like column indexers.
| | |
| ------ | ------ |
| `.aloc[s]` | like `.loc[s.index]` |
| `.aloc[list]` | handle graceful |
| `.aloc[bool list]` | no mercy, length must match the length of all (selected) columns |
| `.aloc[bool series]` | align index and just take `True`'s -- [1] |
| `.aloc[key]` | translate to `.loc[key:key]` |
[1] evaluate `usebool`-keyword
```
>>> d.aloc[1]
a | b | c | d |
==== | ======= | ======= | ======= |
1 7 | no data | no data | no data |
Note for the curious: *Because `.aloc[key]` translates to `.loc[key:key]`, dios never returns a single item,
nor a columns-indexed Series*.
>>> d.aloc[99]
Empty DictOfSeries
Columns: ['a', 'b', 'c', 'd']
>>> d.aloc[[3,6,7,18]]
a | b | c | d |
===== | ==== | ===== | ==== |
3 21 | 3 6 | 6 27 | 6 0 |
| 6 9 | 7 37 | 7 1 |
```
The length of columns can differ:
```
>>> d.aloc[[3,6,7,18]].aloc[[3,6]]
a | b | c | d |
===== | ==== | ===== | ==== |
3 21 | 3 6 | 6 27 | 6 0 |
| 6 9 | | |
```
Boolean array-likes as row indexer
---------------------------------
For array-like indexers that hold boolean values, the length of the indexer and
the length of every column to index must match.
```
>>> d.aloc[[True,False,False,True,False]]
a | b | c | d |
===== | ==== | ===== | ==== |
0 0 | 2 5 | 4 7 | 6 0 |
3 21 | 5 8 | 7 37 | 9 3 |
```
If the length does not match, an `IndexError` is raised:
```
>>> d.aloc[[True,False,False]]
Traceback (most recent call last):
...
f"Boolean index has wrong length: "
IndexError: failed for column a: Boolean index has wrong length: 3 instead of 5
```
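The same length rule applies to a single pandas Series, which is presumably where the error message above originates. A pandas-only sketch with a made-up column `col`:

```python
import pandas as pd

col = pd.Series([0, 7, 14, 21, 28], index=[0, 1, 2, 3, 4])

# A boolean list of matching length selects the rows marked True.
ok = col.loc[[True, False, False, True, False]]
print(ok)

# A boolean list of a different length is rejected by pandas.
try:
    col.loc[[True, False, False]]
except IndexError as err:
    message = str(err)
else:
    message = None
print(message)
```

Because the check happens per column, a boolean list can only work if every selected column has exactly that length.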
This can be tricky, especially if columns have different lengths:
```
>>> difflen
a | b | c | d |
===== | ==== | ===== | ==== |
0 0 | 2 5 | 4 7 | 6 0 |
1 7 | 3 6 | 6 27 | 7 1 |
2 14 | 4 7 | | 8 2 |
>>> difflen.aloc[[False,True,False]]
Traceback (most recent call last):
...
f"Boolean index has wrong length: "
IndexError: Boolean index has wrong length: 3 instead of 2
```
pandas.Series and boolean pandas.Series as row indexer
------------------------------------------------------
When using a pandas.Series as row indexer with `aloc`, all its magic comes to light.
The index of the given series aligns itself with the index of each column separately and is used as a filter in this way.
```
>>> s = d['b'] + 100
>>> s
2 105
3 106
4 107
5 108
6 109
Name: b, dtype: int64
>>> d.aloc[s]
a | b | c | d |
===== | ==== | ===== | ==== |
2 14 | 2 5 | 4 7 | 6 0 |
3 21 | 3 6 | 5 17 | |
4 28 | 4 7 | 6 27 | |
| 5 8 | | |
| 6 9 | | |
```
As seen in the example above, the series' values are ignored completely. The functionality
is similar to `s1.loc[s2.index]`, where `s1` and `s2` are pandas.Series, `s2` is the indexer and `s1` is each column
in turn.
If the indexer series holds boolean values, they are not ignored.
The series aligns the same way as explained above, but additionally only the `True` values are evaluated.
Thus `False` values are treated like missing indices. The behavior here is analogous to `s1.loc[s2[s2].index]`.
```
>>> boolseries = d['b'] > 6
>>> boolseries
2 False
3 False
4 True
5 True
6 True
Name: b, dtype: bool
>>> d.aloc[boolseries]
a | b | c | d |
===== | ==== | ===== | ==== |
4 28 | 4 7 | 4 7 | 6 0 |
| 5 8 | 5 17 | |
| 6 9 | 6 27 | |
```
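Both analogies, `s1.loc[s2.index]` and `s1.loc[s2[s2].index]`, can be emulated on a dict of pandas Series. This is a sketch only: `align_rows` is a hypothetical helper, its `usebool` parameter merely mirrors the keyword mentioned above, and the column data is reconstructed from the printed examples (the last rows of `c` and `d` are assumed). An index intersection is used because plain `.loc` raises on missing labels in current pandas.

```python
import pandas as pd

# Example data reconstructed from the outputs above (partly assumed).
d = {
    "a": pd.Series([0, 7, 14, 21, 28], index=[0, 1, 2, 3, 4]),
    "b": pd.Series([5, 6, 7, 8, 9], index=[2, 3, 4, 5, 6]),
    "c": pd.Series([7, 17, 27, 37, 47], index=[4, 5, 6, 7, 8]),
    "d": pd.Series([0, 1, 2, 3, 4], index=[6, 7, 8, 9, 10]),
}


def align_rows(columns, indexer, usebool=True):
    """Hypothetical helper: filter every column by the index of `indexer`."""
    if usebool and indexer.dtype == bool:
        # keep only the labels whose value is True
        indexer = indexer[indexer]
    return {
        name: s.loc[s.index.intersection(indexer.index)]
        for name, s in columns.items()
    }


by_index = align_rows(d, d["b"] + 100)  # values are ignored
by_bool = align_rows(d, d["b"] > 6)     # only True rows count

for name in d:
    print(name, list(by_index[name].index), list(by_bool[name].index))
```

Passing `usebool=False` to the helper makes a boolean series behave like any other series, so only its index matters.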
Evaluating boolean values is a very handy feature, as it can easily be used with multiple conditions and also fits
nicely into one-liners:
```
>>> d.aloc[d['b'] > 6]
a | b | c | d |
===== | ==== | ===== | ==== |
4 28 | 4 7 | 4 7 | 6 0 |
| 5 8 | 5 17 | |
| 6 9 | 6 27 | |
>>> d.aloc[(d['a'] > 6) & (d['b'] > 6)]
a | b | c | d |
===== | ==== | ==== | ======= |
4 28 | 4 7 | 4 7 | no data |
```
Nevertheless, something like `d.aloc[d['a'] > d['b']]` does not work, because the comparison fails
as long as the two series objects do not have the same index. In that case one may want to check out
[DictOfSeries.index_of()](/docs/methods_and_properties.md#diosdictofseriesindex_of).
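The failing comparison can be reproduced with two plain pandas Series. The workaround shown here, restricting both series to their common rows with `Series.align`, is one pandas-only option and an assumption of this sketch, not the approach the dios documentation prescribes.

```python
import pandas as pd

a = pd.Series([0, 7, 14, 21, 28], index=[0, 1, 2, 3, 4])
b = pd.Series([5, 6, 7, 8, 9], index=[2, 3, 4, 5, 6])

# Comparing series with different indexes is rejected by pandas.
try:
    a > b
except ValueError as err:
    error = str(err)
else:
    error = None
print(error)

# Restrict both series to their common rows first, then compare.
a_common, b_common = a.align(b, join="inner")
mask = a_common > b_common
print(mask)
```

The resulting boolean series only covers the rows that both columns share, so it could then serve as a boolean row indexer.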
**T_O_D_O**
The power of 2D-indexer
-----------------------
......
......@@ -7,6 +7,8 @@ Recipes
- align dios with dios
- get/set values by condition
- apply a value to multiple columns
- [Broadcast array-likes to multiple columns](#broadcast-array-likes-to-multiple-columns)
- apply an array-like value to multiple columns
- nan-policy - mask vs. drop values, when nan's are inserted (mv to Readme ??)
- itype - when to use, pitfalls and best practice
- changing the index of series' in dios (one, some, all)
......@@ -14,3 +16,8 @@ Recipes
- changing properties of series' in dios (one, some, all)
**T_O_D_O**
Broadcast array-likes to multiple columns
-----------------------------------------
**T_O_D_O**
......@@ -7,22 +7,22 @@ Methods
Brief
- `copy(deep=True)` : Return a copy. See also [pandas.DataFrame.copy](
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.copy.html)
- [copy_empty()](#diosdictofseriescopy_empty) : Return a new DictOfSeries object, with the same properties as the original.
- [`copy_empty()`](#diosdictofseriescopy_empty) : Return a new DictOfSeries object, with the same properties as the original.
- `all(axis=0)` : Return whether all elements are True, potentially over an axis. See also [pandas.DataFrame.all](
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.all.html)
- `any(axis=0)` : Return whether any element is True, potentially over an axis. See also [pandas.DataFrame.any](
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.any.html)
- `squeeze(axis=None)` : Squeeze 1-dimensional axis objects into scalars.
See also [pandas.DataFrame.squeeze](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.squeeze.html)
- [to_df()](#diosdictofseriesto_df) : Transform the Dios to a pandas.DataFrame
- [`to_df()`](#diosdictofseriesto_df) : Transform the Dios to a pandas.DataFrame
- `to_string(kwargs)` : Return a string representation of the Dios.
- [apply()](#diosdictofseriesapply) : apply the given function to every column in the dios eg.
- [`apply()`](#diosdictofseriesapply) : apply the given function to every column in the dios eg.
- `astype()` : Cast the data to the given data type.
- `isin()` : return a boolean dios, that indicates if the corresponding value is in the given array-like
- `isna()` : Return a boolean array that is `True` if the value is a NaN value
- `notna()` : inverse of `isna()`
- `dropna()` : drop all NaN values
- [index_of()](#diosdictofseriesindex_of): Return a single(!) Index that is constructed from all the indexes of the columns.
- [`index_of()`](#diosdictofseriesindex_of): Return a single(!) Index that is constructed from all the indexes of the columns.
- `len(Dios)` : return the number of columns the dios has.
......