docdocdoc, with nicy linkies

Commit b367bf71 authored 5 years ago by Bert Palm 🎇
--- a/Readme.md
+++ b/Readme.md
@@ -22,10 +22,9 @@ Features
 * additional align locator (`.aloc[]`)


-Indexing
--------

-**pandas-like indexing**
+Pandas-like indexing
+--------------------

 `[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` - should behave exactly like 
 their counter-parts from pd.DataFrame. They can take as indexer 
@@ -69,185 +68,34 @@ fill np.nans at missing locations and therefore also fill-up, whole missing colu
 Setting values with `[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` works like in pandas. 
 With `.at`/`.iat` only single items can be set, for the other the
 right hand side values can be:
- *scalars*: these are broadcasted to the selected positions
- *nested lists*: the length of the outer list must match the number of indexed columns, the lengths of the inner lists must match the number of selected rows.
- *dios*: the length of the columns must match the number of indexed columns - columns does *not* align, 
-   they are just iterated. 
-   Rows do align. Rows that are present on the right but not on the left are ignored. 
-   Rows that are present on the left (bear in mind: these rows was explicitly chosen for write!), but not present
-   on the right, are filled with `NaN`s, like in pandas.
- *normal lists* : column keys must be a scalar(!), the list is passed down, and set with `loc`, `iloc` or `[]` by pandas Series.
- *pd.Series*: column indexer must be a scalar(!), the series is passed down, and set with `loc`, `iloc` or `[]` 
-   by pandas Series, where it maybe align, depending on the method. 
-
-Examples:
+ - *scalars*: these are broadcast to the selected positions
+ - *nested lists*: the length of the outer list must match the number of indexed columns, 
+    the lengths of the inner lists must match the number of selected rows.
+ - *dios*: the number of columns must match the number of indexed columns. Columns do *not* align, 
+    they are just iterated. 
+    Rows do align. Rows that are present on the right but not on the left are ignored. 
+    Rows that are present on the left (bear in mind: these rows were explicitly chosen for writing!), but not present
+    on the right, are filled with `NaN`s, like in pandas.
+ - *normal lists*: the column key must be a scalar(!); the list is passed down and set with `loc`, `iloc` or `[]` by the pandas Series.
+ - *pd.Series*: the column indexer must be a scalar(!); the series is passed down and set with `loc`, `iloc` or `[]` 
+    by the pandas Series, where it may align, depending on the method. 
+
+**Examples:**

 - `dios.loc[2:5, 'a'] = [1,2,3]` is the same as `a=dios['a']; a.loc[2:5]=[1,2,3]; dios['a']=a`
 - `dios.loc[2:5, :] = 99` : set 99 on rows 2 to 5 on all columns

-**the special indexer `.aloc`**
+The special indexer `.aloc`
+-----------------------------

 Additional to the pandas like indexers we have a `.aloc[..]` (align locator) indexing method. 
 Unlike `.iloc` and `.loc` indexers fully align if possible and 1D-array-likes can be broadcast 
 to multiple columns at once. This method also handle missing indexer-items gracefully. 
 It is used like `.loc`, so a single indexer (`.aloc[indexer]`) or a tuple of row-indexer and 
-column-indexer (`.aloc[row-indexer, column-indexer]`) can be given.
-Unlike the other indexer methods, it is not possible to get a single item returned; the return type 
-is either a pandas Series, iff the column-indexer is a single key (eg. `'a'`) or a dios, iff not.
-
-2D-indexer (like dios or df), only can passed as a single key, like `.aloc[2D-indexer]` or
-with a ellipsis, as column indexer, like `.aloc[2D-indexer, ...]`. The behavior may differ between these
-methods, as explained later below.
-
-If a normal (non 2D-dimensional) row indexer is given, but no column indexer, the latter defaults to `:` aka. 
-`slice(None)`, so `.aloc[row-indexer]` becomes `.aloc[row-indexer, :]`, which means, that all columns are used.
-In general, a normal row-indexer is applied to every column, that was chosen by the column indexer, but for 
-each column separately.
-
-Example:
-```
->> d
-    a |     b |     c |     d | 
-===== | ===== | ===== | ===== | 
-0  66 | 2  77 | 0  88 | 1  99 | 
-1  66 | 3  77 | 1  88 | 2  99 | 
-
-
->> d.aloc[[1,2], ['a', 'b', 'd']]
-    a |     b |     d | 
-===== | ===== | ===== | 
-1  66 | 2  77 | 1  99 | 
-      |       | 2  99 | 
-```
-
-Following the `.aloc` specific indexer are listed. Any indexer that is not listed (slice, boolean lists, ...) 
-are treated similar, as they would passed to `.loc` (actually they are really passed to `'loc` under the hood).
-
-*special **Column** indexer* are :
- *list / array-like* (or any iterable object): Only labels that are present in the columns are used, others are 
-   ignored. A dios is returned.
- *pd.Series* : `.values` are taken from series and handled like a *list*. A dios is returned.
- *scalar* (or any hashable obj) : Select a single column, if label is present, otherwise nothing. [1]
-
-*special **Row** indexer* are :
- *list / array-like* (or any iterable object): Only rows, which indices are present in the index of the column are 
-   used, others are ignored. A dios is returned. 
- *scalar* (or any hashable obj) : Select a single row from a column, if the value is present in the index of 
-   the column, otherwise nothing is selected. [1]
- *pd.Series* : align the index from the given Series with the column, what means only common indices are used. The 
-   actual values of the series are ignored(!).
- *boolean pd.Series* : like *pd.Series* but only True values are evaluated. 
-   False values are equivalent to missing indices. To treat a boolean series as a *normal* indexer series, as decribed
-   above, one can use `.aloc(usebool=False)[boolean pd.Series]`.
-   
-
-*special **2D**-indexer* are :
- `.aloc[boolean dios-like]` : work same like `di[boolean dios-like]` (see there). 
-   Brief: full align, select items, where the index is present and the value is True.
- `.aloc[dios-like, ...]` (with Ellipsis) : Align in columns and rows, ignore its values. Per common column,
-   the common indices are selected. The ellipsis forces `aloc`, to ignore the values, so a boolean dios could be 
-   treated as a non-boolean. Alternatively `.aloc(usebool=False)[boolean dios-like]` could be used.[2]
- `.aloc[nested list-like]` : The inner lists are used as `aloc`-*list*-row-indexer (see there) on all columns. 
-   One list for one column, which implies, that the outer list has the same length as the number of columns. 
-
-*special handling of 1D-**values***
-
-Values that are list- or array-like, which includes pd.Series, are set on all selected columns. pd.Series align
-like `s1.loc[:] = s2` do. 
-
-Examples:
-
-```
->>> d
-       a |     b | 
-======== | ===== | 
-0    0.0 | 1  50 | 
-1   70.0 | 2  60 | 
-2  140.0 | 3  70 | 
-
-
->>> d.aloc[[1,2]]
-       a |     b | 
-======== | ===== | 
-1   70.0 | 1  50 | 
-2  140.0 | 2  60 |  
-
-
->>> d.aloc[d>60]
-       a |     b | 
-======== | ===== | 
-1   70.0 | 3  70 | 
-2  140.0 |       | 
-
-
->>> d2 = d.copy()
->>> d2.aloc[d>60] = 10
->>> d2
-      a |     b | 
-======= | ===== | 
-0   0.0 | 1  50 | 
-1  10.0 | 2  60 | 
-2  10.0 | 3  10 | 
-
-
->>> d.aloc[[2,12,0,'foo'], ['a', 'x', 99, None, 99]]
-       a | 
-======== | 
-0    0.0 | 
-2  140.0 | 
-
-
->>> s=pd.Series(index=[1,11,111,1111])
->>> s
-1     NaN
-11    NaN
-111   NaN
-1111  NaN
-dtype: float64
-
-
->>> d.aloc[s]
-      a |     b | 
-======= | ===== | 
-1  70.0 | 1  50 | 
-
-
->>> d.aloc['foobar']
-Empty DictOfSeries
-Columns: ['a', 'b']
-
-
->>> d.aloc[d,...]   # (equal to use) d.aloc(usebool=False)[d]
-       a |     b | 
-======== | ===== | 
-0    0.0 | 1  50 | 
-1   70.0 | 2  60 | 
-2  140.0 | 3  70 | 
-
-
->>> d.aloc[d]
-Traceback (most recent call last):
-  File ...bad..stuff...
-ValueError: Must pass dios-like key with boolean values only if passed as single indexer
-
-
->>> b = d.astype(bool)
->>> b['b'] = False
->>> b
-       a |        b | 
-======== | ======== | 
-0  False | 1  False | 
-1   True | 2  False | 
-2   True | 3  False | 
-
-
->>> d.aloc[b]   # (equal to use) d[b]
-       a |       b | 
-======== | ======= | 
-1   70.0 | no data | 
-2  140.0 |         | 
+column-indexer (`.aloc[row-indexer, column-indexer]`) can be given. It can also handle boolean and *non-boolean*
+2D-indexers.

-```
+For more information and examples see the [aloc usage](/docs/aloc_usage.md) and the [cookbook](/docs/cookbook.md).

 Properties
 ----------

--- a/docs/aloc_usage.md
+++ b/docs/aloc_usage.md
@@ -2,10 +2,114 @@
 =========

 Purpose
- select gracefully, so rows or columns, that was given as indexer, but doesn't exist, don't raise an error
+--------
+- select gracefully: rows or columns that are given as indexer but don't exist do not raise an error
 - align series/dios-indexer 
 - setting multiple columns at once with a list-like value

+Overview
+--------
+`aloc` is *called* like `loc`: either with a single key that acts as row indexer (`aloc[rowkey]`), or with a tuple of
+row indexer and column indexer (`aloc[rowkey, columnkey]`). 2D-indexers (like a dios or df) can also be given, but 
+only as a single key, like `.aloc[2D-indexer]`, or together with the special column key `...`, 
+the ellipsis (`.aloc[2D-indexer, ...]`). The ellipsis may change how the 2D-indexer is
+interpreted; this is explained in detail [later](#the-power-of-2d-indexer).
+
+If a normal (non-2D) row indexer is given, but no column indexer, the latter defaults to `:`, aka 
+`slice(None)`. So `.aloc[row-indexer]` becomes `.aloc[row-indexer, :]`, which means that all columns are used.
+In general, a normal row indexer is applied to every column that was chosen by the column indexer, but to 
+each column separately.
+
+A first example may give a rough idea:
+```
+>>> d
+    a |     b |     c |     d | 
+===== | ===== | ===== | ===== | 
+0  66 | 2  77 | 0  88 | 1  99 | 
+1  66 | 3  77 | 1  88 | 2  99 | 
+
+
+>>> d.aloc[[1,2], ['a', 'b', 'd']]
+    a |     b |     d | 
+===== | ===== | ===== | 
+1  66 | 2  77 | 1  99 | 
+      |       | 2  99 | 
+```
+
+The return type
+----------------
+
+Unlike with the other two indexer methods `loc` and `iloc`, it is not possible to get a single item returned. 
+The return type is a pandas.Series if the column indexer is a single key (e.g. `'a'`), and a dios otherwise.
+The row indexer does not play any role in the choice of the return type.
+
+*Note for the curious: This is because a scalar (`.aloc[key]`) is translated to `.loc[key:key]` under the hood.*
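
The scalar-to-slice translation can be illustrated with plain pandas. This is only a sketch of the idea on a single, sorted column (the values are those of column `a` in the example dios); it does not call dios itself, and the variable names are made up for illustration:

```python
import pandas as pd

# Stand-in for one column of a dios (assumption: a sorted integer index).
col = pd.Series([0, 7, 14, 21, 28], index=[0, 1, 2, 3, 4])

# A scalar key returns a single item ...
item = col.loc[1]

# ... whereas a slice from the key to itself always returns a Series.
one_row = col.loc[1:1]

# On a sorted index, a slice with a missing label selects nothing
# instead of raising a KeyError, which gives the graceful behavior.
missing = col.loc[99:99]

print(item)
print(one_row.tolist())
print(len(missing))
```

Because the slice form never yields a scalar, the row indexer cannot influence the return type.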
+
+Indexer types
+-------------
+The `.aloc`-specific indexers are listed below. Any indexer that is not listed (slice, boolean list, ...), 
+but is known to work with `.loc`, is treated as if it were passed to `.loc`, which is what actually happens under the hood.
+
+Some indexers are linked to later sections, where a more detailed explanation and examples are given.
+
+*Special [column indexers](#select-columns-gracefully) are:*
+- *list / array-like* (or any iterable object): Only labels that are present in the columns are used, others are 
+   ignored. 
+- *pd.Series*: the `.values` are taken from the series and handled like a *list*.
+- *scalar* (or any hashable object): selects a single column if the label is present, otherwise nothing. 
+
+
+*Special [row indexers](#selecting-rows-a-smart-way) are:*
+- *list / array-like* (or any iterable object): Only rows whose indices are present in the index of the column are 
+   used, others are ignored. A dios is returned. 
+- *scalar* (or any hashable object): selects a single row from a column if the value is present in the index of 
+   the column, otherwise nothing is selected. [1]
+- *pd.Series*: the index of the given Series is aligned with the column, which means only common indices are used. The 
+   actual values of the series are ignored(!).
+- *boolean pd.Series*: like *pd.Series*, but only `True` values are evaluated. 
+   `False` values are equivalent to missing indices. To treat a boolean series as a *normal* indexer series, as described
+   above, one can use `.aloc(usebool=False)[boolean pd.Series]`.
+   
+
+*Special [2D-indexers](#the-power-of-2d-indexer) are:*
+- `.aloc[boolean dios-like]`: works the same as `di[boolean dios-like]` (see there). 
+   In brief: full align, select items where the index is present and the value is `True`.
+- `.aloc[dios-like, ...]` (with Ellipsis): aligns in columns and rows and ignores the values. Per common column,
+   the common indices are selected. The ellipsis forces `aloc` to ignore the values, so a boolean dios can be 
+   treated as a non-boolean one. Alternatively, `.aloc(usebool=False)[boolean dios-like]` can be used. [2]
+- `.aloc[nested list-like]`: the inner lists are used as `aloc`-*list*-row-indexers (see there) on all columns, 
+   one list per column, which implies that the outer list has the same length as the number of columns. 
+
+*special handling of 1D-**values***
+
+Values that are list- or array-like, which includes pd.Series, are set on all selected columns. A pd.Series aligns
+like `s1.loc[:] = s2` does. See also the [cookbook](/docs/cookbook.md#broadcast-array-likes-to-multiple-columns).
+
+*Indexer Table*
+
+| example | type | on | handling |
+| ------ | ------ | ------ |------ |
+|**column indexer**| 
+| `.aloc[any, 'a']`           | scalar                | columns | graceful |
+| `.aloc[any, ['a','c']]`     | list-like             | columns | graceful |
+| `.aloc[any, [True,False]]`  | bool list-like        | columns | take `True`'s, length must match (!) |
+| `.aloc[any, s]`             | pandas.Series         | columns | like list, only values |
+| `.aloc[any, bs]`            | bool pandas.Series    | columns | like bool-list |
+| `.aloc[any, 'b':'z']`       | slice                 | columns | filter |
+|**row indexer**| 
+| `.aloc[7, any]`             | scalar                | rows | translate to `.loc[key:key]` |
+| `.aloc[[1,2,24], any]`      | list-like             | rows | handle gracefully |
+| `.aloc[[True,False], any]`  | bool list-like        | rows | take `True`'s, length must match the length of all (selected) columns (!) |
+| `.aloc[s, any]`             | pandas.Series         | rows | like `.loc[s.index]` |
+| `.aloc[bs, any]`            | bool pandas.Series    | rows | align + just take `True`'s [1] |
+|**2D indexer**| 
+| `.aloc[[[s],[1,2,3]], any]` | nested list-like  | both | one row-indexer per column, outer length must match nr of columns (!) |
+| `.aloc[di]`                 | dios-like         | both | full align  |
+| `.aloc[di, ...]`            | dios-like         | both | full align, ellipsis has no effect |
+| `.aloc[di>5]`               | bool dios-like    | both | full align + take `True`'s [1] |
+| `.aloc[di>5, ...]`          | (bool) dios-like  | both | full align, disable bool evaluation |
+
+[1] evaluate `usebool`-keyword
+
 Example dios
 ---------

@@ -51,7 +155,7 @@ Just like selecting *single columns gracefully*, but with a array-like indexer.
 A dios is returned, with a subset of the existing columns. 
 If no key is present a empty dios is returned. 

-If the key is a pandas Series, its *values* are used for indexing, especially the Series's index is ignored.
+If the key is a pandas.Series, its *values* are used for indexing; in particular, the Series' index is ignored.

 To select all columns simply use `.aloc[:,:]` or even simpler `.aloc[:]`, just like one would do with `loc` or `iloc`.

@@ -83,21 +187,145 @@ d.aloc[:, s]
 Selecting Rows a smart way
 --------------------------

-Overview: 
+For scalar and array-like indexers with label values, the keys are handled gracefully, just like with 
+array-like column indexers.

-|                      |        |
-| ------               | ------ |
-| `.aloc[s]`           | like `.loc[s.index]` |
-| `.aloc[list]`        | handle graceful |
-| `.aloc[bool list]`   | no merci, length must match all (selected) columns |
-| `.aloc[bool series]` | align index and just take `True`'s -- [1]  |
-| `.aloc[key]`         | translate to `.loc[key:key]` |
-[1] evaluate `usebool`-keyword
+``` 
+>>> d.aloc[1]
+   a |       b |       c |       d | 
+==== | ======= | ======= | ======= | 
+1  7 | no data | no data | no data | 

-Note for the curios: *Because of `.aloc[key]` translates to `.loc[key:key]`, dios never return a single item, 
-nor a columns-indexed Series*.
+>>> d.aloc[99]
+Empty DictOfSeries
+Columns: ['a', 'b', 'c', 'd']
+
+>>> d.aloc[[3,6,7,18]]
+    a |    b |     c |    d | 
+===== | ==== | ===== | ==== | 
+3  21 | 3  6 | 6  27 | 6  0 | 
+      | 6  9 | 7  37 | 7  1 | 
+```
+
+The length of columns can differ:
+``` 
+>>> d.aloc[[3,6,7,18]].aloc[[3,6]]
+    a |    b |     c |    d | 
+===== | ==== | ===== | ==== | 
+3  21 | 3  6 | 6  27 | 6  0 | 
+      | 6  9 |       |      | 
+```
+
+Boolean array-likes as row indexer
+---------------------------------
+
+For array-like indexers that hold boolean values, the length of the indexer must match
+the length of every column that is indexed.
+``` 
+>>> d.aloc[[True,False,False,True,False]]
+    a |    b |     c |    d | 
+===== | ==== | ===== | ==== | 
+0   0 | 2  5 | 4   7 | 6  0 | 
+3  21 | 5  8 | 7  37 | 9  3 | 
+```
+If the length does not match, an `IndexError` is raised:
+```
+>>> d.aloc[[True,False,False]]
+Traceback (most recent call last):
+  ...
+  f"Boolean index has wrong length: "
+IndexError: failed for column a: Boolean index has wrong length: 3 instead of 5
+```
+
+This can be tricky, especially if columns have different lengths:
+``` 
+>>> difflen
+    a |    b |     c |    d | 
+===== | ==== | ===== | ==== | 
+0   0 | 2  5 | 4   7 | 6  0 | 
+1   7 | 3  6 | 6  27 | 7  1 | 
+2  14 | 4  7 |       | 8  2 | 
+
+>>> difflen.aloc[[False,True,False]]
+Traceback (most recent call last):
+  ...
+  f"Boolean index has wrong length: "
+IndexError: Boolean index has wrong length: 3 instead of 2
+```
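
Plain pandas enforces the same length constraint per Series. The following is a minimal sketch, assuming two of the `difflen` columns from above as stand-alone Series; the loop only illustrates why one boolean list cannot fit columns of different lengths and is not the dios implementation:

```python
import pandas as pd

# Two columns of different length, taken from the `difflen` example.
cols = {
    "a": pd.Series([0, 7, 14], index=[0, 1, 2]),
    "c": pd.Series([7, 27], index=[4, 6]),
}
mask = [False, True, False]

results = {}
for name, col in cols.items():
    try:
        # The mask fits column 'a' (length 3) ...
        results[name] = col.loc[mask].tolist()
    except IndexError as err:
        # ... but not column 'c' (length 2).
        results[name] = f"IndexError: {err}"

print(results)
```

If columns differ in length, a boolean pandas.Series is usually the safer indexer, because it is aligned by index instead of by position.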
+
+pandas.Series and boolean pandas.Series as row indexer
+------------------------------------------------------
+
+When a pandas.Series is used as row indexer with `aloc`, all its magic comes to light.
+The index of the given series is aligned with the index of each column separately and thereby acts as a filter.
+
+```
+>>> s = d['b'] + 100
+>>> s
+2    105
+3    106
+4    107
+5    108
+6    109
+Name: b, dtype: int64
+
+>>> d.aloc[s]
+    a |    b |     c |    d | 
+===== | ==== | ===== | ==== | 
+2  14 | 2  5 | 4   7 | 6  0 | 
+3  21 | 3  6 | 5  17 |      | 
+4  28 | 4  7 | 6  27 |      | 
+      | 5  8 |       |      | 
+      | 6  9 |       |      | 
+```
+
+As seen in the example above, the series' values are ignored completely. The functionality  
+is similar to `s1.loc[s2.index]`, where `s1` and `s2` are pandas.Series, `s2` is the indexer and `s1` is each column 
+in turn.
+
+If the indexer series holds boolean values, they are not ignored. 
+The series aligns in the same way as explained above, but additionally only the `True` values are evaluated. 
+Thus `False` values are treated like missing indices. The behavior here is analogous to `s1.loc[s2[s2].index]`.
+
+``` 
+>>> boolseries = d['b'] > 6
+>>> boolseries
+2    False
+3    False
+4     True
+5     True
+6     True
+Name: b, dtype: bool
+
+>>> d.aloc[boolseries]
+    a |    b |     c |    d | 
+===== | ==== | ===== | ==== | 
+4  28 | 4  7 | 4   7 | 6  0 | 
+      | 5  8 | 5  17 |      | 
+      | 6  9 | 6  27 |      | 
+```
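
The two analogies, `s1.loc[s2.index]` and `s1.loc[s2[s2].index]`, can be tried out in plain pandas. This is a sketch on column `a` only, not the dios implementation. One assumption is made explicit: recent pandas versions raise a `KeyError` for missing labels in `.loc`, so an index intersection stands in for the graceful handling that `aloc` provides:

```python
import pandas as pd

# Columns 'a' and 'b' of the example dios as stand-alone Series.
a = pd.Series([0, 7, 14, 21, 28], index=[0, 1, 2, 3, 4])
b = pd.Series([5, 6, 7, 8, 9], index=[2, 3, 4, 5, 6])

# Non-boolean indexer: only its index matters, the values are ignored.
s = b + 100
by_index = a.loc[a.index.intersection(s.index)]

# Boolean indexer: labels with a False value count as missing.
boolseries = b > 6
true_labels = boolseries[boolseries].index
by_bool = a.loc[a.index.intersection(true_labels)]

print(list(by_index.index), by_index.tolist())
print(list(by_bool.index), by_bool.tolist())
```

The selected rows correspond to column `a` in the `d.aloc[s]` and `d.aloc[boolseries]` outputs above.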
+
+Evaluating boolean values is a very handy feature, as it can easily be used with multiple conditions and also
+lends itself to writing them as one-liners:
+
+``` 
+>>> d.aloc[d['b'] > 6]
+    a |    b |     c |    d | 
+===== | ==== | ===== | ==== | 
+4  28 | 4  7 | 4   7 | 6  0 | 
+      | 5  8 | 5  17 |      | 
+      | 6  9 | 6  27 |      | 
+
+>>> d.aloc[(d['a'] > 6) & (d['b'] > 6)]
+    a |    b |    c |       d | 
+===== | ==== | ==== | ======= | 
+4  28 | 4  7 | 4  7 | no data | 
+```
+
+Nevertheless, something like `d.aloc[d['a'] > d['b']]` does not work, because the comparison itself fails 
+as long as the two series objects do not have the same index. In that case one may want to check out 
+[DictOfSeries.index_of()](/docs/methods_and_properties.md#diosdictofseriesindex_of).
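
The failing comparison happens in pandas, before `aloc` is even called. The sketch below reproduces it with two stand-alone Series and shows one possible workaround that restricts both to their common index first. The workaround is an assumption for illustration and does not use `index_of()`:

```python
import pandas as pd

a = pd.Series([0, 7, 14, 21, 28], index=[0, 1, 2, 3, 4])
b = pd.Series([5, 6, 7, 8, 9], index=[2, 3, 4, 5, 6])

# Comparing differently indexed Series raises a ValueError in pandas.
try:
    a > b
    failed = False
except ValueError as err:
    failed = True
    print("comparison failed:", err)

# Workaround: compare only on the labels that both Series share.
common = a.index.intersection(b.index)
mask = a.loc[common] > b.loc[common]

print(list(mask.index), mask.tolist())
```

The resulting `mask` is a boolean pandas.Series and could then be used as a row indexer, as described above.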

-**T_O_D_O**

 The power of 2D-indexer
 -----------------------

--- a/docs/cookbook.md
+++ b/docs/cookbook.md
@@ -7,6 +7,8 @@ Recipes
 - align dios with dios
 - get/set values by condition
 - apply a value to multiple columns
+- [Broadcast array-likes to multiple columns](#broadcast-array-likes-to-multiple-columns)
+- apply an array-like value to multiple columns
 - nan-policy - mask vs. drop values, when nan's are inserted (mv to Readme ??)
 - itype - when to use, pitfalls and best-practise
 - changing the index of series' in dios (one, some, all)
@@ -14,3 +16,8 @@ Recipes
 - changing properties of series' in dios (one, some, all)

 **T_O_D_O**
+
+
+Broadcast array-likes to multiple columns
+-----------------------------------------
+**T_O_D_O**
--- a/docs/methods_and_properties.md
+++ b/docs/methods_and_properties.md
@@ -7,22 +7,22 @@ Methods
 Brief
 - `copy(deep=True)` : Return a copy. See also [pandas.DataFrame.copy](
    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.copy.html) 
- - [copy_empty()](#diosdictofseriescopy_empty) : Return a new DictOfSeries object, with same properties than the original. 
+ - [`copy_empty()`](#diosdictofseriescopy_empty) : Return a new DictOfSeries object, with the same properties as the original. 
 - `all(axis=0)` : Return whether all elements are True, potentially over an axis. See also [pandas.DataFrame.all](
    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.all.html)
 - `any(axis=0)` : Return whether any element is True, potentially over an axis. See also [pandas.DataFrame.any](
    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.any.html)
 - `squeeze(axis=None)` : Squeeze a 1-dimensional axis objects into scalars. 
    See also [pandas.DataFrame.squeeze](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.squeeze.html)
- - [to_df()](#diosdictofseriesto_df) : Transform the Dios to a pandas.DataFrame
+ - [`to_df()`](#diosdictofseriesto_df) : Transform the Dios to a pandas.DataFrame
 - `to_string(kwargs)` : Return a string representation of the Dios.
- - [apply()](#diosdictofseriesapply) : apply the given function to every column in the dios eg. 
+ - [`apply()`](#diosdictofseriesapply) : apply the given function to every column in the dios eg. 
 - `astype()` : Cast the data to the given data type.
 - `isin()` : return a boolean dios, that indicates if the corresponding value is in the given array-like
 - `isna()` : Return a boolean array that is `True` if the value is a NaN value
 - `notna()` : inverse of `isna()`
 - `dropna()` : drop all Nan-values
- - [index_of()](#diosdictofseriesindex_of): Return a single(!) Index that is constructed from all the indexes of the columns. 
+ - [`index_of()`](#diosdictofseriesindex_of): Return a single(!) Index that is constructed from all the indexes of the columns. 
 - `len(Dios)` : return the number of columns the dios has.