Commit 13fefe03 authored by Bert Palm

documentation, added Notes :D

parent 40142135
DictOfSeries (soon renamed to SoS?)
DictOfSeries (may soon be renamed)
===================================
Is a pd.Series of pd.Series object which aims to behave as similar as possible to the pandas DataFrame.
A pandas.Series of pandas.Series objects that aims to behave as similarly as possible to pandas.DataFrame.
Nomenclature
------------
- pd: pandas
- series/ser: instance of pd.Series
- dios: instance of DictOfSeries
- df: instance of pd.DataFrame
- series/ser: instance of pandas.Series
- dios: instance of dios.DictOfSeries
- df: instance of pandas.DataFrame
- dios-like: a *dios* or a *df*
- alignable object: a *dios*, *df* or a *series*
......@@ -17,8 +16,8 @@ Nomenclature
Features
--------
* every *column* has its own index
* uses much less memory than a misaligned pd.DataFrame
* behaves quite like a pd.DataFrame
* uses much less memory than a misaligned pandas.DataFrame
* behaves quite like a pandas.DataFrame
* additional align locator (`.aloc[]`)
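The memory claim above can be illustrated with plain pandas. This is only a sketch under assumptions: it does not use dios itself, the column names and sizes are made up, and it simply compares the footprint of separately stored series against a DataFrame built from the same, non-overlapping data.

```python
import numpy as np
import pandas as pd

# Four hypothetical columns of 1000 values each; every column has its own,
# disjoint index, so no two columns share a single row label.
columns = {
    name: pd.Series(
        np.arange(1000, dtype="int64"),
        index=np.arange(i * 1000, (i + 1) * 1000),
    )
    for i, name in enumerate("abcd")
}

# Stored separately, each series only pays for its own rows.
separate_bytes = sum(s.memory_usage(index=True) for s in columns.values())

# A DataFrame aligns all columns on the union of the indexes
# and pads every gap with NaN.
df = pd.DataFrame(columns)
frame_bytes = df.memory_usage(index=True).sum()

print(df.shape)
print(separate_bytes, frame_bytes)
```

The more the column indexes differ, the more NaN padding the DataFrame has to carry, which is the overhead a series-per-column layout avoids.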
......@@ -27,7 +26,7 @@ Pandas-like indexing
--------------------
`[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` - should behave exactly like
their counter-parts from pd.DataFrame. They can take as indexer
their counter-parts from pandas.DataFrame. They can take as indexer
- lists, array-like objects and in general all iterables
- boolean lists and iterables
- slices
......@@ -41,16 +40,24 @@ can be omitted and will default to `slice(None)`. Examples:
- `di.iloc[[1,2,3], [0,3]]` : select positions 1,2,3 from the columns 0 and 3
- `di.loc[:, 'a':'c']` : select all rows from columns a to c
- `di.at[4,'c']` : select the element with label 4 in column c
- `di.loc[:]` -> `di.loc[:,:]` : select everything
- `di.loc[:]` -> `di.loc[:,:]` : select everything.
Scalar indexing always returns a pandas Series if the other indexer is a non-scalar. If both indexers
are scalars, the element itself is returned. In all other cases a dios is returned.
For more pandas-like indexing magic and the differences between the indexers,
see the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html).
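The return-type rules can be tried out with a plain pandas.DataFrame as a stand-in. This is a sketch, not dios code: the frame and its labels are made up, and where pandas returns a DataFrame, a dios would return a dios according to the text above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[10, 20, 30])

row = df.loc[20, ["a", "b"]]      # scalar row, list of columns -> Series
col = df.loc[[10, 20], "a"]       # list of rows, scalar column -> Series
item = df.loc[20, "a"]            # both indexers scalar -> the element itself
block = df.loc[[10, 20], ["a"]]   # no scalar indexer -> 2D container

print(type(row).__name__, type(col).__name__, item, type(block).__name__)
```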
>**Note:**
>
>In contrast to pandas.DataFrame, `.loc[:]` and `.loc[:, :]` always behave identically. The same applies to `iloc` and
>[`aloc`](#the-special-indexer-aloc). For example, given two pandas.DataFrames `df1` and `df2` with different columns,
>`df1.loc[:, :] = df2` aligns the columns, but `df1.loc[:] = df2` does **not**.
>
>Whether this is the desired behavior or a bug, I couldn't verify so far. -- Bert Palm
**2D-indexer**
`dios[boolean dios-like]` (as single key) - dios accept boolean 2D-indexer (boolean pd.Dataframe
`dios[boolean dios-like]` (as single key) - dios accepts a boolean 2D-indexer (boolean pandas.DataFrame
or boolean Dios).
Columns and rows from the indexer align with the dios.
......@@ -60,8 +67,10 @@ missing indices and present indices, but False values.
Values from unselected rows and columns are dropped, but empty columns are still preserved,
with the effect that the resulting Dios always has the same column dimension as the initial dios.
This is the exact similar behavior to pd.DataFrame's handling of 2D-indexer, despite that pd.DataFrame
fill np.nans at missing locations and therefore also fill-up, whole missing columns with nans.
>**Note:**
>This is exactly the same behavior as pandas.DataFrame's handling of 2D-indexers, except that pandas.DataFrame
>fills missing locations with numpy.nan and therefore also fills entirely missing columns with numpy.nan.
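For comparison, the pandas side of this note looks roughly as follows. This is a minimal sketch with made-up data that shows only the pandas.DataFrame behavior; a dios would instead drop the unselected values and keep column `b` as an empty column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5], "b": [0, 0]})

# Boolean 2D-indexer: True only where the value is greater than 2.
mask = df > 2
selected = df[mask]

# pandas keeps the full shape and writes NaN wherever the mask is False,
# so column "b" survives as a column made up entirely of NaN.
print(selected)
```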
**setting values**
......@@ -69,15 +78,14 @@ Setting values with `[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` works lik
With `.at`/`.iat` only single items can be set, for the other the
right hand side values can be:
- *scalars*: these are broadcasted to the selected positions
- *nested lists*: the length of the outer list must match the number of indexed columns,
the lengths of the inner lists must match the number of selected rows.
- *lists*: the length of the list must match the number of indexed columns. The items can be anything that
 can be applied to a series with the respective indexing method (`loc`, `iloc`, `[]`).
- *dios*: the number of columns must match the number of indexed columns - columns do *not* align,
they are just iterated.
Rows do align. Rows that are present on the right but not on the left are ignored.
Rows that are present on the left (bear in mind: these rows were explicitly chosen for write!), but not present
on the right, are filled with `NaN`s, like in pandas.
- *normal lists* : column keys must be a scalar(!), the list is passed down, and set with `loc`, `iloc` or `[]` by pandas Series.
- *pd.Series*: column indexer must be a scalar(!), the series is passed down, and set with `loc`, `iloc` or `[]`
- *pandas.Series*: column indexer must be a scalar(!), the series is passed down, and set with `loc`, `iloc` or `[]`
 by pandas Series, where it may be aligned, depending on the method.
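The row alignment described above is what a single pandas.Series does when another series is assigned through `loc`. The following sketch uses only pandas with made-up labels and values; it illustrates the per-column step the text refers to, not the dios implementation.

```python
import pandas as pd

# Left-hand side: one column with labels 1..4 (floats, so NaN fits the dtype).
left = pd.Series([1.0, 2.0, 3.0, 4.0], index=[1, 2, 3, 4])

# Right-hand side: labels 2..4; label 1 is missing, label 4 is not selected.
right = pd.Series([20.0, 30.0, 40.0], index=[2, 3, 4])

# Rows 1, 2 and 3 are explicitly chosen for writing.
left.loc[[1, 2, 3]] = right

# Label 1 was chosen but is absent on the right -> NaN.
# Label 4 is present on the right but was not chosen -> left untouched.
print(left)
```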
**Examples:**
......@@ -99,19 +107,29 @@ For more information and examples see the [aloc usage](/docs/aloc_usage.md) and
Properties
----------
See also the [Properties documentation](/docs/methods_and_properties.md#properties)
>**Note:**
>
> Properties that are also implemented in pandas.DataFrame mostly work analogously in dios.DictOfSeries.
- columns
- indexes (series of indexes of all series's)
- lengths (series of lengths of all series's)
- values (not fully pd-like - np.array of series's values)
- indexes
- lengths
- values
- dtypes
- itype (see section Itype)
- itype
- empty
- size
Methods and implied features
-------
Work mostly like analogous methods from pd.DataFrame.
See also the [Methods documentation](/docs/methods_and_properties.md#methods)
>**Note:**
>
> Methods that are also implemented in pandas.DataFrame mostly work analogously in dios.DictOfSeries.
- `copy()`
- `copy_empty()`
- `all()`
......@@ -131,7 +149,6 @@ Work mostly like analogous methods from pd.DataFrame.
- `is`
- `len(Dios)`
Operators and Comparators
---------
- arithmetical: `+ - * ** // / %` and `abs()`
......
......@@ -29,7 +29,7 @@ So maybe a first example gives an rough idea:
1 66 | 3 77 | 1 88 | 2 99 |
>> d.aloc[[1,2], ['a', 'b', 'd']]
>> d.aloc[[1,2], ['a', 'b', 'd', 'x']]
a | b | d |
===== | ===== | ===== |
1 66 | 2 77 | 1 99 |
......@@ -43,7 +43,9 @@ Unlike the other two indexer methods `loc` and `iloc`, it is not possible to get
the return type is either a pandas.Series, iff the column-indexer is a single key (eg. `'a'`) or a dios, iff not.
The row-indexer does not play any role in the return type choice.
*Note for the curios: This is because a scalar (`.aloc[key]`) is translates to `.loc[key:key]` under the hood.*
> **Note for the curious:**
>
> This is because a scalar (`.aloc[key]`) is translated to `.loc[key:key]` under the hood.
Indexer types
-------------
......@@ -194,10 +196,9 @@ A easy way to select all columns, is, to use null-**slice**es, like `.aloc[:,:]`
This is just what one would do with `loc` or `iloc`. Of course, slicing with boundaries also works,
e.g. `.loc[:, 'a':'f']`.
For more information about boolean or slice indexing see the pandas documentation
[Slicing ranges](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#slicing-ranges)
and
[Boolean indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing)
>**See also**
> - [pandas slicing ranges](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#slicing-ranges)
> - [pandas boolean indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing)
Selecting Rows a smart way
......@@ -299,7 +300,7 @@ As seen in the example above the series' values are ignored completely. The func
is similar to `s1.loc[s2.index]`, where `s1` and `s2` are pandas.Series, `s2` is the indexer and `s1` is each column
in turn.
If the indexer series holds boolean values they are not ignored.
If the indexer series holds boolean values, these are **not** ignored.
The series align the same way as explained above, but additionally only the `True` values are evaluated.
Thus `False`-values are treated like missing indices. The behavior here is analogous to `s1.loc[s2[s2].index]`.
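The two pandas expressions used as analogies above can be compared directly. This is a sketch with made-up series; every label of `s2` is chosen to exist in `s1`, because plain `.loc` raises a `KeyError` for missing labels.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4])    # one column
s2 = pd.Series([True, False, True], index=[1, 2, 4])    # boolean indexer series

# Values ignored: every label of s2 is selected.
by_index = s1.loc[s2.index]

# Boolean values respected: only labels where s2 is True are selected,
# so the False entry (label 2) is treated like a missing index.
by_true = s1.loc[s2[s2].index]

print(by_index.tolist(), by_true.tolist())
```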
......@@ -338,9 +339,12 @@ nicely with writing those as one-liner:
4 28 | 4 7 | 4 7 | no data |
```
Nevertheless, something like `d.aloc[d['a'] > d['b']]` do not work, because the comparison fails,
as long as the two series objects not have the same index. But maybe one want to checkout
[DictOfSeries.index_of()](/docs/methods_and_properties.md#diosdictofseriesindex_of).
>**Note:**
>
>Nevertheless, something like `d.aloc[d['a'] > d['b']]` does not work, because the comparison fails
>as long as the two series objects do not have the same index. But one may want to check out
>[DictOfSeries.index_of()](/docs/methods_and_properties.md#diosdictofseriesindex_of).
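The failing comparison can be reproduced with two plain pandas.Series. The sketch below uses made-up data; the workaround via the shared labels is an assumption for illustration only and says nothing about what `index_of()` does.

```python
import pandas as pd

a = pd.Series([1, 5, 9], index=[0, 1, 2])
b = pd.Series([4, 10], index=[1, 2])

# Comparing series with different indexes raises a ValueError in pandas.
try:
    a > b
    comparison_failed = False
except ValueError:
    comparison_failed = True

# One possible workaround: compare only on the labels both series share.
shared = a.index.intersection(b.index)
mask = a.loc[shared] > b.loc[shared]

print(comparison_failed, mask.tolist())
```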
Nested-lists as row indexer
......
......@@ -143,9 +143,9 @@ are defined on pandas.Series to multiple columns.
- Result of applying func along the given axis of the DataFrame.
**See also**
[pandas.DataFrame.apply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html)
>**See also:**
>
>[pandas.DataFrame.apply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html)
**Examples**
......