diff --git a/Readme.md b/Readme.md
index b4c1c0ef3c3d4e383abf0196d8c69c6b540705e6..0d1592dd6a701cb117400a4b76692cdda2e22960 100644
--- a/Readme.md
+++ b/Readme.md
@@ -20,6 +20,27 @@ Features
 * behaves quite like a pandas.DataFrame
 * additional align locator (`.aloc[]`)
 
+Install
+-------
+
+todo: PyPi
+
+```
+import dios
+
+# Have fun :)
+```
+
+Documentation
+-------------
+
+todo: link to ReadTheDocs
+
+Local docs about:
+* [Indexing](/docs/doc_indexing.md)
+* [Cookbook](/docs/doc_cookbook.md)
+* [Itype](/docs/doc_itype.md)
+
 TL;DR
 -----
 **get it**
@@ -77,157 +98,3 @@ Columns: ['x', 'y']
  2 spam | |
 ```
 
-Pandas-like indexing
---------------------
-
-`[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` - should behave exactly like
-their counter-parts from pandas.DataFrame. They can take as indexer
-- lists, array-like objects and in general all iterables
-- boolean lists and iterables
-- slices
-- scalars and any hashable object
-
-Most indexers are directly passed to the underling columns-series or row-series depending
-on the position of the indexer and the complexity of the operation. For `.loc`, `.iloc`, `.at`
-and `iat` the first position is the *row indexer*, the second the *column indexer*. The second
-can be omitted and will default to `slice(None)`. Examples:
-- `di.loc[[1,2,3], ['a']]` : select labels 1,2,3 from column a
-- `di.iloc[[1,2,3], [0,3]]` : select positions 1,2,3 from the columns 0 and 3
-- `di.loc[:, 'a':'c']` : select all rows from columns a to d
-- `di.at[4,'c']` : select the elements with label 4 in column c
-- `di.loc[:]` -> `di.loc[:,:]` : select everything.
-
-Scalar indexing always return a pandas Series if the other indexer is a non-scalar. If both indexer
-are scalars, the element itself is returned. In all other cases a dios is returned.
-For more pandas-like indexing magic and the differences between the indexers,
-see the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html).
-
->**Note:**
->
->In contrast to pandas.DataFrame, `.loc[:]` and `.loc[:, :]` always behaves identical. Same apply for `iloc` and
->[`aloc`](#the-special-indexer-aloc). For example, two pandas.DataFrames `df1` and `df2` with different columns,
->does align columns with `df1.loc[:, :] = df2` , but does **not** with `df1.loc[:] = df2`.
->
->If this is the desired behavior or a bug, i couldn't verify so far. -- Bert Palm
-
-**2D-indexer**
-
-`dios[boolean dios-like]` (as single key) - dios accept boolean 2D-indexer (boolean pandas.Dataframe
-or boolean Dios).
-
-Columns and rows from the indexer align with the dios.
-This means that only matching columns selected and in this columns rows are selected where
-i) indices are match and ii) the value is True in the indexer-bool-dios. There is no difference between
-missing indices and present indices, but False values.
-
-Values from unselected rows and columns are dropped, but empty columns are still preserved,
-with the effect that the resulting Dios always have the same column dimension than the initial dios.
-
->**Note:**
->This is the exact same behavior like pandas.DataFrame's handling of 2D-indexer, despite that pandas.DataFrame
->fill numpy.nan's at missing locations and therefore also fill-up, whole missing columns with numpy.nan's.
-
-**setting values**
-
-Setting values with `[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` works like in pandas.
-With `.at`/`.iat` only single items can be set, for the other the
-right hand side values can be:
- - *scalars*: these are broadcasted to the selected positions
- - *lists*: the length the list must match the number of indexed columns. The items can be everything that
-   can applied to a series, with the respective indexing method (`loc`, `iloc`, `[]`).
- - *dios*: the length of the columns must match the number of indexed columns - columns does *not* align,
-   they are just iterated.
-   Rows do align. Rows that are present on the right but not on the left are ignored.
-   Rows that are present on the left (bear in mind: these rows was explicitly chosen for write!), but not present
-   on the right, are filled with `NaN`s, like in pandas.
- - *pandas.Series*: column indexer must be a scalar(!), the series is passed down, and set with `loc`, `iloc` or `[]`
-   by pandas Series, where it maybe align, depending on the method.
-
-**Examples:**
-
-- `dios.loc[2:5, 'a'] = [1,2,3]` is the same as `a=dios['a']; a.loc[2:5]=[1,2,3]; dios['a']=a`
-- `dios.loc[2:5, :] = 99` : set 99 on rows 2 to 5 on all columns
-
-The special indexer `.aloc`
------------------------------
-
-Additional to the pandas like indexers we have a `.aloc[..]` (align locator) indexing method.
-Unlike `.iloc` and `.loc` indexers fully align if possible and 1D-array-likes can be broadcast
-to multiple columns at once. This method also handle missing indexer-items gracefully.
-It is used like `.loc`, so a single indexer (`.aloc[indexer]`) or a tuple of row-indexer and
-column-indexer (`.aloc[row-indexer, column-indexer]`) can be given. Also it can handle boolean and *non-bolean*
-2D-Indexer.
-
-For more information and examples see the [aloc usage](/docs/aloc_usage.md) and the [cookbook](docs/cookbook.md).
-
-Properties
-----------
-See also the [Properties documentation](/docs/methods_and_properties.md#properties)
-
->**Note:**
->
-> Properties that are also implemented in pandas.DataFrame, mostly work analogous in dios.DictOfSeries.
-
-- columns
-- indexes
-- lengths
-- values
-- dtypes
-- itype
-- empty
-- size
-
-Methods and implied features
--------
-See also the [Methods documentation](/docs/methods_and_properties.md#methods)
-
->**Note:**
->
-> Methods that are also implemented in pandas.DataFrame, mostly work analogous in dios.DictOfSeries.
-
-- `copy()`
-- `copy_empty()`
-- `all()`
-- `any()`
-- `squeeze()`
-- `to_df()`
-- `to_string()`
-- `apply()`
-- `astype()`
-- `isin()`
-- `isna()`
-- `notna()`
-- `dropna()`
-- `memory_usage()`
-- `index_of()`
-- `in`
-- `is`
-- `len(Dios)`
-
-Operators and Comparators
----------
-- arithmetical: `+ - * ** // / %` and `abs()`
-- boolean: `&^|~`
-- comparators: `== != > >= < <=`
-
-Itype
------
-DictOfSeries holds multiple series, and each series can have a different index length
-and index type. Differing index lengths are either solved by some aligning magic, or simply fail, if
-aligning makes no sense (eg. assigning the very same list to series of different lengths (see `.aloc`).
-
-A bigger challange is the type of the index. If one series has an alphabetical index, and another one
-a numeric index, selecting along columns can fail in every scenario. To keep track of the
-types of index or to prohibit the inserting of a *not fitting* index type,
-we introduce the `itype`. This can be set on creation of a Dios and also changed during usage.
-On change of the itype, all indexes of all series in the dios are casted to a new fitting type,
-if possible. Different cast-mechanisms are available.
-
-If an itype prohibits some certain types of indexes and a series with a non-fitting index-type is inserted,
-an implicit type cast is done (with or without a warning) or an error is raised. The warning/error policy
-can be adjusted via global options.
-
-Have fun :)
-
-
-
diff --git a/docs/doc_itype.md b/docs/doc_itype.md
index 070f205d6fa02d2d0b1c52a90b5914a527cb525d..ea1d49f733b173a4799d9406f61a9932280d58fa 100644
--- a/docs/doc_itype.md
+++ b/docs/doc_itype.md
@@ -1,2 +1,18 @@
 Itype
-=====
\ No newline at end of file
+=====
+
+DictOfSeries holds multiple series, and each series can have a different index length
+and index type. Differing index lengths are either solved by some aligning magic, or simply fail, if
+aligning makes no sense (eg.
assigning the very same list to series of different lengths (see `.aloc`)).
+
+A bigger challenge is the type of the index. If one series has an alphabetical index, and another one
+a numeric index, selecting along columns can fail in every scenario. To keep track of the
+index types, or to prohibit inserting a *non-fitting* index type,
+we introduce the `itype`. This can be set on creation of a Dios and also changed during usage.
+On change of the itype, all indexes of all series in the dios are cast to a new fitting type,
+if possible. Different cast mechanisms are available.
+
+If an itype prohibits certain types of indexes and a series with a non-fitting index type is inserted,
+an implicit type cast is done (with or without a warning) or an error is raised. The warning/error policy
+can be adjusted via global options.
+