Commit 13fefe03, authored 5 years ago by Bert Palm
documentation, added Notes :D
parent 40142135
Showing 3 changed files with 58 additions and 37 deletions:
- Readme.md: 41 additions, 24 deletions
- docs/aloc_usage.md: 14 additions, 10 deletions
- docs/methods_and_properties.md: 3 additions, 3 deletions
Readme.md (+41 -24)
-DictOfSeries (soon renamed to SoS?)
+DictOfSeries (may soon renamed)
 ===================================
-Is a pd.Series of pd.Series object which aims to behave as similar as possible to the pandas DataFrame.
+Is a pandas.Series of pandas.Series objects which aims to behave as similar as possible to pandas.DataFrame.
 Nomenclature
 ------------
-- pd: pandas
-- series/ser: instance of pd.Series
-- dios: instance of DictOfSeries
-- df: instance of pd.DataFrame
+- series/ser: instance of pandas.Series
+- dios: instance of dios.DictOfSeries
+- df: instance of pandas.DataFrame
 - dios-like: a *dios* or a *df*
 - alignable object: a *dios*, *df* or a *series*
...
...
@@ -17,8 +16,8 @@ Nomenclature
 Features
 --------
 * every *column* has its own index
-* uses much less memory than a misaligned pd.DataFrame
-* behaves quite like a pd.DataFrame
+* uses much less memory than a misaligned pandas.DataFrame
+* behaves quite like a pandas.DataFrame
 * additional align locator (`.aloc[]`)
...
...
@@ -27,7 +26,7 @@ Pandas-like indexing
 --------------------
 `[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` - should behave exactly like
-their counter-parts from pd.DataFrame. They can take as indexer
+their counter-parts from pandas.DataFrame. They can take as indexer
 - lists, array-like objects and in general all iterables
 - boolean lists and iterables
 - slices
...
...
@@ -41,16 +40,24 @@ can be omitted and will default to `slice(None)`. Examples:
 - `di.iloc[[1,2,3], [0,3]]`: select positions 1,2,3 from the columns 0 and 3
 - `di.loc[:, 'a':'c']`: select all rows from columns a to d
 - `di.at[4,'c']`: select the elements with label 4 in column c
-- `di.loc[:]` -> `di.loc[:,:]`: select everything
+- `di.loc[:]` -> `di.loc[:,:]`: select everything.
 Scalar indexing always return a pandas Series if the other indexer is a non-scalar. If both indexer
 are scalars, the element itself is returned. In all other cases a dios is returned.
 For more pandas-like indexing magic and the differences between the indexers,
 see the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html).
+>**Note:**
+>
+>In contrast to pandas.DataFrame, `.loc[:]` and `.loc[:, :]` always behaves identical. Same apply for `iloc` and
+>[`aloc`](#the-special-indexer-aloc). For example, two pandas.DataFrames `df1` and `df2` with different columns,
+>does align columns with `df1.loc[:, :] = df2` , but does **not** with `df1.loc[:] = df2`.
+>
+>If this is the desired behavior or a bug, i couldn't verify so far. -- Bert Palm
 **2D-indexer**
-`dios[boolean dios-like]` (as single key) - dios accept boolean 2D-indexer (boolean pd.Dataframe
+`dios[boolean dios-like]` (as single key) - dios accept boolean 2D-indexer (boolean pandas.Dataframe
 or boolean Dios).
 Columns and rows from the indexer align with the dios.
...
...
@@ -60,8 +67,10 @@ missing indices and present indices, but False values.
 Values from unselected rows and columns are dropped, but empty columns are still preserved,
 with the effect that the resulting Dios always have the same column dimension than the initial dios.
-This is the exact similar behavior to pd.DataFrame's handling of 2D-indexer, despite that pd.DataFrame
-fill np.nans at missing locations and therefore also fill-up, whole missing columns with nans.
+>**Note:**
+>This is the exact same behavior like pandas.DataFrame's handling of 2D-indexer, despite that pandas.DataFrame
+>fill numpy.nan's at missing locations and therefore also fill-up, whole missing columns with numpy.nan's.
 **setting values**
...
...
@@ -69,15 +78,14 @@ Setting values with `[]` and `.loc[]`, `.iloc[]` and `.at[]`, `.iat[]` works lik
 With `.at`/`.iat` only single items can be set, for the other the
 right hand side values can be:
 - *scalars*: these are broadcasted to the selected positions
-- *nested lists*: the length of the outer list must match the number of indexed columns, the lengths of the inner lists must match the number of selected rows.
+- *lists*: the length the list must match the number of indexed columns. The items can be everything that
+can applied to a series, with the respective indexing method (`loc`, `iloc`, `[]`).
 - *dios*: the length of the columns must match the number of indexed columns - columns does *not* align,
 they are just iterated.
 Rows do align. Rows that are present on the right but not on the left are ignored.
 Rows that are present on the left (bear in mind: these rows was explicitly chosen for write!), but not present
 on the right, are filled with `NaN`s, like in pandas.
-- *normal lists*: column keys must be a scalar(!), the list is passed down, and set with `loc`, `iloc` or `[]`
-by pandas Series.
-- *pd.Series*: column indexer must be a scalar(!), the series is passed down, and set with `loc`, `iloc` or `[]`
+- *pandas.Series*: column indexer must be a scalar(!), the series is passed down, and set with `loc`, `iloc` or `[]`
 by pandas Series, where it maybe align, depending on the method.
 **Examples:**
...
...
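The row-alignment rule described in the hunk above (rows present only on the right are ignored, rows present only on the left are filled with `NaN`, "like in pandas") can be illustrated with plain pandas. This is a minimal sketch that assumes only `pandas.Series.loc` assignment; it does not call dios itself, and the names `target` and `rhs` are made up for illustration.

```python
import math

import pandas as pd

# Hypothetical data: one float column with the row labels 1, 2 and 3.
target = pd.Series([1.0, 2.0, 3.0], index=[1, 2, 3])

# The right-hand side shares only label 1; label 7 does not exist on the left.
rhs = pd.Series([10.0, 99.0], index=[1, 7])

# Rows 1, 2 and 3 are explicitly chosen for writing; pandas aligns rhs on labels.
target.loc[[1, 2, 3]] = rhs

print(target)
print("row 2 is NaN:", math.isnan(target.loc[2]))
```

Label 7 is never written, because it was not selected on the left, while the selected rows 2 and 3 have no counterpart on the right and therefore end up as `NaN`.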
@@ -99,19 +107,29 @@ For more information and examples see the [aloc usage](/docs/aloc_usage.md) and
 Properties
 ----------
+See also the [Properties documentation](/docs/methods_and_properties.md#properties)
+>**Note:**
+>
+> Properties that are also implemented in pandas.DataFrame, mostly work analogous in dios.DictOfSeries.
 - columns
-- indexes (series of indexes of all series's)
-- lengths (series of lengths of all series's)
-- values (not fully pd-like - np.array of series's values)
+- indexes
+- lengths
+- values
 - dtypes
-- itype (see section Itype)
+- itype
 - empty
 - size
 Methods and implied features
 -------
-Work mostly like analogous methods from pd.DataFrame.
+See also the [Methods documentation](/docs/methods_and_properties.md#methods)
+>**Note:**
+>
+> Methods that are also implemented in pandas.DataFrame, mostly work analogous in dios.DictOfSeries.
 - `copy()`
 - `copy_empty()`
 - `all()`
...
...
@@ -131,7 +149,6 @@ Work mostly like analogous methods from pd.DataFrame.
 - `is`
 - `len(Dios)`
 Operators and Comparators
 ---------
 - arithmetical: `+ - * ** // / %` and `abs()`
...
...
docs/aloc_usage.md (+14 -10)
...
...
@@ -29,7 +29,7 @@ So maybe a first example gives an rough idea:
 1 66 | 3 77 | 1 88 | 2 99 |
->> d.aloc[[1,2], ['a', 'b', 'd']]
+>> d.aloc[[1,2], ['a', 'b', 'd', 'x']]
 a | b | d |
 ===== | ===== | ===== |
 1 66 | 2 77 | 1 99 |
...
...
@@ -43,7 +43,9 @@ Unlike the other two indexer methods `loc` and `iloc`, it is not possible to get
 the return type is either a pandas.Series, iff the column-indexer is a single key (eg. `'a'`) or a dios, iff not.
 The row-indexer does not play any role in the return type choice.
-*Note for the curios: This is because a scalar (`.aloc[key]`) is translates to `.loc[key:key]` under the hood.*
+> **Note for the curios:**
+>
+> This is because a scalar (`.aloc[key]`) is translates to `.loc[key:key]` under the hood.
 Indexer types
 -------------
...
...
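The note in the hunk above says that a scalar row key in `.aloc[key]` is translated to `.loc[key:key]`. The effect of that translation can be seen on a plain pandas Series. This is a sketch with made-up data that uses only `pandas.Series.loc`, not dios.

```python
import pandas as pd

# Hypothetical column with the row labels 1, 2 and 3.
ser = pd.Series([66, 77, 88], index=[1, 2, 3])

by_scalar = ser.loc[2]    # scalar key: the element itself
by_slice = ser.loc[2:2]   # label slice key:key: a Series holding that one row

print(type(by_scalar).__name__, by_scalar)
print(type(by_slice).__name__, list(by_slice.index))
```

Because the slice form always yields a Series, a scalar row key cannot reduce the result to a single element, which fits the statement that the row indexer plays no role in the return type.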
@@ -194,10 +196,9 @@ A easy way to select all columns, is, to use null-**slice**es, like `.aloc[:,:]`
 This is just like one would do, with `loc` or `iloc`. Of course slicing with boundaries also work,
 eg `.loc[:, 'a':'f']`.
-For more information about boolean or slice indexing see the pandas documentation
-[Slicing ranges](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#slicing-ranges) and
-[Boolean indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing)
+>**See also**
+> - [pandas slicing ranges](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#slicing-ranges)
+> - [pandas boolean indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing)
 Selecting Rows a smart way
...
...
@@ -299,7 +300,7 @@ As seen in the example above the series' values are ignored completely. The func
 is similar to `s1.loc[s2.index]`, with `s1` and `s2` are pandas.Series's, and s2 is the indexer and s1 is one column
 after the other.
-If the indexer series holds boolean values they are not ignored.
+If the indexer series holds boolean values, these are **not** ignored.
 The series align the same way as explained above, but additional only the `True` values are evaluated.
 Thus `False`-values are treated like missing indices. The behavior here is analogous to `s1.loc[s2[s2].index]`.
...
...
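The two pandas expressions that the hunk above uses as analogies, `s1.loc[s2.index]` and `s1.loc[s2[s2].index]`, can be compared directly. This is a small sketch with made-up data; it only illustrates the analogy in plain pandas and does not call `.aloc` itself.

```python
import pandas as pd

# s1 stands for one column, s2 for the indexer series (both hypothetical).
s1 = pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4])
s2 = pd.Series([True, False, True], index=[1, 2, 3])

# Values of the indexer ignored: every label of s2 selects a row.
by_index = s1.loc[s2.index]

# Boolean values respected: only labels where s2 is True select a row.
by_true = s1.loc[s2[s2].index]

print(list(by_index.index))
print(list(by_true.index))
```

Label 2 is dropped in the second case because its `False` value is treated like a missing index.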
@@ -338,9 +339,12 @@ nicely with writing those as one-liner:
 4 28 | 4 7 | 4 7 | no data |
 ```
-Nevertheless, something like `d.aloc[d['a'] > d['b']]` do not work, because the comparison fails,
-as long as the two series objects not have the same index. But maybe one want to checkout
-[DictOfSeries.index_of()](/docs/methods_and_properties.md#diosdictofseriesindex_of).
+>**Note:**
+>
+>Nevertheless, something like `d.aloc[d['a'] > d['b']]` do not work, because the comparison fails,
+>as long as the two series objects not have the same index. But maybe one want to checkout
+>[DictOfSeries.index_of()](/docs/methods_and_properties.md#diosdictofseriesindex_of).
 Nested-lists as row indexer
...
...
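The failing comparison mentioned in the note above can be reproduced with two plain pandas Series that have different indexes. The sketch below uses made-up data; the workaround via `Index.intersection` is an assumption for illustration only and is not a description of how `DictOfSeries.index_of()` works.

```python
import pandas as pd

# Two hypothetical columns whose indexes differ.
a = pd.Series([1, 5, 3], index=[1, 2, 3])
b = pd.Series([4, 2], index=[2, 4])

# The operator form refuses to compare Series that are not identically labeled.
try:
    a > b
    comparison_failed = False
except ValueError:
    comparison_failed = True

# One plain-pandas way around it: compare only on the shared labels.
common = a.index.intersection(b.index)
mask = a.loc[common] > b.loc[common]

print("comparison failed:", comparison_failed)
print(mask)
```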
docs/methods_and_properties.md (+3 -3)
...
...
@@ -143,9 +143,9 @@ are defined on pandas.Series to multiple columns.
 - Result of applying func along the given axis of the DataFrame.
-**See also**
-[pandas.DataFrame.apply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html)
+>**See also:**
+>
+>[pandas.DataFrame.apply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html)
 **Examples**
...
...