DictOfSeries
class dios.DictOfSeries(data=None, columns=None, index=None, itype=None, cast_policy='save', fastpath=False)
Bases: dios.base._DiosBase
A data frame where every column has its own index.
DictOfSeries is a collection of pd.Series objects that aims to behave as much like pd.DataFrame as possible. Its advantage over pd.DataFrame is that every column has its own row index, whereas a DataFrame provides a single row index for all columns. This solves problems with unaligned data and with data whose columns vary widely in length.
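The difference between per-column indexes and one shared row index can be illustrated without dios or pandas. The following is only a sketch: plain dicts stand in for pd.Series, and the names columns and align are hypothetical, not part of the dios API.

```python
import math

# Hypothetical stand-in data: each column is a {index_label: value} mapping,
# mimicking a pandas.Series that carries its own index.
columns = {
    "a": {0: 1.0, 1: 2.0, 2: 3.0},
    "b": {10: 5.0, 20: 6.0},
}


def align(cols):
    """Force all columns onto one shared row index, padding gaps with NaN."""
    shared = sorted(set().union(*(c.keys() for c in cols.values())))
    return {
        name: {label: col.get(label, math.nan) for label in shared}
        for name, col in cols.items()
    }


aligned = align(columns)

# Per-column storage keeps only real entries; alignment introduces padding.
stored_cells = sum(len(c) for c in columns.values())
aligned_cells = sum(len(c) for c in aligned.values())
padding = sum(math.isnan(v) for c in aligned.values() for v in c.values())
```

With these two columns, half of the cells in the aligned layout are padding, which is the overhead that a per-column index avoids.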
Indexing with di[], di.loc[] and di.iloc[] should work analogously to the corresponding pd.DataFrame methods. The indexer can be a single label, a slice, a list-like, a boolean list-like, or a boolean DictOfSeries/pd.DataFrame, and can be used to selectively get or set data.
- Parameters
data (array-like, Iterable, dict, or scalar value) – Contains data stored in Series.
columns (array-like) – Column labels to use for resulting frame. Will default to RangeIndex(0, 1, 2, …, n) if no column labels are provided.
index (Index or array-like) – Index to use to reindex every given series during init. Ignored if omitted.
itype (Itype, pd.Index, Itype-string-repr or type) – Every inserted series must have an index of this type or of one of its subtypes. If None, the itype is inferred as soon as the first non-empty series is inserted.
cast_policy ({'save', 'force', 'never'}, default 'save') – Policy used for (down-)casting the index of a series if its type does not match the itype.
Attributes Summary
indexes  Return pd.Series with the indexes.
Methods Summary
all([axis])
any([axis])
apply(func[, axis, raw, args])  Apply a function along an axis of the DictOfSeries.
astype(dtype[, copy, errors])
clear()
copy([deep])
copy_empty([columns])
dropna([inplace])
equals(other)
for_each(attr_or_callable, **kwds)  Apply a callable or a pandas.Series method or property on each column.
get(key[, default])
hasnans([axis, drop_empty])  Returns a boolean Series along an axis, which indicates if it contains NA-entries.
index_of([method])  Return a single index with indices from all columns.
isdata()  Alias for DictOfSeries.notna(drop_empty=True).
isempty()  Returns a boolean Series, which indicates if a column is empty.
isin(values)
isna([drop_empty])  Return a boolean DictOfSeries which indicates NA positions.
isnull([drop_empty])  Alias, see DictOfSeries.isna.
items()
iterrows([fill_value, squeeze])  Iterate over DictOfSeries rows as (index, pandas.Series/DictOfSeries) pairs.
keys()
max([axis, skipna])
memory_usage([index, deep])
min([axis, skipna])
notempty()  Returns a boolean Series, which indicates if a column is not empty.
notna([drop_empty])  Return a boolean DictOfSeries which indicates non-NA positions.
notnull([drop_empty])  Alias, see DictOfSeries.notna.
pop(*args)
popitem()
reduce_columns(func[, initial, skipna])  Reduce all columns to a single pandas.Series by a given function.
setdefault(key[, default])
squeeze([axis])
to_csv(*args, **kwargs)  Write object to a comma-separated values (csv) file.
to_df()
to_string([max_rows, min_rows, max_cols, …])  Pretty print a dios.
update(other)
Attributes Documentation
aloc
at
cast_policy
columns
debugDf
dtypes
empty
iat
iloc
indexes
Return pd.Series with the indexes.
itype
lengths
loc
size
values
Methods Documentation
all(axis=0)
any(axis=0)
apply(func, axis=0, raw=False, args=(), **kwds)
Apply a function along an axis of the DictOfSeries.
- Parameters
func (callable) – Function to apply on each column.
axis ({0 or 'index', 1 or 'columns'}, default 0) – Axis along which the function is applied:
    0 or 'index': apply function to each column.
    1 or 'columns': NOT IMPLEMENTED
raw (bool, default False) – Determines if row or column is passed as a Series or ndarray object:
    False: passes each row or column as a Series to the function.
    True: the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
args (tuple) – Positional arguments to pass to func in addition to the array/series.
**kwds – Additional keyword arguments to pass to func.
- Returns
Result of applying func along the given axis of the DataFrame.
- Return type
Series or DataFrame
- Raises
NotImplementedError – if axis is 'columns' or 1
See also
DictOfSeries.for_each()
    Apply pd.Series methods or properties to each column.
astype(dtype, copy=True, errors='raise')
clear()
copy(deep=True)
copy_empty(columns=True)
dropempty()
dropna(inplace=False)
equals(other)
for_each(attr_or_callable, **kwds)
Apply a callable or a pandas.Series method or property on each column.
- Parameters
attr_or_callable (Any) – A pandas.Series attribute or any callable to apply to each column. A series attribute can be any property, field or method, and can also be specified as a string. If a callable is given, it must take a pandas.Series as its only positional argument.
**kwds (any) – Keyword arguments passed on to the callable.
- Returns
A series with the results, indexed by the column labels.
- Return type
pandas.Series
See also
DictOfSeries.apply()
Apply functions to columns and convert result to DictOfSeries.
Examples
>>> d = DictOfSeries([range(3), range(4)], columns=['a', 'b'])
>>> d
   a |    b |
==== | ==== |
0  0 | 0  0 |
1  1 | 1  1 |
2  2 | 2  2 |
     | 3  3 |

Use with a callable..

>>> d.for_each(max)
columns
a    2
b    3
dtype: object

..or with a string that names a pd.Series attribute, which is therefore the same as passing the attribute itself.

>>> d.for_each('max')
columns
a    2
b    3
dtype: object
>>> d.for_each(pd.Series.max)
columns
a    2
b    3
dtype: object

Both forms also work with properties:

>>> d.for_each('dtype')
columns
a    int64
b    int64
dtype: object
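The way attr_or_callable is resolved, as described above, can be sketched in plain Python. This is an illustration of the documented behaviour under stated assumptions, not the dios implementation: Column is a hypothetical stand-in for pandas.Series, and for_each_sketch returns a plain dict instead of a pandas.Series.

```python
class Column(list):
    """Hypothetical stand-in for pandas.Series with one method and one property."""

    def max(self):
        return max(self)  # the builtin max, applied to the stored values

    @property
    def size(self):
        return len(self)


def for_each_sketch(columns, attr_or_callable, **kwds):
    """Apply a callable, or an attribute given by name, to every column."""
    results = {}
    for name, col in columns.items():
        if isinstance(attr_or_callable, str):
            attr = getattr(col, attr_or_callable)
            # Methods are called with the keyword arguments, properties are returned as-is.
            results[name] = attr(**kwds) if callable(attr) else attr
        else:
            results[name] = attr_or_callable(col, **kwds)
    return results


d = {"a": Column(range(3)), "b": Column(range(4))}
by_callable = for_each_sketch(d, max)
by_name = for_each_sketch(d, "max")
by_unbound = for_each_sketch(d, Column.max)
by_property = for_each_sketch(d, "size")
```

The first three calls give the same result, mirroring the equivalence of for_each(max), for_each('max') and for_each(pd.Series.max) in the examples above.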
get(key, default=None)
hasnans(axis=0, drop_empty=False)
Returns a boolean Series along an axis, which indicates if it contains NA-entries.
index_of(method='all')
Return a single index with indices from all columns.
- Parameters
method (string) –
    'all' : get all indices from all columns
    'union' : alias for 'all'
    'shared' : get indices that are present in every column
    'intersection' : alias for 'shared'
    'uniques' : get indices that are present in only a single column
    'non-uniques' : get indices that are present in more than one column
- Returns
A single duplicate-free index, built from the indices of all columns as selected by method.
- Return type
pd.Index
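The six method values reduce to four set operations on the per-column indices. The sketch below spells them out with the standard library only; index_of_sketch is a hypothetical name, each column is represented just by its list of index labels, and the sorted result order is an assumption of the sketch rather than documented behaviour.

```python
from collections import Counter


def index_of_sketch(columns, method="all"):
    """Combine the index labels of all columns according to `method`."""
    # Count in how many columns each label occurs (duplicates within a column count once).
    counts = Counter(label for labels in columns.values() for label in set(labels))
    n_columns = len(columns)
    if method in ("all", "union"):
        keep = list(counts)
    elif method in ("shared", "intersection"):
        keep = [label for label, c in counts.items() if c == n_columns]
    elif method == "uniques":
        keep = [label for label, c in counts.items() if c == 1]
    elif method == "non-uniques":
        keep = [label for label, c in counts.items() if c > 1]
    else:
        raise ValueError(f"unknown method: {method!r}")
    return sorted(keep)


# Hypothetical example: the index labels of three columns.
cols = {"a": [0, 1, 2], "b": [1, 2, 3], "c": [2, 5]}
all_labels = index_of_sketch(cols, "all")
shared = index_of_sketch(cols, "shared")
uniques = index_of_sketch(cols, "uniques")
non_uniques = index_of_sketch(cols, "non-uniques")
```

Note that 'uniques' and 'non-uniques' together partition the result of 'all'.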
isdata()
Alias for DictOfSeries.notna(drop_empty=True).
isempty()
Returns a boolean Series, which indicates if a column is empty.
isin(values)
isna(drop_empty=False)
Return a boolean DictOfSeries which indicates NA positions.
isnull(drop_empty=False)
Alias, see DictOfSeries.isna.
items()
iteritems()
iterrows(fill_value=nan, squeeze=True)
Iterate over DictOfSeries rows as (index, pandas.Series/DictOfSeries) pairs. MAY BE VERY PERFORMANCE AND/OR MEMORY EXPENSIVE.
- Parameters
fill_value (scalar, default numpy.nan) – Fill value for a row entry if the column does not have an entry at the current index location. This ensures that the returned row always contains all columns. If None is given, no value is filled.
If fill_value=None and squeeze=True, the resulting row (a pandas.Series) may differ in length between iterator calls, because an entry that is not present in a column will also not be present in the resulting row.
squeeze (bool, default True) –
    True: A pandas.Series is returned for each row.
    False: A single-rowed DictOfSeries is returned for each row.
- Yields
index (label) – The index of the row.
data (Series or DictOfSeries) – The data of the row as a Series if squeeze is True, as a DictOfSeries otherwise.
See also
DataFrame.iteritems()
Iterate over (column name, Series) pairs.
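The effect of fill_value on the yielded rows can be sketched with plain dicts in place of pandas.Series. The function iterrows_sketch is hypothetical and only mirrors the documented semantics for the squeeze=True case; iterating the labels in sorted order is an assumption of the sketch.

```python
import math


def iterrows_sketch(columns, fill_value=math.nan):
    """Yield (label, row) pairs over the union of all column indices."""
    labels = sorted(set().union(*(col.keys() for col in columns.values())))
    for label in labels:
        if fill_value is None:
            # No filling: columns without an entry at `label` are left out of the row.
            row = {name: col[label] for name, col in columns.items() if label in col}
        else:
            # Filling: every row contains every column.
            row = {name: col.get(label, fill_value) for name, col in columns.items()}
        yield label, row


# Hypothetical example data: column "b" has no entry at label 0.
cols = {"a": {0: 1, 1: 2}, "b": {1: 3}}
rows_filled = list(iterrows_sketch(cols, fill_value=0))
rows_sparse = list(iterrows_sketch(cols, fill_value=None))
```

With fill_value=None the rows have different lengths, which is the behaviour the parameter description warns about.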
keys()
max(axis=None, skipna=None)
memory_usage(index=True, deep=False)
min(axis=0, skipna=True)
notempty()
Returns a boolean Series, which indicates if a column is not empty.
notna(drop_empty=False)
Return a boolean DictOfSeries which indicates non-NA positions.
notnull(drop_empty=False)
Alias, see DictOfSeries.notna.
pop(*args)
popitem()
reduce_columns(func, initial=None, skipna=False)
Reduce all columns to a single pandas.Series by a given function.
Apply a function of two pandas.Series as arguments, cumulatively to all columns, from left to right, so as to reduce the columns to a single pandas.Series. If initial is present, it is placed before the columns in the calculation, and serves as a default when the columns are empty.
- Parameters
func (function) – The function must take two identically indexed pandas.Series and should return a single pandas.Series with the same index.
initial (column-label or pd.Series, default None) – The series to start with. If None a dummy series is created, with the indices of all columns and the first seen values.
skipna (bool, default False) – If True, skip NaN values.
- Returns
A series with the result of the reduction, carrying the index of the start series defined by initial.
- Return type
pandas.Series
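One possible reading of this left-to-right reduction is sketched below with plain dicts in place of pandas.Series. It is an assumption-laden illustration, not the dios implementation: reduce_columns_sketch is a hypothetical name, func is applied per label to two scalars instead of to two identically indexed series, and skipna is not modelled.

```python
def reduce_columns_sketch(columns, func, initial=None):
    """Fold all columns into one {label: value} mapping, left to right."""
    cols = list(columns.values())
    if initial is None:
        # Dummy start series: every label of every column with its first seen value.
        result = {}
        for col in cols:
            for label, value in col.items():
                result.setdefault(label, value)
    else:
        result = dict(initial)
    for col in cols:
        # Only labels present in both the running result and the column are combined.
        for label in result.keys() & col.keys():
            result[label] = func(result[label], col[label])
    return result


# Hypothetical example data.
cols = {"a": {0: 1, 1: 5}, "b": {1: 7, 2: 2}}
largest = reduce_columns_sketch(cols, max)
total_from_zero = reduce_columns_sketch(cols, lambda x, y: x + y, initial={0: 0, 1: 0, 2: 0})
```

As in the description above, the result keeps the index of the start series, so labels missing from initial would not appear in the output.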
setdefault(key, default=None)
squeeze(axis=None)
to_csv(*args, **kwargs)
Write object to a comma-separated values (csv) file.
Changed in version 0.24.0: The order of arguments for Series was changed.
- Parameters
path_or_buf (str or file handle, default None) – File path or object, if None is provided the result is returned as a string. If a file object is passed it should be opened with newline='', disabling universal newlines.
    Changed in version 0.24.0: Was previously named "path" for Series.
sep (str, default ',') – String of length 1. Field delimiter for the output file.
na_rep (str, default '') – Missing data representation.
float_format (str, default None) – Format string for floating point numbers.
columns (sequence, optional) – Columns to write.
header (bool or list of str, default True) – Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
    Changed in version 0.24.0: Previously defaulted to False for Series.
index (bool, default True) – Write row names (index).
index_label (str or sequence, or False, default None) – Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
mode (str) – Python write mode, default ‘w’.
encoding (str, optional) – A string representing the encoding to use in the output file, defaults to ‘utf-8’.
compression (str or dict, default 'infer') – If str, represents compression mode. If dict, value at 'method' is the compression mode. Compression mode may be any of the following possible values: {'infer', 'gzip', 'bz2', 'zip', 'xz', None}. If compression mode is 'infer' and path_or_buf is path-like, then detect compression mode from the following extensions: '.gz', '.bz2', '.zip' or '.xz' (otherwise no compression). If dict given and mode is 'zip' or inferred as 'zip', other entries are passed as additional compression options.
    Changed in version 1.0.0: May now be a dict with key 'method' as compression mode and other entries as additional compression options if compression mode is 'zip'.
quoting (optional constant from csv module) – Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
quotechar (str, default '"') – String of length 1. Character used to quote fields.
line_terminator (str, optional) – The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called ('\n' for Linux, '\r\n' for Windows).
    Changed in version 0.24.0.
chunksize (int or None) – Rows to write at a time.
date_format (str, default None) – Format string for datetime objects.
doublequote (bool, default True) – Control quoting of quotechar inside a field.
escapechar (str, default None) – String of length 1. Character used to escape sep and quotechar when appropriate.
decimal (str, default '.') – Character recognized as decimal separator. E.g. use ‘,’ for European data.
- Returns
If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.
- Return type
None or str
See also
read_csv()
Load a CSV file into a DataFrame.
to_excel()
Write DataFrame to an Excel file.
Examples
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
...                    'mask': ['red', 'purple'],
...                    'weapon': ['sai', 'bo staff']})
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'

Create 'out.zip' containing 'out.csv'

>>> compression_opts = dict(method='zip',
...                         archive_name='out.csv')
>>> df.to_csv('out.zip', index=False,
...           compression=compression_opts)
to_df()
to_string(max_rows=None, min_rows=None, max_cols=None, na_rep='NaN', show_dimensions=False, method='indexed', no_value=' ', empty_series_rep='no data', col_delim=' | ', header_delim='=', col_space=None)
Pretty print a dios.
- if method == indexed (default):
    every column is represented by its own index and the corresponding values
- if method == aligned [2]:
    one(!) global index is generated and the values of a column appear at the corresponding index location.
- Parameters
max_cols – no more than max_cols columns are printed [1]
max_rows – see min_rows [1]
min_rows – no more than min_rows rows are printed if the number of rows of any series exceeds max_rows [1]
na_rep – all NaN values are replaced by na_rep. Default 'NaN'
empty_series_rep – Ignored unless method='indexed'. Empty series are represented by the string in empty_series_rep.
col_delim (str) – Ignored unless method='indexed'. col_delim is inserted between all columns.
header_delim (str or None) – Ignored unless method='indexed'. If not None, header_delim is inserted between the column names (header) and the data. The string is repeated up to the width of the column.
no_value – Ignored unless method='aligned'. Value that indicates that no entry is present in the underlying series. Bear in mind that this should differ from na_rep, otherwise missing values cannot be distinguished from NaN values.
Notes
[1]: defaults to the corresponding value in dios_options
[2]: the common parameters are passed directly to pd.DataFrame.to_string(..) under the hood if method is aligned
update(other)