fix data/flags accessor
I understand that we want saqc's internal dios structure not to be exposed too much, and that we pass the default getitem (implicit in SaQC.data[slice(...)]) on to a DataFrame, because that is more transparent, simpler and more stable than using the dios.
Furthermore, the user is encouraged to almost always use the .data accessor rather than the .data_raw accessor. (I guess the idea is to reserve data_raw for "heavier" users and "debuggers", so it has the deterring (and, in my opinion, totally misleading) suffix "_raw" applied.)
However, there is a problem with the implementation of this concept: getting just some of the processed data out of a saqc object, rather than all of it, is a common use case and not a heavy-user one.
By accessing .data['x'], one gets a slice of the concatenation of all the data in the instance. Such a slice is nearly useless, since it contains intractable NaN entries and strongly contradicts the intuition of accessing the data. (It also gives away all the dios benefits saqc has.)
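A minimal sketch of the effect, using plain pandas only. The two series and their timestamps are made up for illustration, and pd.concat stands in for whatever saqc does internally when it builds the DataFrame, which may differ in detail:

```python
import pandas as pd

# Two variables with different, only partly overlapping timestamps (made-up data).
x = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(
        ["2021-01-01 00:00", "2021-01-01 00:10", "2021-01-01 00:20"]
    ),
    name="x",
)
y = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.to_datetime(
        [
            "2021-01-01 00:03",
            "2021-01-01 00:07",
            "2021-01-01 00:13",
            "2021-01-01 00:20",
        ]
    ),
    name="y",
)

# Concatenate everything first (outer join on the union of all timestamps) ...
all_data = pd.concat([x, y], axis=1)

# ... and only then slice out 'x': the result carries the timestamps of 'y' as well.
x_from_concat = all_data["x"]

print(len(x), len(x_from_concat))        # original length vs. length after slicing
print(int(x_from_concat.isna().sum()))   # NaN rows that were never part of 'x'
```

The NaN rows in the slice cannot be told apart from NaN values that 'x' might genuinely contain, which is what makes them intractable.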
So, in effect, one would just use .data_raw['x'], or .data_raw['x'].to_df(), to avoid possible inconsistencies, which means one must already understand what a dios is and what .data_raw means anyway. When writing cookbooks, I have a hard time explaining why I suddenly use .data_raw when I used .data above; the reason is that .data becomes useless in really basic cases already.
For example, take harmonizing one of several variables in a saqc object. In a tutorial, I would then say: now let's have a look at the harmonized variable and how regular its timestamps are. But of course, .data['harmed_var'] doesn't give me a regularly sampled time series; in fact, it returns a messy time series with potentially a majority of totally irregular entries. So, in this really basic situation, one would already have to switch to data_raw and, by that, explain why a harmonized time series is accessed through something that is declared raw.
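The harmonization case can be reproduced with plain pandas as well. This is only a sketch: the data is made up, other_var and step_sizes are hypothetical names, and pd.concat again stands in for the internal concatenation:

```python
import pandas as pd

# A harmonized variable on a regular 10-minute grid (made-up data).
harmed_var = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.date_range("2021-01-01", periods=4, freq="10min"),
    name="harmed_var",
)

# Another variable in the same object that is still irregularly sampled.
other_var = pd.Series(
    [7.0, 8.0, 9.0],
    index=pd.to_datetime(
        ["2021-01-01 00:03", "2021-01-01 00:14", "2021-01-01 00:26"]
    ),
    name="other_var",
)


def step_sizes(index):
    """Return the set of distinct gaps between consecutive timestamps."""
    return set(index.to_series().sort_values().diff().dropna())


# Slice taken from the concatenation of all variables.
sliced = pd.concat([harmed_var, other_var], axis=1)["harmed_var"]

print(step_sizes(harmed_var.index))  # a single step size: regular sampling
print(step_sizes(sliced.index))      # several step sizes: no longer regular
```

Although harmed_var itself is perfectly regular, the slice inherits the irregular timestamps of other_var.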
So, in effect, the conceptual distinction between data and data_raw is broken from the usage perspective, and one would almost always use .data_raw to avoid errors, confusion and inconsistencies.
I can imagine 2 solutions to repair the concept:
1. Either make .data['x','y'] return a concatenation of 'x' and 'y' only (and not the 'x', 'y' slice from the concatenation of all the data), or
2. make the .data_raw accessor more prominent by giving it a more descriptive name that encourages its usage. Of course, this also means one has to talk about the dios-saqc interplay more in the documentation, and the whole dios thing is less hidden.
I prefer 1.
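A rough sketch of what the first option could look like. ColumnwiseAccessor is a hypothetical name, and a plain dict of pandas Series stands in for the dios here, so this is not how saqc is actually implemented:

```python
from typing import Mapping

import pandas as pd


class ColumnwiseAccessor:
    """Hypothetical accessor: concatenates only the requested columns."""

    def __init__(self, columns: Mapping[str, pd.Series]):
        # Stand-in for the dios: one independent Series per variable.
        self._columns = columns

    def __getitem__(self, key):
        # acc['x'] passes a str, acc['x', 'y'] passes a tuple of names.
        names = [key] if isinstance(key, str) else list(key)
        # Select first, concatenate afterwards: the union index is built
        # from the requested variables only.
        return pd.concat([self._columns[n] for n in names], axis=1, keys=names)


def ts(*minutes):
    """Build timestamps at the given minute offsets (illustration only)."""
    return pd.to_datetime("2021-01-01") + pd.to_timedelta(list(minutes), unit="min")


acc = ColumnwiseAccessor(
    {
        "x": pd.Series([1.0, 2.0, 3.0], index=ts(0, 10, 20)),
        "y": pd.Series([4.0, 5.0, 6.0], index=ts(3, 10, 17)),
        "z": pd.Series([7.0, 8.0, 9.0], index=ts(1, 2, 5)),
    }
)

print(acc["x"])       # only the timestamps of 'x', no NaN padding
print(acc["x", "y"])  # union of 'x' and 'y'; 'z' has no influence
```

With this behaviour, a single-variable access returns exactly the timestamps of that variable, and NaN entries appear only where the explicitly requested variables do not align.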