mask() vs select()
I just stumbled upon the saqc function mask(). It works the opposite of what the word "mask" suggests: it selects data instead of masking it. Compare pd.Series.mask(), which replaces the values where the condition is True with NaN, and pd.Series.where(), which does the opposite and keeps exactly those values.
So I suggest renaming mask() to select() and writing a mask function that does the actual masking.
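To make the proposal concrete, here is a minimal sketch on a plain pandas Series. The names select and mask and their signatures are only an assumption for illustration; this is not the current saqc API, and the real functions would likely take different arguments.

```python
import pandas as pd


def select(s: pd.Series, cond: pd.Series) -> pd.Series:
    """Hypothetical helper: keep values where cond is True, NaN elsewhere."""
    return s.where(cond)


def mask(s: pd.Series, cond: pd.Series) -> pd.Series:
    """Hypothetical helper: replace values where cond is True with NaN."""
    return s.mask(cond)


s = pd.Series([1, 2, 3])
cond = s == 2

selected = select(s, cond)  # only the value at index 1 survives
masked = mask(s, cond)      # only the value at index 1 is blanked out
```

With this naming, the function name says what happens to the data where the condition holds, which matches the pandas meaning of mask().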
I guess the confusion comes from the following workflow:
mask = df > 42
df[mask] = 'new_value'
But this is just a common workflow for __getitem__ and __setitem__, because it is easier to write (and to understand) something like df[df>42] = 0 than df[~(df>42)] = 0. In fact, [] inverts the mask before applying it; in other words, it does a where() and not a mask(). It gets clearer with an example:
>>> import pandas as pd
>>> s = pd.Series([1,2,3])
>>> cond = s == 2
>>> cond
0    False
1     True
2    False
dtype: bool
>>> s.mask(cond) # as expected
0    1.0
1    NaN
2    3.0
dtype: float64
>>> s.where(cond)
0    NaN
1    2.0
2    NaN
dtype: float64
>>> s[cond]
1    2
dtype: int64
>>> s[cond] = 99
>>> s
0     1
1    99
2     3
dtype: int64
But if we call cond a mask, it gets confusing, because s[mask] actually does a selection, or an inverted masking.
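The relationship between the three spellings can also be checked directly. A small sketch using only standard pandas calls; the variable names are just for illustration:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
cond = s == 2

# Boolean indexing keeps only the rows where cond is True ...
selected = s[cond]

# ... which matches where() once the NaN rows are dropped
# (the dtype differs, because where() has to upcast to float to hold NaN).
via_where = s.where(cond).dropna()
pd.testing.assert_series_equal(selected, via_where, check_dtype=False)

# mask() is where() with the condition inverted.
assert s.mask(cond).equals(s.where(~cond))
```

So reading with s[cond] lines up with where(), and getting the mask() result out of where() takes an explicit inversion with ~.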