FAQ
How can I get everything up and running on Windows?
- Get and install Anaconda.
- Get and install git.
- Clone the relevant repositories. To do that, open up a Git Bash window and run:
  - Clone the data_progs repository with:
    git clone https://git.ufz.de/chs/data_progs data_progs
  - Clone the CHS python library with:
    git clone https://git.ufz.de/schaefed/python python_chs_lib
- Clone the
- Open up an Anaconda Prompt and create a new environment with:
  conda create --name data_progs --file data_progs\conda-requirements.txt
- Activate your new environment (remember to run this command whenever you open a new Anaconda Prompt):
  conda activate data_progs
- Install the CHS python library into your new environment:
  cd python_chs_lib
  python setup.py install
  cd ..
How can I access the data?
The data_progs need to have access to the main data directory on the UFZ network share (commonly referred to as Y). On official UFZ Windows PCs this share is usually already available under Y:\Gruppen\chs-data and there is nothing more to do. Permission will be granted by https://www.ufz.de/index.php?en=38091.
For other operating systems, or in case the above is not true, you need to mount the network share to a location of your choice (refer to your usual OS documentation resources on how to do that). The respective URIs can be found here.
If you include the Gruppen/chs-data share under a different location (definitely the case on Linux and macOS), you need to point the data_progs to the directory. In order to do that, adjust the variable ROOT in config/uri.py.
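As a rough illustration, the adjusted entry could look like the following on Linux. The mount point /mnt/ufz is a made-up example, and the real config/uri.py may define ROOT differently (for instance as a plain string), so treat this as a sketch only.

```python
from pathlib import Path

# Hypothetical mount point of the network share; replace it with your own.
MOUNT_POINT = Path("/mnt/ufz")

# ROOT points at the chs-data directory inside the mounted share.
ROOT = MOUNT_POINT / "Gruppen" / "chs-data"

# Warn early if the share does not seem to be mounted.
if not ROOT.is_dir():
    print(f"Warning: {ROOT} not found, is the network share mounted?")
```

The check at the end is optional; it only makes a forgotten mount visible before any program tries to read data.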
How do I run any of the programs in data_progs?
All runnable programs in data_progs start with the prefix do_ (if you plan to add new programs, please stick to this convention) and are implemented as command line applications. In order to run a program in data_progs you need to:
- Open up an Anaconda Prompt.
- Activate your data_progs environment.
- Change into the base directory of the data_progs.
- Use the following command structure:
  python -m folder.program_name_without_suffix
A few things are important here:
- You need to run this command from the base directory. If you set up your environment using this FAQ, this is a directory called data_progs.
- You need to pass the -m flag to the Python interpreter.
- You have to run a program by its fully qualified name, including the directory and the program name, separated by . (a single dot) and without the .py suffix.
Example: To run the program transfer_level0_level1/do_level1_flagging.py you need to type the following into your Anaconda Prompt:
python -m transfer_level0_level1.do_level1_flagging
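If you want to derive the module name from a file path instead of typing it by hand, a small helper along these lines might do. The function module_name is not part of data_progs; it is a hypothetical convenience sketch that only applies the naming rule described above.

```python
from pathlib import PurePosixPath


def module_name(program_path: str) -> str:
    """Turn a program path relative to the base directory into a dotted module name."""
    # Accept Windows-style separators as well.
    path = PurePosixPath(program_path.replace("\\", "/"))
    # Drop the .py suffix and join the remaining parts with single dots.
    return ".".join(path.with_suffix("").parts)


command = "python -m " + module_name("transfer_level0_level1/do_level1_flagging.py")
print(command)
```

The printed command still has to be run from the data_progs base directory with the data_progs environment activated.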
The above commands are annoying, is there no shortcut?
There is. If you drop your program into the data_progs base directory, you can run it with the usual python program.py command. For testing and development this is perfectly fine, but I won't accept merge requests that litter the repository for reasons of programmer convenience.
What's up with these stations and devices?
Station and Device are the two main abstractions provided by data_progs. The idea here is to hide the details of station-logger setups like naming schemes, directory structure, access to metadata, etc. behind a single interface providing accessors to all relevant information.
As the names suggest, the object Station is intended to provide access to station-related data (e.g. 'Hohes Holz', 'Grosses Bruch'), and a Device is an abstraction of a data logger (e.g. 'BC1', 'W2') or Soilnet box (e.g. 'Box04').
The basic configuration of Station and Device is done through the files config/stations.csv and config/devices.csv.
How do I read the Data?
Access to raw (level 0) and quality checked (level 1) data is provided through the devices:
from lib.faccess import getDevice
device = getDevice(station="HH", logger="BC1")
level0 = device.getL0Data()
level1 = device.getL1Data()
If we need to process more than one device, the following iterator might be handy:
from lib.faccess import getDevices
devices = getDevices(station="HH")
for device in devices:
    data = device.getL1Data()
Can I get a certain time range of the data, please?
Sure. There are multiple ways to accomplish this:
- The recommended way is to pass a start_date and/or an end_date to getDevice/getDevices. That way the data_progs make sure to only give you the relevant temporal clips of associated data as well (e.g. meta-/configuration data):
  from lib.faccess import getDevice
  device = getDevice(station="HH", logger="BC1", start_date="2016-01-01", end_date="2017-12-31")
  data = device.getL1Data()
- Pass start_date and/or an end_date to the data accessor methods of Device:
  from lib.faccess import getDevice
  device = getDevice(station="HH", logger="BC1")
  data = device.getL1Data(start_date="2016-01-01", end_date="2017-12-31")
- Use pandas to do that:
  from lib.faccess import getDevice
  device = getDevice(station="HH", logger="BC1")
  data = device.getL1Data()
  data = data.loc["2016-01-01":"2017-12-31"]
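To see what the pandas variant does without access to the network share, here is a minimal sketch on synthetic data. The column name value and the daily frequency are made up for illustration; real level 1 tables will look different.

```python
import pandas as pd

# Synthetic stand-in for a level 1 table: one value per day, 2015 through 2018.
index = pd.date_range("2015-01-01", "2018-12-31", freq="D")
data = pd.DataFrame({"value": range(len(index))}, index=index)

# Label-based slicing on a DatetimeIndex includes both end points.
clip = data.loc["2016-01-01":"2017-12-31"]

print(clip.index.min(), clip.index.max(), len(clip))
```

Note that this only clips the data table itself. Associated metadata is not clipped, which is why passing the dates to getDevice/getDevices is the recommended way.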
Data and flags are intermingled, can I separate them?
from lib.daccess import splitTable
data, flags = splitTable(data)
Now, data and flags are separated, can I join them again?
Sure, you can, but there is currently no implementation provided (even though there should be). I guess you can achieve that with something along the lines of:
import pandas as pd
from lib.daccess import splitTable, reindexTable
data, flags = splitTable(data)
merged = pd.concat([reindexTable(data), reindexTable(flags)], axis=1).sort_index(axis="columns")
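The pandas part of that idea can be tried out on synthetic tables. In the sketch below the column names, the _flag suffix and the flag values are all invented for illustration, and reindexTable is left out because its behaviour is not described here.

```python
import pandas as pd

index = pd.date_range("2016-01-01", periods=3, freq="D")

# Made-up data and flag tables sharing the same time index.
data = pd.DataFrame({"temp": [1.0, 2.0, 3.0], "wind": [4.0, 5.0, 6.0]}, index=index)
flags = pd.DataFrame({"temp_flag": [0, 0, 1], "wind_flag": [0, 1, 0]}, index=index)

# Join side by side, then sort the columns so each flag column
# ends up next to the data column it belongs to.
merged = pd.concat([data, flags], axis=1).sort_index(axis="columns")

print(list(merged.columns))
```

Sorting only pairs the columns up if data and flag columns share a common name prefix, as they do in this invented example.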
Can I access the metadata Excel file aka 'CHS-measurements'?
config = device.readExcel()
I need the manual flags, how can I get them?
manflags = device.getManualFlags()
This repository is large and I am lost, can I get some guidance to the folder structure?
- transfer_level0_level1: Contains programs to lift data from level0 (raw) to level1 (quality checked). Most of the programs are reasonably well maintained as they run daily.
- level1: Contains programs that need quality checked level1 data as input. There are many legacy programs from a bunch of different authors that haven't been used in years and will most likely not work.
- lib: All the tooling is located here. Notable modules are:
  - devices: implementations of Device and its subclasses
  - station: implementation of Station
  - faccess: here you find the factories for Stations and Devices
  - daccess: functions to help you with data handling and a bunch of IO stuff that really should be moved into a separate module
  - flagging: helper functions to make use and sense of the generated quality flags