FAQ

How can I get everything up and running on Windows?

  1. Get and install Anaconda.
  2. Get and install git.
  3. Clone the relevant repositories. To do that, open up a Git Bash window and run:
    1. Clone the data_progs repository with:
      git clone https://git.ufz.de/chs/data_progs data_progs
    2. Clone the CHS python library with:
      git clone https://git.ufz.de/schaefed/python python_chs_lib
  4. Open up an Anaconda Prompt and create a new environment with:
    conda create --name data_progs --file data_progs\conda-requirements.txt
  5. Activate your new environment (remember to run this command whenever you open a new Anaconda Prompt):
    conda activate data_progs
  6. Install the CHS python library into your new environment
    cd python_chs_lib
    python setup.py install
    cd ..

How can I access the data?

The data_progs need access to the main data directory on the UFZ network share, commonly referred to as Y:. On an official UFZ Windows PC this share is usually already available under Y:\Gruppen\chs-data and there is nothing more to do. Permission will be granted by https://www.ufz.de/index.php?en=38091.

For other operating systems, or in case the above is not true, you need to mount the network share to a location of your choice (refer to your usual OS documentation resources on how to do that); the respective URIs can be found here.

If you mount the Gruppen/chs-data share under a different location (definitely the case on Linux and MacOS), you need to point the data_progs to that directory. In order to do that, adjust the variable ROOT in config/uri.py.
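
As a rough sketch (the actual contents of config/uri.py may look different), pointing ROOT to a custom mount location could look like this:

# config/uri.py -- sketch only, the real file may define additional settings
# ROOT must point to the mounted Gruppen/chs-data share.

# Default on an official UFZ Windows PC:
ROOT = r"Y:\Gruppen\chs-data"

# On Linux or MacOS, adjust to wherever you mounted the share, e.g.:
# ROOT = "/home/<username>/mounts/chs-data"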

How do I run any of the programs in data_progs?

All runnable programs in data_progs start with the prefix do_ (if you plan to add new programs, please stick to this convention) and are implemented as command line applications. In order to run a program in data_progs you need to:

  1. Open up an Anaconda Prompt
  2. Activate your data_progs environment.
  3. Change into the base directory of data_progs.
  4. Use the following command structure: python -m folder.program_name_without_suffix

A few things are important here:

  • You need to run this command from the base directory; if you set up your environment using this FAQ, this would be a directory called data_progs
  • You need to pass the -m flag to the Python interpreter.
  • You have to run a program by its fully qualified name, i.e. the directory and the program name separated by . (a single dot) and without the .py suffix

Example: To run the program transfer_level0_level1/do_level1_flagging.py you need to type the following into your Anaconda Prompt:

python -m transfer_level0_level1.do_level1_flagging

The above commands are annoying, is there no shortcut?

There is. If you drop your program into the data_progs base directory, you can run it with the usual python program.py command. For testing and development this is perfectly fine, but I won't accept merge requests that litter the repository for reasons of programmer convenience.

What's up with these stations and devices?

Station and Device are the two main abstractions provided by data_progs. The idea here is to hide the details of station-logger setups (naming schemes, directory structure, access to metadata, etc.) behind a single interface providing accessors to all relevant information.

As the names suggest, the object Station is intended to provide access to data related to a station (e.g. 'Hohes Holz', 'Grosses Bruch'), and a Device is an abstraction over a Data Logger (e.g. 'BC1', 'W2') or Soilnet Box (e.g. 'Box04').

The basic configuration of Station and Device is done through the files config/stations.csv and config/devices.csv
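
If you want a quick look at what is configured, these files can be opened directly, for example with pandas (a minimal sketch; the exact column layout and separator of these CSV files are not documented here and may differ):

import pandas as pd

# Inspect the station and device configuration files
# (run from the data_progs base directory).
stations = pd.read_csv("config/stations.csv")
devices = pd.read_csv("config/devices.csv")
print(stations.head())
print(devices.head())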

How do I read the data?

Access to raw (level 0) and quality checked (level 1) data is provided through the devices:

from lib.faccess import getDevice

device = getDevice(station="HH", logger="BC1")
level0 = device.getL0Data()
level1 = device.getL1Data()

If we need to process more than one device, the following iterator might be handy:

from lib.faccess import getDevices

devices = getDevices(station="HH")
for device in devices:
    data = device.getL1Data()

Can I get a certain time range of the data, please?

Sure. There are multiple ways to accomplish this:

  1. The recommended way is to pass a start_date and/or an end_date to getDevice/getDevices. That way the data_progs make sure to only give you the relevant temporal clips of associated data as well (e.g. meta-/configuration data):
    from lib.faccess import getDevice
    
    device = getDevice(station="HH", logger="BC1", start_date="2016-01-01", end_date="2017-12-31")
    data = device.getL1Data()
  2. Pass start_date and/or an end_date to the data accessor methods of Device:
    from lib.faccess import getDevice
    
    device = getDevice(station="HH", logger="BC1")
    data = device.getL1Data(start_date="2016-01-01", end_date="2017-12-31")
  3. Use pandas to do that:
    from lib.faccess import getDevice
    
    device = getDevice(station="HH", logger="BC1")
    data = device.getL1Data()
    data.loc["2016-01-01":"2017-12-31"]

Data and flags are intermingled, can I separate them?

from lib.daccess import splitTable
data, flags = splitTable(data)
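
For a self-contained, minimal sketch (using only calls shown elsewhere in this FAQ):

from lib.faccess import getDevice
from lib.daccess import splitTable

# fetch quality checked data for one device and separate values from flags
device = getDevice(station="HH", logger="BC1")
table = device.getL1Data()
data, flags = splitTable(table)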

Now, data and flags are separated, can I join them again?

Sure, you can, but there is currently no implementation provided (even though there should be). I guess you can achieve that with something along the lines of:

import pandas as pd
from lib.daccess import splitTable, reindexTable

data, flags = splitTable(data)
merged = pd.concat([reindexTable(data), reindexTable(flags)], axis=1).sort_index(axis="columns")

Can I access the metadata Excel file aka 'CHS-measurements'?

config = device.readExcel()

I need the manual flags, how can I get them?

manflags = device.getManualFlags()

This repository is large and I am lost, can I get some guidance on the folder structure?

  • transfer_level0_level1: Contains programs to lift data from level0 (raw) to level1 (quality checked). Most of the programs are reasonably well maintained as they run daily.
  • level1: Contains programs that need quality checked level1 data as input. There are many legacy programs from a bunch of different authors that haven't been used in years and will most likely not work.
  • lib: All the tooling is located here. Notable modules are:
    • devices: implementations of Device and its subclasses
    • station: implementation of Station
    • faccess: here you find the factories for Stations and Devices
    • daccess: functions to help you with data handling and a bunch of IO stuff that really should be moved into a separate module
    • flagging: helper functions to make use and sense of the generated quality flags