The skOfflineDatabase Library

Introduction

The skOfflineDatabase package provides access to calibration tables in the offline database. Its primary interface returns all the rows from a specified table that best satisfy the current "context" (i.e. the date and time of the current event). The interface minimises database I/O by caching query results and making them available to its clients as required.

This implementation of the skOfflineDatabase is derived from the ND280 oaOfflineDatabase package, which was in turn derived from the MINOS database package; the code remains fundamentally that borrowed from MINOS. It interfaces to ROOT's MySQL package.

The remainder of this document is narrowly focused on the most urgent need: getting calibration constants into the database and subsequently accessing them.

Setting up the Environment

Before we can use the skOfflineDatabase package we have to set the following three environment variables:-

SK_TSQL_URL  This specifies the MySQL server, the port and the database.
              It takes the form: protocol://host[:port]/[database]
                 where:
                    protocol - mysql
                    host     - host name or IP address of database server
                     port     - port number (optional; 3306 is the default)
                    database - name of database
               For now use SK_TSQL_URL=mysql://neut12.triumf.ca/testskcalib

SK_TSQL_USER  The account user name;

- for now, to read from the database use SK_TSQL_USER=skcaldb_reader

- for now, to write to the database use SK_TSQL_USER=skcaldb_writer

SK_TSQL_PSWD  The account password;

- for now, to read from the database use SK_TSQL_PSWD=rdskcaldb

- for now, to write to the database use SK_TSQL_PSWD=sk222wrtr

The interface can support access to multiple databases by specifying them as a semi-colon separated list (which, since semi-colons are shell separators, means the value must be enclosed in double quotes). For example:-

export SK_TSQL_URL="mysql://mylaptop/test_db;mysql://local_server/standard_db;mysql://remote_server/archive_db"

It is called a "cascade" because the interface first attempts to satisfy the query with the first entry (mylaptop/test_db), and if that fails then tries local_server/standard_db and if that too fails tries remote_server/archive_db. This could be useful if you want to try out some new table entries on the test database before updating the main database.

If different account names or passwords are required for the different databases in the cascade, then SK_TSQL_USER and/or SK_TSQL_PSWD can be a semi-colon separated list in the same order as SK_TSQL_URL.
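
To make the cascade behaviour concrete, the following is a minimal sketch, in ROOT/C++, of the kind of fallback logic involved. It is illustrative only; the real connection handling lives inside the DBI (which also falls back on a per-query basis), and for brevity a single user/password is assumed rather than the per-database lists described above. TSQLServer is ROOT's standard SQL client class.

  #include <iostream>
  #include <sstream>
  #include <string>

  #include "TSQLServer.h"

  // Illustrative cascade fallback: try each URL in the semi-colon
  // separated SK_TSQL_URL list in turn and use the first server that
  // accepts a connection.
  TSQLServer* ConnectToCascade(const std::string& urls,
                               const std::string& user,
                               const std::string& pswd) {
    std::istringstream is(urls);
    std::string url;
    while (std::getline(is, url, ';')) {
      TSQLServer* server =
          TSQLServer::Connect(url.c_str(), user.c_str(), pswd.c_str());
      if (server) return server;  // first working entry wins
      std::cout << "Cannot reach " << url << "; trying next entry" << std::endl;
    }
    return 0;  // nothing in the cascade responded
  }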

Users who simply want to read from the database can just execute the script setup_reader.[c]sh; this will set up read access to the test database.

The Use Case: calibration constants

For the purpose of this document we will invent a table of dummy calibration constants. The table will be called DEMO_DB_TABLE.

Each row in this table will contain:-

  1. The electronics channel ID

  2. Three integer parameters

  3. Three floating point parameters

We further require that, having obtained the rows of this table that match the current event date and time, we can look up a row by channel ID.

An example row of such a table could look like this:-

+-------------+---------+---------+---------+---------+---------+---------+
| E_CHAN_ID   | I_PARM1 | I_PARM2 | I_PARM3 | F_PARM1 | F_PARM2 | F_PARM3 |
+-------------+---------+---------+---------+---------+---------+---------+
| 1           |     101 |     201 |     301 |    1.01 |    2.01 |    3.01 |
+-------------+---------+---------+---------+---------+---------+---------+

The Double Table Model

So far we have defined what a row looks like, but how do we assign it a validity, i.e. for what date range does it apply? We could add extra columns to this table to hold that information, but typically the same validity applies to many rows of a table, so this becomes very wasteful in both table space and query time. Instead a two-table approach is adopted: the main table gains a single extra column, SEQNO, which is an integer key into a second table (whose name is formed by appending VLD to the table name) where the validity is recorded.

For example:-

Table DEMO_DB_TABLEVLD
+---------+---------------------+---------------------+---------------------+
| SEQNO   | TIMESTART           | TIMEEND             | CREATIONDATE        |
+---------+---------------------+---------------------+---------------------+
|       1 | 2009-01-01 00:00:00 | 2009-02-01 00:00:00 | 2009-04-07 18:00:00 |
|       2 | 2009-01-15 00:00:00 | 2009-02-15 00:00:00 | 2009-04-08 18:00:00 |
+---------+---------------------+---------------------+---------------------+
Table DEMO_DB_TABLE
+---------+-------------+---------+---------+---------+---------+---------+---------+
| SEQNO   | E_CHAN_ID   | I_PARM1 | I_PARM2 | I_PARM3 | F_PARM1 | F_PARM2 | F_PARM3 |
+---------+-------------+---------+---------+---------+---------+---------+---------+
|       1 | 0           |     101 |     201 |     301 |    1.01 |    2.01 |    3.01 |
|       1 | 1           |     102 |     202 |     302 |    1.02 |    2.02 |    3.02 |
|       1 | 2           |     103 |     203 |     303 |    1.03 |    2.03 |    3.03 |
|       1 | 3           |     104 |     204 |     304 |    1.04 |    2.04 |    3.04 |
|       2 | 0           |     105 |     205 |     305 |    1.05 |    2.05 |    3.05 |
|       2 | 1           |     106 |     206 |     306 |    1.06 |    2.06 |    3.06 |
|       2 | 2           |     107 |     207 |     307 |    1.07 |    2.07 |    3.07 |
+---------+-------------+---------+---------+---------+---------+---------+---------+

First notice that DEMO_DB_TABLEVLD has two rows, the first valid for all of January 2009 and the second valid from the 15th of January to the 15th of February. The first row has 4 corresponding entries (i.e. with the same SEQNO = 1) in the DEMO_DB_TABLE table whilst the second has only 3 (perhaps the TFB chip broke?).
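
In essence, satisfying a query for an event time is a two-step process: find the best SEQNO(s) in the VLD table, then fetch the payload rows carrying those SEQNOs. The sketch below shows roughly equivalent SQL, issued through ROOT's TSQLServer purely for illustration; the statements the DBI actually generates are more elaborate, and error handling and cleanup are omitted.

  #include "TSQLResult.h"
  #include "TSQLRow.h"
  #include "TSQLServer.h"
  #include "TString.h"

  // Roughly the two-step query the interface performs, with the event
  // time hard-wired to 2009-01-20 00:00:00 for illustration.
  void SketchQuery(TSQLServer* server) {
    // Step 1: find validity rows that bracket the event time; ordering by
    // CREATIONDATE picks the winner among overlaps (see the next section).
    TSQLResult* vld = server->Query(
        "SELECT SEQNO FROM DEMO_DB_TABLEVLD"
        " WHERE TIMESTART <= '2009-01-20 00:00:00'"
        " AND   TIMEEND   >  '2009-01-20 00:00:00'"
        " ORDER BY CREATIONDATE DESC");
    if (!vld) return;
    TSQLRow* best = vld->Next();
    if (!best) return;  // no validity found: the result set is empty
    TString seqno = best->GetField(0);

    // Step 2: fetch every payload row carrying the winning SEQNO.
    TSQLResult* rows =
        server->Query("SELECT * FROM DEMO_DB_TABLE WHERE SEQNO = " + seqno);
    // ... unpack rows into table-row objects ...
  }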

Ambiguity Resolution

The next thing to notice is that although no DEMO_DB_TABLEVLD row satisfies dates before January 1st 2009 or after February 15th, for dates between January 15th and February 1st both entries do; we have an overlap. Overlaps are an essential part of calibration: you need to have good constants ready to go long before the previous ones have degraded. However, it means there is an ambiguity that has to be resolved, and for that we introduce the CREATIONDATE column: ambiguities are resolved in favour of the row with the later creation date. Consequently in our example the validity of the first row is effectively trimmed to end on January 15th. There is an additional mechanism, based on a value called the EPOCH number, which can supersede CREATIONDATE; see below.
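
The resolution rules are small enough to capture in a sketch. The struct and comparisons below are simplified assumptions for illustration; the real code works with VldTimeStamp objects inside the MINOS-derived DBI.

  #include <string>

  // Simplified view of a DEMO_DB_TABLEVLD row; ISO "YYYY-MM-DD hh:mm:ss"
  // strings compare correctly with plain lexicographic comparison.
  struct Vld {
    std::string timeStart, timeEnd, creationDate;
    int epoch;
  };

  // True if the record covers event time t: TIMESTART <= t < TIMEEND.
  bool Covers(const Vld& v, const std::string& t) {
    return v.timeStart <= t && t < v.timeEnd;
  }

  // True if a beats b when both cover the event time: a higher EPOCH
  // always wins; for equal EPOCHs the later CREATIONDATE wins.
  bool Beats(const Vld& a, const Vld& b) {
    if (a.epoch != b.epoch) return a.epoch > b.epoch;
    return a.creationDate > b.creationDate;
  }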

Running the demo

Accompanying this documentation is a demo program in

  app/demo.cxx

which is built and run as part of the regular make.

To run the demo do

  1. Make the libraries and executables

  2. Configure the database "cascade" (see Setting up the Environment)

    i.e. using the script

    source setup_writer.csh
    

    Note that you must have the ROOTSYS environment variable set beforehand.

  3. Prime the database

    app/database_updater.py apply_local_update demo/demo_db_table.update
    

  4. Run the demo

    linux/demo.exe 
    

The demo queries the database at 5-day intervals from 2008-12-30 to 2009-02-28 and, if running correctly, should produce output including:-

% Applying query at 2008-12-30 00:00:00 ... failed to find any results.
% Applying query at 2009-01-04 00:00:00 ... result set contains 4 rows as follows:-
% Applying query at 2009-01-09 00:00:00 ... result set contains 4 rows as follows:-
% Applying query at 2009-01-14 00:00:00 ... result set contains 4 rows as follows:-
% Applying query at 2009-01-19 00:00:00 ... result set contains 3 rows as follows:-
% Applying query at 2009-01-24 00:00:00 ... result set contains 3 rows as follows:-
% Applying query at 2009-01-29 00:00:00 ... result set contains 3 rows as follows:-
% Applying query at 2009-02-03 00:00:00 ... result set contains 3 rows as follows:-
% Applying query at 2009-02-08 00:00:00 ... result set contains 3 rows as follows:-
% Applying query at 2009-02-13 00:00:00 ... result set contains 3 rows as follows:-
% Applying query at 2009-02-18 00:00:00 ... failed to find any results.
% Applying query at 2009-02-23 00:00:00 ... failed to find any results.
% Applying query at 2009-02-28 00:00:00 ... failed to find any results.

Notice how after January 14th the query switches to the 3-row result.

The Table Definitions

It's now time to look at how the DEMO_DB_TABLE and DEMO_DB_TABLEVLD tables are defined.

DEMO_DB_TABLE

The DEMO_DB_TABLE is defined as follows:-

  CREATE TABLE DEMO_DB_TABLE(
         SEQNO INTEGER not null,
         ROW_COUNTER INTEGER not null,
         E_CHAN_ID  INT,
         I_PARM1    INT,
         I_PARM2    INT,
         I_PARM3    INT,
         F_PARM1    FLOAT,
         F_PARM2    FLOAT,
         F_PARM3    FLOAT,
         primary key(SEQNO,ROW_COUNTER));

INT is the standard 32-bit integer type and FLOAT the standard 32-bit floating point type.

Notice:-

         ROW_COUNTER INTEGER not null,
...
         primary key(SEQNO,ROW_COUNTER));

It's good practice to ensure that all database rows are unique; for example, it prevents you from adding the same data twice. ROW_COUNTER simply increments for each row with the same SEQNO, and the 'primary key' clause constrains the combination of SEQNO and ROW_COUNTER to be unique. In full our table is:-

+---------+-------------+-------------+---------+---------+---------+---------+---------+---------+
| SEQNO   | ROW_COUNTER | E_CHAN_ID   | I_PARM1 | I_PARM2 | I_PARM3 | F_PARM1 | F_PARM2 | F_PARM3 |
+---------+-------------+-------------+---------+---------+---------+---------+---------+---------+
|       1 |           1 | 0           |     101 |     201 |     301 |    1.01 |    2.01 |    3.01 |
|       1 |           2 | 1           |     102 |     202 |     302 |    1.02 |    2.02 |    3.02 |
|       1 |           3 | 2           |     103 |     203 |     303 |    1.03 |    2.03 |    3.03 |
|       1 |           4 | 3           |     104 |     204 |     304 |    1.04 |    2.04 |    3.04 |
|       2 |           1 | 0           |     105 |     205 |     305 |    1.05 |    2.05 |    3.05 |
|       2 |           2 | 1           |     106 |     206 |     306 |    1.06 |    2.06 |    3.06 |
|       2 |           3 | 2           |     107 |     207 |     307 |    1.07 |    2.07 |    3.07 |
+---------+-------------+-------------+---------+---------+---------+---------+---------+---------+

DEMO_DB_TABLEVLD

Turning to DEMO_DB_TABLEVLD, there is rather more to see:-

  create table DEMO_DB_TABLEVLD (
         SEQNO integer not null primary key,
         TIMESTART datetime not null,
         TIMEEND datetime not null,
         EPOCH tinyint(4),
         REALITY tinyint(4),
         DETECTORMASK tinyint(4),
         SIMMASK tinyint(4),
         TASK integer,
         AGGREGATENO integer,
         CREATIONDATE datetime not null,
         INSERTDATE datetime not null,
         key TIMESTART (TIMESTART), 
         key TIMEEND (TIMEEND));

Taking each of these in turn:-

SEQNO

SEQNOs have to be unique. For suggested values see Selecting the SEQNO below.

TIMESTART

The inclusive start time for the validity.

TIMEEND

The exclusive end time for the validity. The row is valid for time t where

  TIMESTART <= t < TIMEEND

EPOCH

The EPOCH number can run from 0 to 100 and provides the highest level of ambiguity resolution: data with a higher EPOCH number is always preferred over data with a lower one when resolving ambiguities. In general, incrementing the EPOCH number should indicate that some significant improvement has occurred in the way the data was generated.

REALITY

The REALITY number allows us to store data for altered realities, which may be useful in systematic studies. For example, if we want to see the effect of recalibrating the detector under the assumption that the MPPCs have aged somehow, those calibration constants can be assigned a different REALITY. This feature is currently unused and all data is given a REALITY of 0.

DETECTORMASK

In MINOS we have near and far detectors and part of the context is the detector type. DETECTORMASK is a mask specifying which detectors the data applies to.

This field is probably unneeded for SK.

SIMMASK

SIMMASK is a mask specifying which types of data (i.e. real or MC) the entry applies to.

TASK

TASK provides a way to further select the type of data retrieved. For example, a detector-configuration table could have two tasks, one for raw calibration and another for refined calibration. The aim is that TASK allows a particular database table to be sub-divided according to the mode of use. The default value is zero.

AGGREGATENO

The calibration data for the entire detector is stored in a single table, although in practice there may be some granularity in it: for example, constants for different parts of the detector might be produced at different times. The solution is to write the data in smaller chunks (called aggregates), each with its own validity; a query then involves finding multiple SEQNOs, one for each aggregate. In our simple example the data isn't aggregated and the aggregate number is set to 0; presumably this will always be the case for SK.

CREATIONDATE

We have already seen that CREATIONDATE serves both to record when the data was created and to resolve ambiguities.

For T2K the new EPOCH column was introduced for higher-level ambiguity resolution, but the creation date is still used when EPOCHs are identical.

INSERTDATE

This is the date when the data was entered into the local database which is independent of CREATIONDATE. It's useful for rolling back the local database to an earlier state.

key TIMESTART (TIMESTART), key TIMEEND (TIMEEND))

In MINOS we found a performance improvement by having keys on TIMESTART and TIMEEND, as these are fundamental to the queries made by the interface.

In full our table is:-

+---------+---------------------+---------------------+-------+---------+--------------+---------+------+-------------+---------------------+---------------------+
| SEQNO   | TIMESTART           | TIMEEND             | EPOCH | REALITY | DETECTORMASK | SIMMASK | TASK | AGGREGATENO | CREATIONDATE        | INSERTDATE          |
+---------+---------------------+---------------------+-------+---------+--------------+---------+------+-------------+---------------------+---------------------+
|       1 | 2009-01-01 00:00:00 | 2009-02-01 00:00:00 |     0 |       0 |            1 |       1 |    0 |           0 | 2009-04-07 18:00:00 | 2009-04-07 18:00:00 |
|       2 | 2009-01-15 00:00:00 | 2009-02-15 00:00:00 |     0 |       0 |            1 |       1 |    0 |           0 | 2009-04-08 18:00:00 | 2009-04-07 18:00:00 |
+---------+---------------------+---------------------+-------+---------+--------------+---------+------+-------------+---------------------+---------------------+

Using the Database API to Access the Data

The API for getting a set of DEMO_DB_TABLE rows for a given "context" (in our case an event date and time) is conceptually very simple: you pass the context in and the interface returns a set of rows. Each row is an object of a class that inherits from the base class ND::TTableRow; in our case it is ND::TDemo_DB_Table. As you can see, it has a set of getters providing access to all its data members. It could in principle be more sophisticated and provide calibration services as well as, or instead of, these getters; it just depends how much of the table layout you want to expose.

Your first job is to get a context, which essentially supplies a timestamp. Then, to get all ND::TDemo_DB_Table rows associated with that context, you use the statements:-

    TSK_DBI_Context context;
    context.SetTimeStamp(current_unix_time);
    TResultSetHandle<TDemo_DB_Table> rs(context);

which constructs a TResultSetHandle that has access to a vector of TDemo_DB_Table objects. The process starts by determining the table name: the class name is converted to upper case and the prefix "ND::T" removed, which in this case yields DEMO_DB_TABLE. The interface then looks for the best SEQNO (or collection of SEQNOs in the case of aggregated data) in DEMO_DB_TABLEVLD, reads in all rows of DEMO_DB_TABLE that have those SEQNOs, and for each creates and fills an ND::TTableRow.
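
The class-name-to-table-name conversion is simple enough to sketch; this is an illustrative reimplementation, not the DBI's actual code:

  #include <algorithm>
  #include <cctype>
  #include <string>

  // Derive the table name from the row-class name: strip the "ND::T"
  // prefix and upper-case the remainder, so that
  // "ND::TDemo_DB_Table" becomes "DEMO_DB_TABLE".
  std::string TableNameFromClassName(std::string name) {
    const std::string prefix = "ND::T";
    if (name.compare(0, prefix.size(), prefix) == 0)
      name.erase(0, prefix.size());
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return std::toupper(c); });
    return name;
  }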

Being a constructor, it cannot "fail", but it can return an empty set. You can use the methods:-

  Int_t GetNumRows() const;
  const ND::TDemo_DB_Table* GetRow(Int_t irow) const;

to loop over all rows, for example in the demo code you will see:-

  Int_t numRows(rs.GetNumRows());
 ...
  ND280Log("Applying query at " << UnixTimeToDateTime(current_unix_time) << " ... result set contains "
                                << numRows << " rows as follows:-");
  for (Int_t irow = 0; irow<numRows; ++irow) rs.GetRow(irow)->Print();

However, looping over all rows isn't normally what you want to do. Instead you want direct access to the row needed to calibrate a particular channel. TResultSetHandle supports the concept of a "Natural Index", which can be sparse (i.e. it doesn't have to run from 0 to some positive integer n); in our case that index is obviously the channel ID. The method is:-

  const ND::TDemo_DB_Table* GetRowByIndex(UInt_t index) const;

and again there is a (contrived) example in demo.cxx:-

  UInt_t required_id      = 1;

...

  const ND::TDemo_DB_Table* required_row = rs.GetRowByIndex(required_id);
  if ( required_row ) {
    std::cout << "  required row " << required_id << std::endl;
    required_row->Print();
  }
  else std::cout << "  cannot find required row " << required_id << std::endl;

Once you have access to an individual ND::TDemo_DB_Table you can use its getters to access the row's data.
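
Putting the pieces together, here is a minimal lookup sketch. The getter name GetFParm1() is an assumption modelled on the F_PARM1 column; check TDemo_DB_Table.hxx for the actual accessor names.

  #include <ctime>
  // (plus the skOfflineDatabase headers declaring TSK_DBI_Context,
  //  TResultSetHandle and TDemo_DB_Table)

  // Minimal end-to-end lookup for one channel.  GetFParm1() is an
  // assumed getter name based on the F_PARM1 column; consult
  // TDemo_DB_Table.hxx for the real accessors.
  double GetChannelConstant(UInt_t chanId, time_t eventTime) {
    TSK_DBI_Context context;
    context.SetTimeStamp(eventTime);
    TResultSetHandle<TDemo_DB_Table> rs(context);

    const ND::TDemo_DB_Table* row = rs.GetRowByIndex(chanId);
    if (!row) return 0.;      // no constants for this channel/time
    return row->GetFParm1();  // assumed getter
  }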

The Ownership Model

The name "ResultSetHandle" should hint pretty clearly that you don't own the vector of TDemo_DB_Tables. Instead the interface caches them and just passes you a lightweight handle that you can freely pass by value, create and destroy. When it receives a new query it checks its cache to see if it already has the results and if so returns them. It uses reference counting to record how many clients are using the set and will not delete the set until that count drops to zero. Indeed, even then it will hang on to the set just in case it gets the same request again so efficiently supports the model:-

  loop over all events {
    request calibration set
    calibrate data
  }

Were it to drop the set once its last client was destroyed at the end of the event loop, it would have to read the set afresh at the start of the next event.
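
In code the recommended pattern is therefore simply to construct a fresh handle in each event and let the cache do the work, along the lines of this sketch (names as in the demo above; skOfflineDatabase headers assumed included):

  #include <ctime>
  #include <vector>

  // Per-event pattern: constructing the handle every event is cheap
  // because the underlying result set is cached and reference counted.
  void ProcessEvents(const std::vector<time_t>& eventTimes) {
    for (size_t i = 0; i < eventTimes.size(); ++i) {
      TSK_DBI_Context context;
      context.SetTimeStamp(eventTimes[i]);
      TResultSetHandle<TDemo_DB_Table> rs(context);  // cache hit after event 1

      // ... calibrate this event's hits using rs.GetRowByIndex(...) ...
    }  // handle destroyed here; the cached set survives for the next event
  }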

Developing New Tables and Classes

New Table

Pick an uppercase table name, using underscores to make it readable. For example:-

XXX_YYY

You need to create two tables:-

XXX_YYY
XXX_YYYVLD

See The Table Definitions for the constraints on XXX_YYY and the definition of XXX_YYYVLD.

For XXX_YYY the supported MySQL types, and the ROOT/C++ types they map to, are as follows:-

+------------+------------------------+-----------------+
| MySQL Type | ROOT/C++ Type          | Comments        |
+------------+------------------------+-----------------+
| CHAR       | Char_t or Bool_t       |                 |
| CHAR(n)    | Char_t*                | n < 4           |
| TEXT       | Char_t* or std::string | n >= 4          |
| TINYINT    | Short_t or UShort_t    | 8 bit capacity  |
| SMALLINT   | Short_t or UShort_t    | 16 bit capacity |
| INT        | Int_t or UInt_t        | 32 bit capacity |
| FLOAT      | Float_t                |                 |
| DOUBLE     | Double_t               |                 |
| DATETIME   | VldTimeStamp           | Backend type    |
+------------+------------------------+-----------------+

New Class

Study the demonstration files:-

TDemo_DB_Table.hxx
TDemo_DB_Table.cxx
TDemo_DB_Table_LinkDef.h
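
As orientation before reading them, here is a heavily hedged skeleton of the general shape such a class takes. Every method and member name below is an illustrative assumption; the authoritative interface is whatever ND::TTableRow actually declares, so copy the real signatures from TDemo_DB_Table.hxx.

  // Illustrative skeleton only: the base-class requirements and the
  // getter names are assumptions; see TDemo_DB_Table.hxx/.cxx for the
  // genuine article.
  // (#include the package header that declares ND::TTableRow.)
  namespace ND {

  class TXxx_Yyy : public TTableRow {
  public:
    TXxx_Yyy() : fChanId(0), fParm1(0.) {}

    // Hypothetical getters mirroring the table columns.
    UInt_t  GetChanId() const { return fChanId; }
    Float_t GetParm1()  const { return fParm1;  }

    // TTableRow will also require methods to fill a row from a query
    // result and to report the "Natural Index" used by GetRowByIndex;
    // take their exact names and signatures from the demo class.

  private:
    UInt_t  fChanId;       // maps to an INT column
    Float_t fParm1;        // maps to a FLOAT column

    ClassDef(TXxx_Yyy, 1)  // ROOT dictionary hook, declared in _LinkDef.h
  };

  }  // namespace ND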

Filling Tables

Although the MINOS DBI backend does have services to write tables, these are not currently exposed. Writing data involves the following tasks:-

Selecting the SEQNO

The choice of SEQNO values is entirely arbitrary; they merely serve to map between entries in the pairs of tables. However, MINOS has multiple databases where data for a given table is created and then distributed to the other databases. To avoid using the same SEQNOs multiple times, bands of SEQNOs were allocated to the different databases according to the scheme:-

            0 -     99999999  Test values local to any database and not exported.
 <ID>00000000 - <ID>99999999  Allocated to database <ID>

For ND280, and probably SK, we never used multiple master databases and only ever used local SEQNOs.

The script database_updater.py should take care of allocating a sequence number for you so that you don't need to worry about this.

Selecting the Validity

Conventionally TIMESTART is the time when the data used to produce the calibration was taken. TIMEEND depends on the volatility of the data. DETECTORMASK and SIMMASK should be set to 1, and TASK and EPOCH should be set to 0, when a table is created. The CREATIONDATE should be set to the date when the calibration process was run.

The INSERTDATE should be set to the date when the data was entered into the database; this is handled automatically by the script database_updater.py.
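
As a small convenience, validity times in the conventional "YYYY-MM-DD hh:mm:ss" form can be produced in plain C++ like this (a sketch using only the standard library, targeting Linux):

  #include <ctime>
  #include <string>

  // Format a unix time as "YYYY-MM-DD hh:mm:ss" (UTC), the form the
  // validity tables display; handy when assembling TIMESTART, TIMEEND
  // and CREATIONDATE values for an update file.
  std::string ToDbDateTime(time_t t) {
    char buf[20];
    std::tm tm_utc;
    gmtime_r(&t, &tm_utc);
    std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", &tm_utc);
    return buf;
  }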

Assembling the Data

Assembling the data is beyond the scope of this document, but once it has been assembled one possibility would be to write, for each aggregate, an ASCII file with one line for each row in the database.

Writing the Data

Although it is possible to write your own scripts that submit MySQL queries directly, the python utility:-

database_updater.py

provides a convenient tool to load ASCII files into a database taking care of sequence number allocation. For details type:-

app/database_updater.py  --help

and for an example ASCII file see

 demo/demo_db_table.update   

To actually run the database updating example do:

source setup_writer.[c]sh
app/database_updater.py apply_local_update demo/demo_db_table.update

Warning: this demonstration actually erases the old demo table before uploading new constants ('drop table if exists DEMO_DB_TABLE' ...). Make sure that you don't use the drop-table option when actually writing to your final database.

Database accounts

We typically have separate reader and writer MySQL accounts. Reader accounts are used by analysers that just need to retrieve data from the database; they do not have permission to modify the tables. Writer accounts can modify and drop tables; these accounts would be used for uploading new constants.

Database testing

The typical scenario for inserting new constants is that you'll want to test them first. The recommended method is to use the test database before committing new constants to the main database: upload the constants to the test database (as described under Writing the Data), check that your software reads back the expected values, and only then apply the same update to the main database.

Database rollback

Sometimes you may find that new constants don't work as expected. In that case you might want to return to a previously used set of constants. It is possible to select older constants based on their insertion date. For instance, suppose you had a validity table as follows:

+-------+---------------------+---------------------+---------------------+---------------------+
| SEQNO | TIMESTART           | TIMEEND             | CREATIONDATE        | INSERTDATE          |
+-------+---------------------+---------------------+---------------------+---------------------+
|     1 | 2009-01-01 00:00:00 | 2009-02-01 00:00:00 | 2009-04-07 18:00:00 | 2013-08-09 11:24:59 |
|     2 | 2009-01-15 00:00:00 | 2009-02-15 00:00:00 | 2009-04-08 18:00:00 | 2013-08-09 11:24:59 | 
|     3 | 2009-01-15 00:00:00 | 2009-02-15 00:00:00 | 2009-04-08 18:00:00 | 2013-08-09 11:40:10 | 
+-------+---------------------+---------------------+---------------------+---------------------+

and that you decided you wanted to use the set of constants that was inserted at 11:24:59, rather than at 11:40:10. In that case you would use the ENV_DBI variable to only use information inserted BEFORE a certain time:

export ENV_DBI="Rollback:* = '2013-08-09 11:40:09'"

This would instruct the database code to only use constants that were inserted before 2013-08-09 11:40:09.

You can also specify that the rollback only operates on a certain table, by doing the following:

export ENV_DBI="Rollback:DEMO2_DB_TABLE = '2013-08-09 11:40:09'"

Code Authors

Code originally written by Nick West for MINOS.

Code adapted to ND280 by Nick West and Alex Finch.

Code adapted to SK by Thomas Lindner.

Further Reading

As has been explained, the current implementation is based on the MINOS Database Interface code. Although this has been merged into the ND280 software environment, the basic functionality of the MINOS code is largely unchanged. A comprehensive description of the package can be found in the Database section of the MINOS Off-line Software User's Manual.

