> > We are storing raw MIDAS files to S3 Object Storage, but MIDAS files are not
> > optimised for readout from this kind of storage. Is there any ongoing work on
> > evolving the midas raw output or, beyond a simulated POSIX fs, on developing a
> > midas python library optimised to stream data from S3 (it is not really clear to me
> > if this is possible)?
>
> We have plans for adding S3 object storage support to lazylogger, but have not gotten
> around to it yet.
>
> We do not plan to add this in mlogger. mlogger works well for writing data to locally-
> attached storage (local ext4, XFS, ZFS) but always runs into problems with timeouts and
> delays when writing to anything network-attached (even writing to NFS).
>
> I envision that each midas raw data file (mid.gz or mid.lz4 or mid.bz2) will
> be stored as an S3 object and there will be some kind of directory object
> to map object ids to run and subrun numbers.
>
> Choice of best file size is open; normally we use subruns to limit file size to 1-2
> Gbytes. If cloud storage prefers some other object size, we can easily go up to 10
> Gbytes and down to "a few megabytes" (ODB dumps will have to be turned off for this).
>
> Other than that, in your view, what else is needed to optimize midas files for storage
> in the Amazon S3 cloud?
>
> P.S. For reading files from the cloud, code needs to be written and added to
> midasio/midasio.cxx, for example, see the code that is already there for reading ssh-
> attached files and dcache/dccp-attached files. (CERN EOS files can be read directly
> from POSIX mount point /eos).
>
> K.O.
Thanks,
actually I made a small workaround with the python boto3 library that works with files of any
size (with the obvious limitation of having to wait for the whole download), e.g.:
import gzip
import boto3
from io import BytesIO

key = 'TMP/run00060.mid.gz'

# 'creds' is our own helper that returns an authenticated boto3 session
aws_session = creds.assumed_session("infncloud-iam")
s3 = aws_session.client('s3', endpoint_url="https://minio.cloud.infn.it/",
                        config=boto3.session.Config(signature_version='s3v4'),
                        verify=True)

# download the whole object and keep it in memory
s3_obj = s3.get_object(Bucket='cygno-data', Key=key)
buf = BytesIO(s3_obj["Body"].read())

for event in MidasSream(gzip.GzipFile(fileobj=buf)):
    if event.header.is_midas_internal_event():
        print("Saw a special event")
        continue
    bank_names = ", ".join(b.name for b in event.banks.values())
    print("Event # %s of type ID %s contains banks %s" % (event.header.serial_number,
                                                          event.header.event_id, bank_names))
....
where in MidasSream I just bypass the open. The code works, but obviously this way I need to
keep the whole buffer in memory, and it takes time to fetch it all. I was interested to
understand whether someone has already developed event-by-event streaming (preferably in
python, but not mandatory). I'll have a look at the code you point to.
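In the meantime, one variant I will try, which should avoid holding the whole buffer in
memory: gzip.GzipFile only needs sequential read() calls, so it should be possible to pass the
boto3 StreamingBody to it directly and let the data be fetched from S3 as the decompressor
asks for more bytes. This is only an untested sketch, and it assumes MidasSream reads its
file object strictly sequentially (no seeks):

import gzip

key = 'TMP/run00060.mid.gz'
s3_obj = s3.get_object(Bucket='cygno-data', Key=key)   # same client as above

# StreamingBody.read(n) returns at most n bytes, which is all that
# gzip.GzipFile needs, so no intermediate BytesIO is required
with gzip.GzipFile(fileobj=s3_obj["Body"]) as stream:
    for event in MidasSream(stream):
        if event.header.is_midas_internal_event():
            continue
        print("Event # %s" % event.header.serial_number)

If MidasSream ever needs to seek backwards this will not work, and ranged GET requests or a
temporary local file would be needed instead.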
Thanks, G.
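P.S. About the directory object mapping run and subrun numbers to object ids: on our side
even a small JSON manifest per run, stored next to the data objects, would be enough.
Something like this (purely my guess at a possible layout; key and file names are invented):

import json

# hypothetical manifest object: one per run, maps each subrun to the S3 key of its raw file
manifest = {
    "run": 60,
    "subruns": {
        "000": "TMP/run00060sub000.mid.gz",
        "001": "TMP/run00060sub001.mid.gz",
    },
}
s3.put_object(Bucket='cygno-data',
              Key='TMP/run00060.manifest.json',
              Body=json.dumps(manifest).encode())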