Description
"Data deluge" refers to the situation where the sheer volume of new data generated overwhelms the capacity of institutions to manage it and researchers to use it[1]. Data Deluge is becoming a common problem in industry and big science facilities like the synchrotron laboratory MAX IV and the Large Hadron Collider at CERN[2].
As a novel solution to this problem, a small cross-disciplinary collaboration of researchers has developed a machine learning-based data compression tool called "Baler". Developed as an open-source project[3] and delivered as an easy-to-use pip package[4], Baler allows researchers to derive lossy compression algorithms tailored to their data sets[5]. This method yields substantial data reduction and can compress scientific data to 1% of its original size.
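As a rough illustration of how such a tailored lossy compressor can work, the sketch below trains a small autoencoder, the kind of model commonly used for learned compression of this sort: the encoder output is the compressed representation, and the decoder reconstructs an approximation of the original record. This is a minimal sketch in PyTorch with illustrative layer sizes and synthetic data, not Baler's actual API.

```python
# Minimal autoencoder-based lossy compression sketch (illustrative only,
# not Baler's implementation). All sizes and hyperparameters are assumptions.
import torch
import torch.nn as nn

n_features = 24   # assumed width of one data record
n_latent = 4      # size of the compressed representation

encoder = nn.Sequential(nn.Linear(n_features, 12), nn.ReLU(),
                        nn.Linear(12, n_latent))
decoder = nn.Sequential(nn.Linear(n_latent, 12), nn.ReLU(),
                        nn.Linear(12, n_features))

# Training on the target data set is what tailors the compression to it;
# random data stands in here for a real scientific data set.
data = torch.randn(10_000, n_features)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(50):
    opt.zero_grad()
    reconstruction = decoder(encoder(data))
    loss = loss_fn(reconstruction, data)  # lossy: minimise, not eliminate, error
    loss.backward()
    opt.step()

# "Compression" = keeping only the latent vectors; "decompression" = decoding.
compressed = encoder(data).detach()
restored = decoder(compressed).detach()
print(f"reconstruction MSE: {loss_fn(restored, data).item():.4f}")
```

Storing the latent vectors in place of the raw records is what produces the size reduction; the reconstruction error is the price paid for it.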
More recently, Baler has been used successfully to compress and decompress data on field-programmable gate arrays (FPGAs). This "real-time" compression enables data to be compressed at high rates and transferred in greater volumes over limited bandwidths, extending Baler's reach into the field of bandwidth compression.
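To make the bandwidth benefit concrete, a back-of-envelope calculation (using an assumed 1 Gb/s link, not a figure from this work) shows the effect of the quoted 1% compression ratio:

```python
# Illustrative arithmetic only: compressing to 1% of the original size
# lets a fixed link carry ~100x more raw data per second.
link_gbps = 1.0            # assumed available link bandwidth
compression_ratio = 0.01   # compressed size / original size, as quoted above

effective_gbps = link_gbps / compression_ratio
print(f"effective raw-data throughput: {effective_gbps:.0f} Gb/s "
      f"over a {link_gbps:.0f} Gb/s link")
```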
This contribution will give an overview of the Baler software tool and present results from Particle Physics, X-ray ptychography, Computational Fluid Dynamics, and Telecommunications.
[1] https://www.hpe.com/us/en/what-is/data-deluge.html
[2] https://cerncourier.com/a/time-to-adapt-for-big-data/
[3] https://github.com/baler-collaboration/baler
[4] https://pypi.org/project/baler-compressor/
[5] https://arxiv.org/abs/2305.02283