Imaging software, data management and storage WG 4th FMM

From BioImagingUKWiki


Minutes of the breakout session on imaging software, data management and storage, 7 January

Attendees: Claire Stewart (chair), Paul Thomas, Andrew Vaughn, Alex Sossick, Yan Gu, Martyn Reynolds, David Gibson, Rolly Wiegand



What is the facility manager’s role in data management and storage?

To provide users with information and training on what is available and how to use it.

To get users to think about analysis before acquisition and to plan the entire workflow in advance.

To train PIs on the data storage regulations and options available in their institute, and on the fact that not everything can or should be stored.

To make a management and storage solution available to users. Use of such a solution cannot be enforced.

Final responsibility for data remains with the end user.

To educate institute IT services about the specific problems of research data storage, particularly the large file sizes and long storage periods required.

To offer services with optional charging levels:

• per unit storage

• per level of security

• per level of backup

The cost and potential complexity of such solutions suggest that they can only be set up by core facilities. This raises questions of funding and requires a change in attitude within institutes.

There is some consensus that the facility manager’s job no longer ends at the microscope but extends to the generation of the end result.


Why is data management/storage required?

Bioimaging generates many hundreds of GB of image data per week. This cannot, and should not, be stored exclusively on personal storage solutions.

Personal storage solutions, with each individual’s own annotations, make it very difficult to continue or review work once that person leaves.

It is estimated that data remain ‘hot’, that is, in regular use, for 6 months, after which they could be archived. Funding bodies now require original data to be kept for 7 to 10 years.
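The hot/archive split above can be expressed as a simple age-based policy. The sketch below is a hypothetical illustration, not a tool discussed at the meeting: it assumes 6 months is approximated as 183 days, takes the lower bound of the funders' range (7 years, approximated as 7 * 365 days) as the minimum retention period, and uses invented tier names `hot`, `archive` and `review`.

```python
from datetime import date, timedelta

# Assumed thresholds (illustrative only): "hot" for about 6 months,
# minimum retention of 7 years, leap days ignored.
HOT_PERIOD = timedelta(days=183)
MIN_RETENTION = timedelta(days=7 * 365)


def storage_tier(acquired: date, today: date) -> str:
    """Classify a dataset by age into a hypothetical storage tier."""
    age = today - acquired
    if age < timedelta(0):
        raise ValueError("acquisition date is in the future")
    if age <= HOT_PERIOD:
        return "hot"      # keep on fast primary storage
    if age <= MIN_RETENTION:
        return "archive"  # move to cheaper archival storage
    return "review"       # past 7 years: check the funder's rules before deleting


if __name__ == "__main__":
    today = date(2020, 6, 1)  # arbitrary example date
    for acquired in (date(2020, 3, 1), date(2018, 6, 1), date(2010, 1, 1)):
        print(acquired, storage_tier(acquired, today))
```

Datasets in the `review` tier are deliberately not deleted automatically, because some funders require data to be kept for the full 10 years.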

Data management is also required as a gateway:

• to bring images together with a variety of processing software

• to facilitate work on images by cross-disciplinary teams of computational scientists, mathematicians and biologists

• to facilitate computationally intensive processing by carrying it out in batches on a server


What are the issues that must be solved by any storage/management system?

Achieving widespread adoption within an institute, since most researchers have developed their own methods over time.

Network transfer rates: single files easily reach many GB. Fibre networks could help; they are too expensive to install across a whole institute, but some institutes are now upgrading high-demand pathways, e.g. the route between microscope and storage.
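To put multi-GB files in perspective, the rough transfer time is the file size in bits divided by the effective link speed. A minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and a hypothetical efficiency factor of 0.7 for protocol and contention overheads; the 1 and 10 Gbit/s link speeds and the 20 GB file in the usage lines are illustrative examples, not figures from the meeting.

```python
def transfer_seconds(file_size_gb: float, link_gbit_per_s: float,
                     efficiency: float = 0.7) -> float:
    """Rough time in seconds to move one file over a network link.

    file_size_gb    -- file size in gigabytes (10**9 bytes)
    link_gbit_per_s -- nominal link speed in gigabits per second
    efficiency      -- assumed fraction of the nominal speed actually achieved
    """
    if file_size_gb < 0 or link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid argument")
    bits = file_size_gb * 8e9  # 8 bits per byte
    return bits / (link_gbit_per_s * 1e9 * efficiency)


if __name__ == "__main__":
    for link in (1, 10):  # illustrative link speeds in Gbit/s
        t = transfer_seconds(20, link)
        print(f"20 GB over {link} Gbit/s: about {t / 60:.1f} min")
```

Because the time scales inversely with link speed, upgrading only the microscope-to-storage route gives most of the benefit where the largest files actually travel.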

Data duplication: data are copied multiple times as they are worked on locally and stored.
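One way to find files that have been copied several times is to group them by a hash of their contents. The sketch below is an illustration under stated assumptions, not a tool discussed at the meeting: it uses SHA-256 from Python's standard `hashlib`, reads files in 1 MiB chunks so that multi-GB images need not fit in memory, only hashes files that share a size with another file, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks to keep memory use low."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash; keep only groups of two or more."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)
    groups: dict[str, list[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have an identical copy
        for p in same_size:
            groups[file_digest(p)].append(p)
    return {d: sorted(ps) for d, ps in groups.items() if len(ps) > 1}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "raw.tif").write_bytes(b"\x00" * 1024)
        (root / "local_copy").mkdir()
        (root / "local_copy" / "raw.tif").write_bytes(b"\x00" * 1024)
        (root / "other.tif").write_bytes(b"\x01" * 1024)
        for digest, paths in find_duplicates(root).items():
            print(digest[:12], [str(p.relative_to(root)) for p in paths])
```

Grouping by size first avoids reading every byte of every large image when most files are unique.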

The myriad of metadata types and their different formats.

Data must be searchable.

Quality assurance for funding bodies.

Data security especially where data are clinical/patient related.

Conversion between file formats, from the storage format to the native local format and vice versa, is time-consuming and can lose metadata.

Backup

The current lack of experience (and perhaps of the will to gain that experience) within many IT services means that solutions must be easy to set up and maintain, with minimal or no involvement of IT specialists.


What is the current state of OMERO?

Current release: 4.2

Installation and upgrades have been improved

Better interfacing with other applications

Related: OME-TIFF is felt to be an appropriate standard image format for processing, although it is accepted that it might not be ideal for storage.

The suggested approach is to push responsibility for using a storage system back to users, perhaps charging for different levels of service. One facility charges £1000/TB/year for industry-standard data security.
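Combining the £1000/TB/year figure with the data volumes mentioned earlier gives a rough feel for costs. A minimal sketch, assuming a hypothetical acquisition rate of 200 GB/week (the minutes say only "many hundreds of GB"), 52 weeks per year, decimal terabytes, a flat price, and no extra copies for backup:

```python
def annual_volume_tb(gb_per_week: float, weeks_per_year: int = 52) -> float:
    """Data acquired per year in decimal terabytes (1 TB = 1000 GB)."""
    return gb_per_week * weeks_per_year / 1000


def retention_cost_gbp(gb_per_week: float, years_kept: int,
                       price_per_tb_year: float = 1000.0) -> float:
    """Cost of keeping one year's acquisitions for `years_kept` years at a flat price."""
    if gb_per_week < 0 or years_kept < 0 or price_per_tb_year < 0:
        raise ValueError("arguments must be non-negative")
    return annual_volume_tb(gb_per_week) * price_per_tb_year * years_kept


if __name__ == "__main__":
    rate = 200  # GB/week, illustrative assumption
    print(f"{annual_volume_tb(rate):.1f} TB/year")
    print(f"£{retention_cost_gbp(rate, 7):,.0f} for 7 years")
```

Under these assumptions one year of acquisitions is 10.4 TB, costing £10,400 per year to store, or £72,800 if kept for the 7-year minimum. Backups or duplicated copies would multiply these figures.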


Open questions

To the funding bodies: how secure does the long term storage have to be?

Could cloud services be viable? Issues:

• data sensitivity and protection

• bandwidth, whether for archiving or for repeated access during processing

Long-term file format compatibility: open source is attractive because there is some chance of accessing and converting data even if a tool falls out of favour.

What are the storage requirements for image data that form part of a publication (raw data, processed images)?

Is data management a core service? Core facilities can make better use of resources, including software, keep them up to date and use funding more efficiently, so there is an argument for keeping data storage as a core service.

What are the barriers to adopting OME-TIFF as a common image processing interchange format? For example, are there processing overheads in using TIFF-based files?


Action points

Register interest in more common formats, particularly industry-wide support for OME-TIFF. Urge software and hardware manufacturers to reduce the number of proprietary file formats and to offer a freely interchangeable format (e.g. OME-TIFF) as an alternative.

Support open source projects to improve format compatibility, e.g. by uploading examples of the image files we have access to onto the Bio-Formats site.
