Software tools & Data Management
Members
Daniel Zicha (London Res. Inst., CRUK), Jason Swedlow (Univ. Dundee), Paul Thomas (Univ. East Anglia),
Strategies for delivery of data management, archiving, and analysis tools
Introduction
After 25 years of development, most imaging methodologies are well established and in common use across all biological and biomedical research laboratories. As noted in the accompanying BioImagingUK working group documents, the need for sophisticated imaging tools continues to grow, and many new developments are coming online rapidly. This growth has driven the need for computational tools for processing, analysing and managing data. Indeed, in our own surveys of BioImagingUK participants, we have found a unanimous need for new and more advanced imaging tools at all sites in the UK. Thus, there is clear evidence that, despite significant development and the availability of sophisticated imaging tools in both the commercial and open domains, there is a significant unmet need that limits discovery and the development of biological insight. It is these limitations that we address in this working document.
Availability of tools and resources for image analysis and processing
Most research laboratories have access to commercial imaging platforms, and almost all of these include very sophisticated image analysis software made available at the time of purchase or through upgrades. For the purposes of this document, we classify commercial imaging software into three categories:
- Software that runs commercial image data acquisition platforms
- Closed, commercially licensed tools for data processing, analysis, and visualization
- Commercial tools based on open-source frameworks or with open programming interfaces
Commercial software is certainly quite powerful and is responsible for much of the productivity and discovery that bioimaging achieves. However, the flexibility required in modern biomedical research and the need to rapidly develop prototype tools have led to the emergence of open-source or open application programming interface (API) packages that provide a foundation for adaptation and customization (see below). ImageJ and ITK are two dominant examples of the many application suites that are available and commonly used in nearly all research laboratories. The open and pluggable nature of many of these applications makes them ideal for use in scientific environments where custom applications are almost always required.
There is no general requirement that all software tools be completely open. For example, Matlab is a commercial scripting environment that provides an open, standardized interface where users can produce their own processing scripts based on a proprietary architecture. Matlab has seen heavy use in cell and developmental biology simply because of its open programming interface. ImageJ is a very successful open image software package. ImageJ’s core code base is controlled by one developer (Wayne Rasband, NIH), but its architecture easily allows ‘plug-ins’ to extend its functionality. Recently, a new ImageJ development team has emerged that is beginning the transition of the ImageJ codebase to well-established open-source frameworks, and the Fiji project now releases complete, supported versions of ImageJ and a large suite of plug-ins. The key point is that, as powerful as existing commercial tools are, providing open interfaces to their underlying functionality is an attractive and powerful way to enable scientists both to use existing tools and to add their own. For this reason, BioImagingUK envisions an increasing movement towards openness in all imaging software applications. The resulting increased flexibility contributes to the growing use of quantitative tools in biological research and ultimately to scientific discovery.
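The value of an open plug-in interface can be illustrated with a minimal sketch. This is not the API of ImageJ, Matlab or any other package named here; the names `PLUGINS`, `register` and `run_pipeline` are hypothetical, and an "image" is assumed to be a plain list of rows of 8-bit intensities. The point is only that a small, stable core can be extended by users without modifying the core itself.

```python
from typing import Callable, Dict, List

# Assumption for illustration: an image is a list of rows of 8-bit intensities.
Image = List[List[int]]
Plugin = Callable[[Image], Image]

# The "core" only knows about this registry, not about any specific plug-in.
PLUGINS: Dict[str, Plugin] = {}


def register(name: str) -> Callable[[Plugin], Plugin]:
    """Return a decorator that adds a processing function to the registry."""
    def decorator(func: Plugin) -> Plugin:
        PLUGINS[name] = func
        return func
    return decorator


@register("invert")
def invert(image: Image) -> Image:
    """User-supplied plug-in: invert 8-bit intensities."""
    return [[255 - pixel for pixel in row] for row in image]


@register("threshold")
def threshold(image: Image) -> Image:
    """User-supplied plug-in: binarize at a fixed cut-off of 128."""
    return [[255 if pixel >= 128 else 0 for pixel in row] for row in image]


def run_pipeline(image: Image, names: List[str]) -> Image:
    """Apply the named plug-ins in order; the core code never changes."""
    for name in names:
        image = PLUGINS[name](image)
    return image


if __name__ == "__main__":
    print(run_pipeline([[0, 100], [200, 255]], ["invert", "threshold"]))
```

A new analysis step is added by writing one decorated function; nothing in `run_pipeline` needs to be touched, which is the property that makes pluggable packages attractive for prototyping.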
Developing and using open-source software for bioimaging
In recent years, the open-source software movement, based on architectures like Linux and Java, has matured considerably. The requirements and methodologies of open-source communities are now well established and standardized across the community. Open-source projects distribute their original source code, usually alongside built versions of their code, so that users may adapt or extend the software to their own needs. Open-source software is distributed under a variety of licenses that define what users may or may not do with the software, as well as with any software they generate that uses the open-source package. The terms of these licenses vary, from imposing no restrictions whatsoever to demanding that any distribution retain the open-source nature of the initial code (for more details, see Open Source Initiative). Open-development projects distribute their software using open-source licensing, but also publish their plans and roadmaps, and include community feedback in their development process and priorities.
In the UK there are a number of very successful open-source/open-development projects that underpin scientific analysis and discovery. CCP4 is a standardized application used for analysis of crystallographic data. The software that has been built around the wide variety of genomics projects embraces and uses the open model, and depends on a defined, standardized interface for access. The Open Microscopy Environment (OME) develops tools for data management and access for biomedical imaging.
Keeping software alive
While making software openly available is often attractive, it comes with a significant burden of software support and maintenance. Software support requires interaction with the community: help with installation, bug monitoring and, in general, providing help whenever asked. Software maintenance is a different problem. Existing software must be modified and adapted as new operating systems come online, new functionality is required, or underlying software dependencies change; ongoing maintenance ensures that the initial investment in the tool is sustained over time. Maintenance is perhaps one of the most underestimated yet most important efforts for any software project. Assuming resources for development, support and maintenance are provided, open-source software projects can move rapidly, develop an active user community and deliver value to their users. Patterns for working on open software projects are well developed from projects like Apache, Java and the GNU project. In fact, there are even open-source project management systems (e.g., Trac and Agilo), code repository systems (e.g., SVN) and continuous integration systems (e.g., Hudson). However, even with the availability of these tools, it is worth remembering that any software project still requires resources to run such facilities for the benefit of the community.
Recently one of us has published a review highlighting the flexibility and adaptability of open-source software. Flexibility is an important attribute, as scientific discovery can rarely be supported by canned customizable tools. Nonetheless, there is certainly a role for such tools in routine applications and in quite advanced analyses. However, as new technologies come online, open-source software projects are able to adapt more quickly and provide initial tools to the community while a given technology matures. For this reason, we believe that open-source software projects provide a strategic and very important part of the software ecology for scientific discovery by imaging. In general, our experience is that open-source software functions best as infrastructure that can be easily extended. For example, ImageJ provides an easily pluggable interface that can be adapted to users' needs. OME provides infrastructure tools in the form of file format readers and data management tools, and is easily extended by users or any member of the community. We know of no cases where open-source software is the single best solution for every problem, but it certainly provides a powerful alternative for a significant component of the scientific community, and thus we strongly believe that it must be supported alongside the commercial imaging vendor community.
Establishing a global repository for facilitating access to bioimaging data
Data sharing is a critical aspect of research that helps to prevent duplication of effort, encourages ethical behaviour, and helps to circulate scientific knowledge. Both publishers and funding agencies are requiring data archiving and access. Indeed, in 2009 the Committee on Science, Engineering, and Public Policy (COSEPUP) stated that “Research data, methods and other information integral to publicly reported results should be publicly accessible”. BioImagingUK fully supports these proposals; however, we realize the challenge this represents given the multidimensional nature of bioimaging data. Nevertheless, we believe the need for a global repository for imaging data, akin to GenBank and the PDB, far outweighs the difficulties inherent in establishing such a database.
As argued by one of us recently (Linkert et al., 2010), the acceptance and use of a standard image format will provide the foundation for the creation of an open image database that will serve the best interests of the biomedical research community. Thus, the adoption of a standardized image format and the construction of a bioimaging repository are closely intertwined and interdependent processes. Linkert et al. (2010) have proposed a number of principles (see Box 1 in Linkert et al.) to improve data gathering and sharing in the biomedical imaging community. BioImagingUK is in full agreement with these recommendations and, in response, has developed a series of goals for itself, which are highlighted in the attached policy document. Also included in this document are some actions that BioImagingUK intends to carry out in order to achieve these goals.
Aims & Actions of BioImagingUK for the Improvement of Software Design, Distribution & Access
Aims
- A standardized file format should be adopted by the scientific community to facilitate data analysis, archiving and sharing. This standardized format should be non-proprietary and contain sufficient metadata to allow subsequent reproduction of the data in any laboratory equipped with the identical, or equivalent, microscope. The file format supported by the majority of the UK imaging community is the Open Microscopy Environment tagged image file format (OME-TIFF).
- BioImagingUK recognizes the critical importance that commercial software packages have in delivering new imaging tools and in enabling analysis and discovery. BioImagingUK fully supports the use of these tools, and the use of research council and charity funding for their purchase, but also believes that these commercial packages must support open file formats initially and, in the longer term, open software interfaces and plug-in architectures such as those of ImageJ and OMERO.
- In order to facilitate data sharing, BioImagingUK recommends that funding be provided to develop, run and maintain a global repository for bioimaging data in a manner similar to the resources held by GenBank and the PDB.
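The first aim above asks that a standardized format carry "sufficient metadata". As a minimal sketch of what checking for such metadata could look like, the following parses a simplified XML fragment loosely modelled on the OME-XML metadata that OME-TIFF embeds. The fragment omits XML namespaces and is not schema-valid, and the list of "required" attributes is an assumption made for illustration, not a requirement stated by OME or BioImagingUK.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumption: these attributes stand in for "sufficient metadata".
# A real policy would define its own list.
REQUIRED_PIXEL_ATTRS = (
    "SizeX", "SizeY", "SizeZ", "SizeC", "SizeT",
    "PhysicalSizeX", "PhysicalSizeY",
)

# Simplified, namespace-free fragment (hypothetical sample data).
SAMPLE = """
<OME>
  <Image Name="cell_01">
    <Pixels SizeX="512" SizeY="512" SizeZ="10" SizeC="2" SizeT="1"
            PhysicalSizeX="0.1"/>
  </Image>
</OME>
"""


def missing_metadata(xml_text: str) -> Dict[str, List[str]]:
    """Map each image name to the required attributes it lacks."""
    root = ET.fromstring(xml_text)
    report: Dict[str, List[str]] = {}
    for image in root.iter("Image"):
        pixels = image.find("Pixels")
        attrs = pixels.attrib if pixels is not None else {}
        missing = [name for name in REQUIRED_PIXEL_ATTRS if name not in attrs]
        if missing:
            report[image.get("Name", "<unnamed>")] = missing
    return report


if __name__ == "__main__":
    print(missing_metadata(SAMPLE))
```

A check of this kind could be run by a repository at submission time, so that incomplete records are flagged before they are archived.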
Actions
- BioImagingUK will lobby the UK funding organisations to include requirements that image data be ultimately stored in open standardized file formats like OME-TIFF.
- BioImagingUK members will require their commercial software providers to include support for open file formats like OME-TIFF in their software and make their purchases contingent upon this requirement being satisfied.
- BioImagingUK will lobby academic publishers to employ the OME-TIFF format as the “default” format for publication.
- As a first step to establishing a global image database, BioImagingUK asks that the Euro-BioImaging project provide seed money to create a European Image Repository. Furthermore, BioImagingUK will lobby UK funding bodies to contribute financial support for this Europe-wide resource.
- BioImagingUK will work to define the software and data needs of the UK bioimaging community, and work with academic software developers to create new open-source tools that support existing open standards and programming interfaces.
Notes from January 2010 Facilities Managers Meeting, London
A meeting of UK imaging facilities managers was held in January 2010 at Imperial College. The following are notes from the Imaging Software breakout session.
Questions:
1. Would UK biological research be improved by better integration of image processing and analysis software?
2. If so, would a single, standardized file format be advantageous?
3. If the answer to 2 is yes, what should this format be? OME-TIFF?
4. Seamless analysis of saved data requires development of tools that can be used directly on the saved data without “exporting”. If a standardized file format is chosen, it must be readable by the majority of image analysis programmes. Clearly, different questions will arise depending on the chosen format, but if the OME-TIFF format is chosen, it would be essential to provide the majority of tools that users need embedded in the OME. How can the community increase the development of new software tools for the OME? Alternatively, does it make more sense to promote further developments in ImageJ, which is open-source and can already read OME-TIFF? Or can we carry on without a standardized format and rely upon tools such as Bio-Formats to read the various image formats, either into OMERO or ImageJ?
5. Do UK biological researchers need a repository for images that is freely and publicly available to the global community (akin to the Protein Data Bank)?
6. If so, should this be a single, central repository, or distributed over several (regional?) sites?
7. Should the database system be based on the OMERO server? Or are there alternative models?
8. How can the archived data be easily accessed by the community? Do we need new applications, or are there tools already available; e.g., the JCB DataViewer?
9. At a local level, should individual institutes create an archiving system for their imaging data; and if so, what platform should be used?
Polling the Community:
The answers to all these questions must be agreed by the UK bioimaging community. As a first step towards this, we discussed these questions at the 2010 Imaging Facility Managers’ meeting. A summary of the conclusions to the questions above is shown below.
Answers & solutions:
1. The overwhelming verdict of Imaging Facility managers from across the UK is that research productivity would be significantly improved by improving cross-platform compatibility, thus facilitating the processing and analysis of images.
2. Furthermore, the Facility managers believed that better integration could only come about by adopting a single, standardized file format.
3. There are a few candidates for such a file format, but it is felt by the majority of imaging scientists in the UK that the most mature, and best characterized, candidate is OME-TIFF.
4. The idea of a single file format is tentatively supported by companies developing image-analysis software, as these companies would have to spend less time and money developing algorithms for importing numerous different file formats. Nevertheless, UK imagers think that a stumbling block would be the acceptance of this idea by image-acquisition companies. To address this problem, imagers must demand that their microscope providers include the ability to save (or at least export) to OME-TIFF. With regard to developing software tools to work on OME-TIFF files, funds could be made available for grants specifically targeted at image-analysis software development (either for the OME, ImageJ, or alternatives). Also, sites could be identified (e.g., Dundee, Imperial, Liverpool, UEA, etc.) that are already developing tools and, at these sites, core programming positions could be provided. (Also, need to create better integration of these various sites?)
5. Again, the UK Facility managers generally supported the development of an on-line, centralized database for published images.
6. Funds could be provided to establish a central archive at a single location; perhaps based at the Imaging Solution Centre (Harwell), or at the proposed UK Centre for Medical Research & Innovation (St Pancras)? Alternatively, imaging facilities around the UK could be selected to establish smaller, networked databases; funds could be provided to purchase servers of adequate capacity and to provide network accessibility (with matching funds from institutes?). Also, core positions for database/server management could be funded (either at the central archive, or “roving” experts in the case of a distributed archive).
7. There is less consensus across the imaging community about what platform should be used for local archiving (see below); nevertheless, it would seem that for a central store the most mature programme should be adopted. Considering that the JCB DataViewer is based upon the technology employed in OMERO, it would seem that OMERO would provide the best solution.
8. To provide access, a solution based upon the JCB DataViewer, but with increased functionality, would seem to be the best option.
9. An informal survey of the Facility managers revealed that few had implemented an archiving system for their imaging data. There was much discussion of the use of both OMERO and commercial alternatives (e.g., Imagik). With respect to commercial database systems, it was felt that there could be problems of data retrieval if the company were to collapse, or if users were no longer able to afford the service. It was thought that this was less of a problem with open-source software such as OMERO; nevertheless, there was not a lot of enthusiasm demonstrated for the adoption of OMERO. In the past, some managers had tried OMERO, but had failed to get it to work. Managers who were current users of OMERO argued that it was much improved, and enabled better organisation of large amounts of data and simpler access to archived images. Most managers thought that uptake of OMERO might be greater if it were simpler to install and had more image analysis tools; thus, funding initiatives such as those described in 4 and 6 (above) may increase uptake of OMERO by solving some of these problems. However, perhaps the biggest incentives to uptake of any database system will come if the centralised database (see 5, above) is implemented, and if funders begin to seriously apply their rules regarding data archiving and access.