Australian BioCommons Pathfinder Project

The Australian Bioinformatics Commons (BioCommons) is an ambitious new digital capability that will enhance the ability of Australian researchers to understand the molecular basis of life across all research domains, including environmental, agricultural and biomedical.

Digital technologies are proving transformational for the life sciences. This large-scale investment in digital infrastructure will ensure Australian life science research remains globally competitive, providing access to the tools, methods and training researchers require to respond to national challenges such as food security, environmental conservation and disease treatments.

Australian BioCommons Mission

It is the mission of the BioCommons to:

  • Sustain strategic leadership in the provision and use of bioinformatics and bioscience data infrastructures at a national scale
  • Actively support life science research communities with community scale digital infrastructure developed and maintained in concert with international peer infrastructures
  • Provide access to services that:

    • Provide sophisticated analysis capabilities, including software and hardware platforms that underpin world class science
    • Support digital asset stewardship and management, retention, integration and publication solutions as they evolve
    • Enable researchers to observe best-practice data standards, management, interoperability and publication approaches as they evolve
  • Provide enduring access to the digital techniques, data and tools that are needed by world class environmental, agricultural and biomedical research
  • Provide training and support solutions that enable the rapid and broad based adoption of the above

BioCommons Principles

  • A national focus on capabilities and communities
  • Partner internationally: participate in and contribute to larger critical mass efforts where possible; reuse and improve rather than build anew
  • Build a software and expertise capability that will reduce duplication of infrastructure management in Australia and allow efforts to be re-focussed on methods development and dissemination
  • Promote the development of, and build on, high throughput cloud infrastructure that is interoperable with international (initially US and European) equivalents, using established, well supported software platforms
  • Streamline the exchange of tools, workflows, data and training and expertise both nationally and internationally

BioCommons Pathfinder Projects

Following an extensive national consultation exercise in 2017-2018, funding has been committed by Bioplatforms Australia, the Australian Research Data Commons (ARDC) and Australia’s Academic and Research Network (AARNet) for an initial “Pathfinder phase”, to be conducted in 2019.

The high level aims of the BioCommons Pathfinder phase are to continue to actively engage the Australian bioscience community and to deliver:

  • An operating infrastructure providing a core set of bioinformatics services
  • A set of research activities and associated communities providing exemplars for others to follow
  • A consortium of participants providing guidance and implementation support
  • A strategic plan for the BioCommons
  • A five year operational plan for its delivery (through to 2023)

To achieve these goals, a small number of projects are envisaged to be undertaken in the Pathfinder Phase (2019 calendar year).

What? Why? Who for? Potential Partners (Expressions of Interest from other partners are welcome)
Leadership, National and international engagement

The Australian Government has made long term commitments to a forward-looking Research Infrastructure Roadmap, in a set of priority areas that include Complex Biology and Digital Data and eResearch Platforms.

At the same time, the importance and scale of ‘omics data holdings, and their integration with other data types, have become mission critical to life science research.

The leadership activity is driven by the fact that a much-needed, groundbreaking national investment in bioinformatics infrastructure can now be made.

For the bioscience research community through the strategic enhancement of the infrastructure and services available to that community.

Bioscience-focussed institutions and research centres with a substantive molecular science activity.

National eResearch investments including AAF, AARNet, ARDC, NCI and Pawsey.

Regional and institutional bioinformatics groups and regional and institutional eResearch support groups.

Establishment of a cyber-infrastructure that is enduring and appropriate for data-driven biology research

Digital research in life sciences is generally supported by group and/or institute-level digital infrastructure, with consequent inefficiencies and duplication of effort, plus uncertainties around interoperability and longevity.

A national approach will help harmonise resources and reduce duplication of infrastructure management in Australia, allowing domain efforts to be re-focussed on methods development and dissemination.

Global cyber-infrastructure developments are increasingly focussed on cloud infrastructure, as exemplified through the NIH Data Commons, the ELIXIR Compute and Data strategies, and the European Open Science Cloud. This program of work will focus on developing highly dependable, sustainable cloud infrastructure and expertise, developed and maintained in concert with international peer infrastructures.

Specific topics within this program of work include: Authentication and Authorisation Infrastructure (AAI), Cloud Apps Engineering, Big Data Hybrid Cloud.

Any Australian researchers wishing to undertake various bioinformatics analyses

NCI

Pawsey

AARNet

ARDC

AAF

Enabling access to a robust and professionally managed BYO data analysis system – extending previous efforts which have established Galaxy Australia

In the age of cloud computing, researchers should be able to simply log into a web-based system, upload or connect their data and undertake their bioinformatics analyses.

Galaxy is one example of a sophisticated bioinformatics workbench that contains thousands of tools. Previous efforts supported by Bioplatforms and ARDC have established a nationally hosted instance (Galaxy Australia) which can be used by any Australian researcher.

This project will extend this service so that (a) it is appropriately resourced and managed, allowing ever more computationally demanding jobs to be run in a time frame practical for daily use, and (b) analysis workflows for new data types (e.g. metabolomics, phylogenetics) are incorporated into the service.

Any Australian researchers wishing to undertake various bioinformatics analyses (e.g. quality control, genome assembly, variant detection, RNAseq, metagenomics, phylogenetics, metabolomics).

Queensland Cyber Infrastructure Foundation (QCIF)

University of Queensland Research Computing Centre (RCC)

Melbourne Bioinformatics

AARNet

ANU Bioinformatics Consultancy

Metabolomics Australia

Providing access to data collaboration systems for communities of practice – Trialling Cyverse as a collaboration system for groups of researchers in Australia.

In research, where projects are increasingly data-driven and collaborative across multiple sites, research consortia and communities require on-line environments that allow for effective collaboration. This includes data storage and management (organising, describing, sharing, etc.) as well as access to appropriately resourced tools and pipelines for data analysis.

Cyverse is a well-established and sophisticated data collaboration platform that has been developed over many years in the USA with funding from the National Science Foundation. This project aims to determine its suitability as a data collaboration platform that could be implemented in Australia to support a wide range of researchers.

If determined to be a suitable platform to fulfil user requirements, Cyverse (or specific components of Cyverse) will either be deployed in Australia, or access to the US-based system will be negotiated, for an extended trial period for several groups.

During the pathfinder phase, we will work with established consortia (e.g. the Bioplatforms Australia coordinated Oz Mammals Genomics and Genomics for Australian Plants framework data initiatives) and others to determine the suitability of establishing a Cyverse deployment (part or whole) in Australia, and if appropriate, undertake use trials.

Cyverse USA (Tucson, Arizona)

Local implementation partners TBD

AARNet

Delivering impact to Australian Researchers by participating in a Global Data Commons – A human health exemplar identified, aligned with a major NIH Data Commons activity.

The NIH Data Commons is a transformative US-funded effort to accelerate biomedical discovery by providing a cloud-based platform where investigators can store, share, access, and compute on digital objects relating to human health research including data, software, and workflows.

Disease does not respect national boundaries and biomedical research (especially on rare diseases) often benefits from access to larger cohorts of participants, which need to be recruited from different countries/jurisdictions.

Cloud-based infrastructure (e.g. the Gen3 platform) being developed through the NIH Data Commons represents a technical solution to help researchers everywhere actively undertake global research collaborations.

This project will identify a human health research exemplar that (a) is being undertaken by an established Australian research consortium, (b) requires collaborative data sharing with US-based researchers and (c) is aligned with a major NIH Data Commons activity. We will implement a solution whereby the Australian-based researchers can fully participate (store, share, access, and compute) in collaborative research through an NIH Data Commons platform.

During the pathfinder phase, we will investigate how to connect the compute, analysis and data requirements of Australian-based research programs such as Zero Childhood Cancer with the US-based (and NIH Data Commons supported) Gabriella Miller Kids First Pediatric Research Program (Kids First). This will involve exploring the large-scale data resource developed by Kids First, which provides access to tools and data for researchers to uncover new insights into the biology of childhood cancer.

Local implementation partners TBD

AARNet

Making Sensitive Data at rest more FAIR – Establishing a Genome Archive (Local EGA or similar) that is appropriate for the storage of, and managed access to, human-derived data and metadata.

Human genomic studies must be undertaken within strict ethical frameworks, and the data itself is always considered to be sensitive as it contains readily identifying (names, dates, etc.) and not so readily identifying (demographic, genome sequence) information, along with potentially sensitive lifestyle and disease information which is required to provide context.

In order to permanently archive and store all types of personally identifiable genetic and phenotypic data, resources termed ‘Genome Archives’ have been developed, which offer scalable infrastructure for safe, efficient, ethical, and legal storage, analysis and sharing of sensitive personal data for biomedical research.

The aim of this project is to determine the suitability of a pan-European developed Genome Archive solution (Local EGA) for storage of sensitive human-derived data from various national human sequencing efforts in Australia, and if appropriate, implement such a system for pilot testing for several research projects.

In the pathfinder phase, we will partner with several groups with potential interest in Local EGA implementation (National Centre for Indigenous Genomics, Garvan Institute of Medical Research / Kinghorn Centre for Clinical Genomics, Victorian Comprehensive Cancer Centre (VCCC)).

NCI

AARNet

ELIXIR Human Data Community

Other local implementation partners TBD

Communities and infrastructure services identified for common omic-based challenges: e.g. Genome annotation; Multi-omics integration; Comparative Genomics; Environmental Genomics

Recent technological advances have brought high-throughput molecular assays within the reach of researchers dispersed across the country and across research domains. Despite the diversity of their research questions, these researchers often have similar software, workflow, compute and training needs. Many nonetheless encounter roadblocks when trying to meet their infrastructure needs locally, which also results in unnecessary duplication of effort.

The aim of this exercise is to conduct inclusive community consultation about tools and methods used by a very broad base of researchers, and to identify common services and infrastructure that can be built in the future to support biomolecular researchers across Australia.

During the pathfinder phase, we will work with established consortia (e.g. those established through Bioplatforms Australia coordinated framework data initiatives) and others to (a) identify communities of practice, (b) understand their requirements, and (c) identify services that can be used to help address common informatics challenges faced.

EMBL Australia Bioinformatics Resource

Melbourne Bioinformatics

Queensland Cyber Infrastructure Foundation (QCIF)

Workforce Transition – Upskilling biologists in the use of bioinformatics tools and best practices

In the era of “big” biological data, developing core informatics competencies of bioscience researchers is considered to be one of the most significant challenges facing this research community globally.

Offering training to help increase bioinformatics competency has also been identified in the past by the Australian biosciences research community as a high-priority and high-impact output of programs such as the proposed Australian BioCommons.

During the pathfinder project we will specifically extend bioinformatics training and education activities that have previously been conducted under the auspices of the EMBL Australia Bioinformatics Resource, and potentially Bioplatforms Australia.

Australian life science and bioinformatics researchers and support staff.

Some events (e.g. webinars) will be held on-line to allow attendance from anywhere, and some events (e.g. hands-on workshops) will be held in specific locations across a variety of states and territories. All events will be recorded and all material made available for self-paced study.

EMBL Australia Bioinformatics Resource

Melbourne Bioinformatics (University of Melbourne)

Monash University

Sydney Informatics Hub (University of Sydney)

Systems Biology Initiative (UNSW)

QCIF – University of Southern Queensland

QCIF – University of Queensland

James Cook University

University of Adelaide

University of Tasmania

University of Western Australia

Other partners TBD