In defense of public scientific-data sharing: a NeuroMat op-ed

by Claudia Domingues Vargas and Fabio Kon*

The idea that knowledge should be available to anyone interested in appreciating it is a constant in the history of mankind. It is present from ancient Greek philosophy to Renaissance science, from medieval troubadours to the great classical composers of the nineteenth century. At the same time, mechanisms to control knowledge, information protection and even cryptography have existed for hundreds of years. In the twentieth century there was a movement toward restricting access to knowledge as a means of generating money or commercial benefits. Thus, only those who paid for the right to perform a musical piece received authorization to sit at the piano in a public concert, and only those who paid for access to a scientific article had the right to read it. A significant part of the scientific advances of the century was motivated by goals set by the military, whose practice of hiding discoveries from enemies is understandable. However, what we can observe in the long run is that, in general, when ideas are shared more widely and knowledge is more open, science advances faster and societies become more advanced, wealthier and more democratic.

In recent years, much of the international scientific community, with the support of governmental agencies such as the National Science Foundation (NSF) in the United States, the European Commission and the São Paulo Research Foundation (FAPESP) in Brazil, has advocated for what has been called "open science." This new model of sharing scientific information has three aspects. Firstly, scientific results are to be disseminated through “open-access” vehicles, so that any scientist or citizen can easily access discoveries, regardless of background or financial situation. Secondly, tools used in the scientific process should also be shared openly; since much of science today depends on computational tools, this means that they should be made available as “free software.” Lastly, research data must be shared as "open data": not only should the raw and processed data be openly available, but descriptions of the format and meaning of such data (what we may call metadata) should also be widely distributed. When such information is collected from humans, special caution is required in distributing it, so as to respect the privacy and anonymity of those involved. One of the pillars of experimental science is its reproducibility, and science only becomes reproducible if the data and tools used in the experiments, simulations and analyses are also openly and freely provided.

The idea of open science has advanced at different speeds in different areas. In areas such as Computer Science, Genetics and Chemistry, for instance, these concepts have been very well received. In Neuroscience, sharing is not the usual practice. In general, both collection and storage of data are still done in an artisanal manner. There is wide variability in the types of data that are collected. Databases in this area of knowledge should include information that ranges from the form and behavior of individual neurons, through measures of brain functioning, to behavioral measures. This large quantity and variety of information requires a type of database that is specially designed for this purpose. Furthermore, there is, even today, widespread misinformation in the neuroscientific community about mechanisms of public data sharing.

The case of Neuroscience

The building, maintenance and supervision of public databases are seen as crucial by many members of the neuroscientific community as a means of moving forward more effectively in understanding brain functioning and in treating brain pathologies. The new paradigm of data sharing emerged in a more systematic way in the neuroscientific literature in the 1990s. At the same time, the first major data-sharing initiative in this community was created: the International Consortium for Brain Mapping, which relied on data collected from functional MRI measurements. This initiative was funded by the NSF and the National Institutes of Health (NIH), US governmental agencies that support scientific advances. The new model this initiative put forward accompanied the substantial increase in the capacity to generate experimental data in the Neurosciences, as well as the new computational and public-data-sharing possibilities that emerged from the major developments in information technology of previous decades.

Drawing by Odyr, at Le Monde Diplomatique Brasil, May issue.

Despite major advances in the design and introduction of public databases, there was no consensus on data sharing among neuroscientists. In 2000, the Journal of Cognitive Neuroscience determined that authors of papers accepted for publication had to deposit their raw data in public databases. This decision stimulated similar stands from other publications of broader circulation. Nonetheless, under pressure from the community of neuroscientists, this requirement of public sharing was eventually repealed. Fortunately, these first attempts foreshadowed a new era, and since then several data-sharing initiatives have been put in place, either as consortia, such as BrainNet (Brain Research and Integrative Neuroscience Network), or as public projects, such as INCF (International Neuroinformatics Coordinating Facility), CARMEN (Code Analysis, Repository and Modelling for e-Neuroscience) and NEMO (Neural ElectroMagnetic Ontologies). An interesting instance of public sharing of clinical data is the database of patients with Parkinson's disease coordinated by the Michael J. Fox Foundation for Parkinson's Research. This beautiful initiative illustrates the increasing acknowledgment that public databases are needed to make progress in identifying early markers of brain pathologies.

In all the aforementioned examples, one needs to register on the website that hosts the database and sign a term of responsibility with respect to the privacy of the individuals whose data are made available. Failing to comply with this term could have legal consequences. It is also often requested that the origin of the data and the publications in which they first appeared be cited in any subsequent publication. In some consortia, the article must be submitted to the Scientific Committee that manages the database. In some cases, the researcher has the option to deposit the data in the database without sharing them publicly, and can then share them whenever it seems appropriate.

Stephen Koslow is among the staunchest proponents of public data sharing. A former director of the Division of Neuroscience at the National Institute of Mental Health (NIMH) and one of the founders of the consortium BrainNet, Koslow published a manifesto in 2000 in Nature Neuroscience, a widely read and influential journal within the neuroscientific community, in which he advocated the need for a mindset that could empower public sharing of data and tools.

Koslow identified two of the most common objections to the building of public databases: that raw data are too complex to be understood by other neuroscientists, and that data analysis performed by another person could lead to results that differ from the original. Other arguments against data sharing are, for instance, the reluctance to make publicly available data that are often hard to collect, or the lack of legal mechanisms of protection in case of fraud and information misuse. Moreover, there is criticism regarding some of the current models of data sharing, in which, perhaps because of insufficient trusteeship, both the origin and the quality of the data that are made available remain unclear.

Against these objections, Koslow argued that it was desirable and necessary that data be correctly tagged and commented on, so that they could be understood and used by other researchers. Furthermore, he claimed that publishing results in the form of scientific papers presupposes that the data are ready to be shared, and that complementary perspectives produced by new researchers could help the community improve its understanding of the phenomenon in question.

Thus, Koslow concluded that the scientific benefits of data sharing outweighed the arguments against it, and he raised some practical strategies that could be adopted more widely in the scientific community. For example, the aforementioned guideline that some journals adopted to encourage data sharing could be extended to other journals and have a positive impact on the research community.

Additional financial support for the building of databases within publicly funded projects, and academic recognition of the resources and time invested in building databases, are also mechanisms of cultural change that the author proposed. These strategies might lead to an environment in which the data would be arranged so as to be shared during the acquisition process itself, not just at the end of the process. Critics could object that this would lead to higher operational costs. To meet this challenge, it is necessary to develop low-cost technologies for sharing, storing and supervising data.

The NeuroMat database

We currently participate in Brazil in the development of a database that will allow public access to neuroscientific data (physiological measures and functional assessments). This is a pioneering work that the Research, Innovation and Dissemination Center for Neuromathematics (NeuroMat) is developing. NeuroMat is coordinated by Prof. Antonio Galves and financed by FAPESP. The project, which mainly involves researchers from the Federal University of Rio de Janeiro (UFRJ) and the University of São Paulo (USP), plans to build a public repository that could potentially foster advances in the understanding of brain functioning as well as in the treatment of neurological diseases.

Among the lines of research whose data will be hosted in the NeuroMat database is the project on cortical reorganization after injury and reconstruction of the brachial plexus, the set of nerves that connects the arm to the brain, which is currently underway at UFRJ's Institute of Neurology Deolindo Couto (INDC). In order to accommodate this and other projects with a basic-clinical profile within the NeuroMat database, the research team has designed a prototype that will record and store patients' previous medical history, document injuries and record patients' clinical evolution through physical-therapy and longitudinal physiological measurements. This detailed work of building and electronically computing measurements has been carried out by a multidisciplinary team made up of physicians, physical therapists and neuroscientists, plus a team of computer scientists from USP's Institute of Mathematics and Statistics (IME). The result of this NeuroMat initiative will be the creation of a common basis for the diagnosis, clinical evaluation and functional prognosis of patients with brachial-plexus injuries. The database model that NeuroMat adopted will also put clinical assessments and all patients' electrophysiological data on a "common ground," thus enhancing flexibility in querying and analyzing data. We are now working on developing a prototype for electronic storage, handling and sharing of data, and we hope that it will soon be available for public use. When it becomes public, this database may serve as a model for the evaluation of other patients with similar injuries worldwide.

The creation of the NeuroMat database opens up an opportunity for scientists to have access not only to a universe of well-documented and labeled data, but also to the process that generated this shared working tool. Moreover, the public sharing of the analytical programs that generated the published results creates a virtuous circle, in that it allows the public to check their quality and accuracy. From this perspective, this process opens up a window of opportunity for rapid advancement of knowledge in this area. We hope to contribute to more open data sharing in the Brazilian and international neuroscientific community, so that we can all improve our conditions for working together and neuroscience can progress faster, with direct benefits for the public at large.

* Claudia Domingues Vargas is an associate professor at the Neurobiology Program of the Carlos Chagas Institute of Biophysics and coordinates the Laboratory of Neurosciences and Rehabilitation (LabNeR) at the Institute of Neurology Deolindo Couto (INDC), both hosted at UFRJ; and Fabio Kon is a full professor in Computer Science and vice-director of the Free/Libre Open Source Software (FLOSS) Competence Center at USP. Both are investigators within NeuroMat.

The original Portuguese version of this op-ed may be read here. Originally published at Le Monde Diplomatique Brasil.

This piece is part of NeuroMat's Newsletter #4.
