
Blockchain insights with Data61 researcher Dr Mark Staples

Dr Mark Staples is a blockchain researcher at Data61, which is part of Australia’s federal science organisation, CSIRO. Being both a scientist and a blockchain expert, he has rare insights into how blockchain can propel research.

On my last trip to Brisbane I caught up with Mark for a drink at the Plough Inn and asked him to answer some of science’s most burning blockchain questions.

In this interview, we take a look at the challenges scientists face in managing their data, how blockchain can help, and where we’re at when it comes to issues of confidentiality, scalability, cybersecurity and policy.


First of all Mark, could you tell us a bit about your background?

My background is in computer science and cognitive science, and eventually I got into formal methods and software engineering. These days I mostly work on blockchain. I do blockchain research at Data61, mainly around software architectures for blockchain-based applications.

And can you tell us what’s happening in Australia on the blockchain front?

Australia is doing quite a lot of work around blockchain. The Commonwealth Bank has had some world firsts around the use of blockchain for global trade and for bond issuance. Companies like AgriDigital have also had world firsts in using blockchain to track the agricultural supply chain. Australia is also leading the international standardisation work on blockchain and distributed ledger technology. So Australia is quite present in blockchain internationally, and leading in some areas.

What areas of blockchain research is Data61 focused on?

The area where we've been leading in research is the use of blockchain to execute business processes: taking business process models and turning them into smart contracts that execute multi-party business processes on blockchain.
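To make that idea concrete, here is a toy sketch, written in Python rather than a real smart-contract language, of a business process model turned into contract-style logic: each step can only be performed by the authorised party, and only in order. The roles and step names are hypothetical, not from any Data61 project.

```python
# Toy "smart contract" executing a multi-party business process.
# Each step names the role allowed to perform it and the next state.
PROCESS = {
    "order_placed":     ("buyer",  "goods_shipped"),
    "goods_shipped":    ("seller", "payment_released"),
    "payment_released": ("escrow", "done"),
}

class ProcessContract:
    def __init__(self, start: str):
        self.state = start

    def execute(self, step: str, caller_role: str) -> None:
        """Perform one process step, enforcing ordering and authorisation."""
        role, next_state = PROCESS[step]
        if step != self.state:
            raise ValueError(f"step {step!r} not enabled in state {self.state!r}")
        if caller_role != role:
            raise PermissionError(f"{caller_role!r} may not perform {step!r}")
        self.state = next_state

contract = ProcessContract("order_placed")
contract.execute("order_placed", "buyer")    # ok
contract.execute("goods_shipped", "seller")  # ok; "buyer" here would raise
```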

We’ve also been thinking about ways to take legal logics to represent contracts or regulation, and turning those into smart contracts. We do some work in the Internet of Things for blockchain as well. And supply chain integrity.

So, there’s a variety of different pieces of research, and then we work with companies; we develop technology, and we participate in the international standardisation of blockchain.

Being a scientist yourself, how do you see blockchain propelling science?

The key thing that blockchain supports is data sharing and data integrity. Both of those are critical for science.

Normal blockchains are not so good for confidentiality, but they're great for publishing; they're great for publicity. One of the barriers to the adoption of blockchain in enterprises is the challenge of managing commercial confidentiality. But a lot of scientific outputs, both low-risk data and papers, are meant to be public, and blockchain is good for that.

Not only will it be public, but you also get a trail of what's happened to the data. By relying on the cryptographic techniques inherent in blockchain, you also get evidence about the integrity and authenticity of the records being created.

So I think that’s the key potential for blockchain for science — better publishing of scientific datasets and publications with better support for integrity.

Is data management a big issue?

Yes, we’re not very good yet at managing data integrity or sharing datasets or getting recognition or citation for datasets that we’ve collected or used. Not only from a professional point of view — scientific impact analysis and the like — but also in understanding data integrity from a scientific validity standpoint.

We need to be able to answer questions like: What operations have been done to your dataset before you start doing your own operations on it? How was the data that you’re working with collected? Has it been cleaned or not? All those questions are important when you’re doing an analysis of the data.

What issues have you observed in your time as a scientist in terms of how scientific data is managed and applied?

There are a lot of data description challenges. Have you described all the important characteristics of a dataset? How do you describe them? There is a variety of standards for dataset metadata.

How do you describe the provenance history of data? What steps were taken in the collection or analysis of a dataset and its derived datasets? None of those are completely solved problems. We don't have standard solutions for a lot of them, so that's one challenge.
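Questions like these map naturally onto a structured record. As a sketch only, and not any of the existing metadata standards Mark mentions, a minimal provenance record might look like this in Python; every field name is an assumption:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ProvenanceRecord:
    """Illustrative provenance metadata for a dataset; not a standard schema."""
    dataset_id: str
    collected_by: str
    collection_method: str                  # e.g. "survey", "sensor", "simulation"
    cleaned: bool = False                   # has the raw data been cleaned?
    processing_steps: List[str] = field(default_factory=list)
    derived_from: Optional[str] = None      # id of the parent dataset, if any

raw = ProvenanceRecord("ds-001", "field team A", "sensor")
derived = ProvenanceRecord(
    "ds-002", "analyst B", "derived",
    cleaned=True,
    processing_steps=["deduplication", "3-sigma outlier filter"],
    derived_from=raw.dataset_id,
)
```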

Do you think blockchain’s support for data integrity might actually help reinstate or build better trust in scientific evidence?

Yes, potentially. It could create more evidence for the trustworthiness of data, and more evidence about how data has been analysed, if we use it in the right way.

And what particular difficulties are there in actually getting these systems adopted by universities or research institutes?

Blockchain is good if you want to make datasets public. But there are certainly a lot of datasets in science that are not public for various reasons — especially in the medical research area. So, they present much more of a challenge; you can’t necessarily just publish those datasets through a blockchain.

You might still be able to use a blockchain and other digital fingerprinting techniques to provide evidence about the integrity of the data you're using without compromising individual privacy, but managing that kind of thing gets complicated. So that would be one of the main challenges.
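One common fingerprinting pattern is to publish only a cryptographic hash of a private dataset: the data itself never leaves the institution, but anyone later given the file can confirm it matches what was recorded. A minimal Python sketch, with a hypothetical filename:

```python
import hashlib

def fingerprint(path: str) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Publish only the digest (for example, inside a blockchain transaction).
# The dataset itself stays private; anyone later given the file can
# recompute the digest and confirm it matches what was recorded.
digest = fingerprint("trial_results.csv")   # hypothetical filename
```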

Could you put metadata or de-identified data on the blockchain as a solution to the confidentiality problem?

If you just have high-level metadata on the blockchain, that can be okay. If you have aggregate statistics in there, then you need to start worrying about what you're releasing as well. But a very high-level, purely descriptive dataset is less likely to be a problem.

De-identified datasets are difficult. It's a real challenge to effectively de-identify data. We've seen so-called de-identified datasets that were susceptible to re-identification attacks, so that's a difficult problem. We have a couple of teams in Data61 looking at private data release and private data analytics.

Are there any particular challenges for scientists on an individual level when it comes to using blockchain-based systems?

One practical challenge is that all the public blockchains, and most of the private ones, rely on public-key cryptography, which means managing public-private key pairs. To create a transaction that reports some data on a public blockchain, you need to manage a private key so you can digitally sign the data you're transacting with.

There are various bits of software that can help manage that; wallet software, for example. But key management, and good cybersecurity around key management, will still be a new task for scientists, because these blockchains allow people to enact things themselves, to interact with the blockchain directly.

Blockchain also creates a responsibility for people to manage their cryptographic identities with integrity. The integrity of your data can come down to how good you are at cybersecurity and how well you protect yourself against cyber-attacks. It requires effective cryptographic key management by people who are not used to doing it, so that becomes another barrier to using blockchain.
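For a sense of what signing involves, here is a minimal sketch using the third-party Python 'cryptography' package. Ed25519 is used for brevity; many public blockchains use ECDSA over secp256k1 instead, but the key-management burden Mark describes is the same: lose or leak the private key and the integrity guarantees collapse.

```python
# Minimal signing sketch using the third-party 'cryptography' package
# (pip install cryptography).
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()   # must be kept secret
public_key = private_key.public_key()        # safe to share

payload = b"site=lab-3; temperature=21.4"    # hypothetical measurement
signature = private_key.sign(payload)

# Anyone holding the public key can check the signature; verify()
# raises InvalidSignature if the payload or signature was altered.
public_key.verify(signature, payload)
```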

Is scalability still a problem?

That’s an inherent problem with blockchain. Blockchain is meant to be a distributed database where you might have thousands of copies of data all around the world. Big data in terms of big volume is just inherently hard to move over the network. So it’s inherently hard to replicate around the blockchain nodes all around the world.

But I think we already know the solution from a big data point of view. A blockchain-based system is never implemented with blockchain alone. It's always implemented with a variety of auxiliary systems, whether that's just key management, or also user interfaces or off-chain databases for private data or big data. So I think that's the solution: just kick the big data off-chain.
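A toy Python sketch of that off-chain pattern: the bulky payload lives in an ordinary content-addressed store, and only its short digest is anchored on the chain. Every name here is illustrative.

```python
import hashlib

class OffChainStore:
    """Toy content-addressed store standing in for an off-chain database."""
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

store = OffChainStore()
on_chain_ledger = []   # stand-in for transactions on a real blockchain

# Only the short digest goes "on chain"; the bulky payload stays off-chain.
digest = store.put(b"...gigabytes of instrument readings...")
on_chain_ledger.append({"dataset_digest": digest})

# Integrity check on retrieval: recompute the hash and compare.
assert hashlib.sha256(store.get(digest)).hexdigest() == digest
```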

Apart from big data, there are other scalability challenges for blockchain in terms of transaction latency. These things are being worked on, so I don’t see them being a huge problem in the medium term.

Are there other ways that blockchain will need to develop before it can support large-scale research?

Another big challenge is the governance of blockchain-based systems. Normal IT governance assumes there's a single source of authority in control of an IT system, so the operation and evolution of that system can be controlled from the top through that source of authority.

But with many blockchain-based systems there's no single source of authority. It might be operated by a collective, or by loosely organised groups of the public.

So, how to control the evolution and management of the blockchain-based system can be a difficult problem. Some blockchains are implementing governance features directly on the blockchain, but it’s not clear yet what the best way to go is, and it’s still an active area of innovation.

Is there scope for greater funnelling of clinical data and consumer data back into research?

I think the biggest challenges there are policy challenges, not so much technical challenges.

What does a good security policy model involve for clinical information sharing? Who should be allowed to see what data, for what purpose and when, under what consent model? Even that is not very clear at a policy level at the moment.

So in terms of research ethics applications, there’s a huge variety of different consent models that are supported by specific ethics approvals. You can implement technical controls for any of those, but knowing what you should be implementing is, I think, the hardest part of the challenge. There’s a lot of variability, especially for clinical information.
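To illustrate the gap Mark describes between technical controls and policy, here is a hypothetical Python sketch of a consent record and its access check. Every field and rule here is an assumption for illustration, not a real ethics framework; deciding which fields and rules should exist is the hard policy problem.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class ConsentRecord:
    """Hypothetical technical encoding of one ethics-approved consent model."""
    participant_id: str
    data_scope: List[str]         # e.g. ["imaging", "genomics"]
    allowed_purposes: List[str]   # e.g. ["this study", "future related research"]
    expires: date                 # when the consent lapses

def access_allowed(record: ConsentRecord, purpose: str, today: date) -> bool:
    """The technical control is simple; choosing the policy it encodes is not."""
    return purpose in record.allowed_purposes and today <= record.expires
```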

Have you seen movement towards giving the individual control over their data over the long term?

There are some interesting things happening in that space. Are you familiar with the Consumer Data Right that the government recently announced? Its first incarnation is something called open banking, where the government creates a right for consumers to direct their bank to share information about their personal accounts with a third party. The individual gives consent to the third party to use their data for a particular purpose, and then gives an authorisation and direction to the bank to let the third party access their data.

That’s an interesting model for giving consumers more right to direct where their data goes on a case by case basis. It’s quite different to most of the other models I’ve seen for giving consent to share data. Normally, an organisation holds data about a consumer, and the consumer is trying to keep up with all the various consents to access it—derived and delegated consent, emergency accesses and whatever other accesses are made to their information.

But when it comes to clinical information, I think policy is complicated by a lot of different interests. I don’t know if we have a good answer to that.

It’ll be interesting to see how it all unfolds Mark. Stellar insights. Thanks for your time!


To find out more about blockchain research at Data61 and to read their reports on how it can be applied to government and industry, click here.

– Elise Roberts

This article was originally published on Frankl Open Science via Medium. Frankl works on solving issues around data sharing and data integrity in science, using blockchain and other technologies.

Data-driven communities

Featured image above: the AURIN Map, a geospatial map publicly available online. Credit: Dr Serryn Eagelson, AURIN

Ildefons Cerdà coined the term ‘urbanisation’ while developing his Eixample (‘expansion’) plan for Barcelona, which almost quadrupled the size of the city in the mid-19th century.

Cerdà’s revolutionary scientific approach calculated the air and light inhabitants needed, the occupations of the population, and the services they might require. His legacy remains: Barcelona’s characteristic long, wide avenues, arranged in a grid pattern around octagonal blocks, offer inhabitants a city in which they can live a longer and healthier life.

Since Cerdà’s time, urban areas have come a long way in how they are planned and improved, but even today disparities are rife in terms of how ‘liveable’ different areas are. “Liveability is something that I’ve been working on most recently,” says Dr Serryn Eagelson, Data, Business and Applications Manager for the Australian Urban Research Infrastructure Network (AURIN).

Eagelson describes her work in finding new datasets as a bit like being a gold prospector. Liveability itself is broad: “It encompasses walkability, obesity, clean air, clean water – everything that relates to what you need in order to live well.”

In collaboration with more than 60 institutions and data providers, the $24 million AURIN initiative, funded by the Australian Government and led by The University of Melbourne, tackles liveability and urbanisation using a robust research data approach, providing easy access to over 2,000 datasets organised by geographic areas. AURIN highlights the current state of Australia’s cities and towns and offers the data needed to improve them.

“We have provided AURIN Map to give communities the opportunity to have a look at research output,” says Eagelson. This information is normally hidden away from public eyes; the AURIN Map makes it viewable over the internet, giving communities an unprecedented opportunity to visualise and compare the urban infrastructure datasets they need to lobby councils and government for improvements in their area.

Recently, AURIN has teamed up with PwC Australia – the largest professional services company in the world – to pool skills, tools and data. “We’re also working with PwC in developing new products,” adds Eagelson. “It’s quite complicated but PwC’s knowledge is giving us new insights into how data can be used for economic policy.”

The Australian National Data Service (ANDS) also has strong links with AURIN, having undertaken a number of joint projects on topics such as how ‘walkable’ neighbourhoods are, which can then be used to plan things like public transport accessibility (even down to where train station entrances and exits should be located); urban employment clusters, which can aid decision-making on the location of businesses; and disaster management, where the collaborators developed a proof-of-concept intelligent Disaster Decision Support System (iDDSS) to provide critical visual information during natural disasters like floods or bushfires.

“I’m probably most excited by a project releasing the National Health Service Directory – a very rich dataset that we’ve never had access to before,” says Eagelson. “It even includes the languages spoken by people who run those services, and that data’s now being used to look at migrants to Australia, where they move from suburb to suburb, and how their special health needs can be best catered for – so this information has a big public health benefit.”

This article was first published by the Australian National Data Service in May 2016. Read the original article here.