WHEN A POWERFUL magnitude 7 earthquake devastated the Republic of Haiti on 12 January 2010, more than 200,000 people were killed. Around 3 million people were affected by the earthquake and its aftershocks, which destroyed 250,000 homes and 30,000 commercial buildings.
Around 630,000 people left the chaos of Haiti’s capital, Port-au-Prince, in search of shelter, water and sanitation. Many of these people used Haiti’s four main mobile phone providers, via its 6 million mobile phone lines, to call friends or relatives in rural areas. Those calls enabled Swedish medical researchers at the Karolinska Institutet in Stockholm to track their movements and to identify areas at risk of potential cholera outbreaks.
The researchers worked with Haiti’s largest mobile phone operator, Digicel, to analyse the call history of 2 million mobile phone users, before and after the earthquake. The results, published in PLoS Medicine and Proceedings of the National Academy of Sciences in 2011 and 2012 respectively, found that “people seemed to have travelled to where they had significant social bonds and support”. More specifically, most Haitians fleeing Port-au-Prince went to the same locations where they had spent Christmas and New Year. The study showed that large-scale movements after earthquakes and other disasters are not chaotic, but often highly predictable, and could be used to improve the efficiency of aid distribution.
The Haiti research is an example of how big data analysis can be used for humanitarian purposes or ‘data philanthropy’ in developing countries. A May 2014 report by the Bill and Melinda Gates Foundation suggests that mobile phone data is “one of the only large-scale, digital data sources that touch large portions of low-income populations,” and if analysed “under proper protections and anonymisation protocols, it can be used to enhance the lives of poor people around the world”.

MOBILE PHONES ARE just one source of big data in a world where global satellite navigation, online transactions, sensors, digital closed-circuit cameras, radar monitoring and aerial surveys using pre-programmed drones generate hundreds of exabytes of data a year (an exabyte is a billion gigabytes). In 2010, The Economist published a series of features on ‘the data deluge’, warning that keeping up with this flood was difficult enough, but “analysing it, to spot patterns and extract useful information, is harder still”. A 2011 report by the McKinsey Global Institute estimated that the volume of global data would grow by 40% a year, while global spending on data information management was growing at just 5% annually. Technology researcher Gartner estimated that big data analytics drove US$28 billion of global IT spending in 2012, and predicted expenditure would exceed US$230 billion in 2016.
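To see how quickly those two growth rates pull apart, the compounding can be worked through directly. The sketch below is illustrative only: it assumes both rates stay constant from year to year and starts both quantities at an index of 1.0, neither of which comes from the McKinsey report, and the function names are hypothetical.

```python
def compound(rate: float, years: int) -> float:
    """Return the growth multiple after `years` of constant annual growth."""
    return (1.0 + rate) ** years


DATA_GROWTH = 0.40   # annual growth in data volume (McKinsey estimate cited above)
SPEND_GROWTH = 0.05  # annual growth in data management spending (same report)


def gap_after(years: int) -> float:
    """How many times further data volume has grown than spending."""
    return compound(DATA_GROWTH, years) / compound(SPEND_GROWTH, years)


if __name__ == "__main__":
    # Assumption: constant rates, both series indexed to 1.0 in year zero.
    for years in (1, 3, 5):
        print(
            f"after {years} years: data x{compound(DATA_GROWTH, years):.2f}, "
            f"spending x{compound(SPEND_GROWTH, years):.2f}, "
            f"gap x{gap_after(years):.2f}"
        )
```

Under those assumptions, data volume grows a little more than fivefold in five years, while spending rises by only a little over a quarter.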
In Australia, the Data to Decisions Cooperative Research Centre (D2D CRC) has received $25 million from the federal government and $62.5 million from industry and research participants to address big data challenges. It follows a review of the data analytics and management capacity of Australia’s public service, including defence and federal law enforcement agencies.
Based in Adelaide, the D2D CRC will focus on three research areas: data storage and management, analytics and decision support, and law and policy for big data analysis including issues such as privacy. Participants include Deakin University, the Australian Federal Police, the Attorney-General’s Department, the Department of Defence, the University of South Australia, the University of Adelaide, UNSW Australia, BAE Systems and SAS. The CRC will also develop research links with leading US universities and data analysts.
DR SANJAY MAZUMDAR, the CRC’s chief executive, says the bid to establish the $88 million research venture arose from discussions about future challenges to Australia’s national security and a shortage of skills in data intelligence applications and analytics. Australia urgently needs to build a skilled workforce to manage, extract and analyse data. The CRC aims to produce 48 PhD students across areas that include health care, IT, government services, law, manufacturing and defence intelligence systems. It will also train 1000 data scientists through its Education and Training Program, and work with universities to build on existing degrees in business and data analytics.
Australia’s defence and national security sectors face “the most imminent and complex” challenges from the global data deluge, Mazumdar says. British mathematician and global data analytics expert Clive Humby has described data as “the new oil”, but Mazumdar points out that, like oil, data needs to be processed to extract maximum benefit.
“We’re dealing with vast volumes of raw, unstructured data. For defence and national security agencies, it’s like looking for the proverbial needle in a haystack. In a time-critical situation, you need to be able to extract actionable intelligence from that data, and to do that, you need advanced data analysis programs that can process and filter that data quickly, accurately and efficiently,” he says.
Even when massive datasets have been processed and analysed, there’s still a need to cross-reference and present findings as visualisations – tables, charts, graphs, keywords and heat maps – that condense the data to manageable and easily assimilated information. Otherwise, Mazumdar says, we may be “drowning in data but starved for information”.
He explains that it’s not only defence and national security agencies that will benefit from expanding Australia’s skills in data analytics. “In mining, for example, the biggest costs are around exploratory drilling to obtain samples for analysis. There’s a possibility that geophysical data from satellite images could be used to pinpoint where deposits are likely to occur, and that could be immensely cost-saving.”
Mazumdar says the CRC will provide a variety of big data users – from government departments and utilities to universities and private industry – with the “tools, techniques and workforce to unlock the value of their data to make more informed and efficient decisions”.
Advanced machine learning and data retrieval systems are critical research areas for big data management. Mazumdar uses the example of image extraction during an attempted terrorist attack, when police and defence intelligence may need to analyse hours, days or even weeks of footage from closed-circuit cameras.
“If you can teach a computer to look for certain combinations of things in that high-volume data stream, you will get a faster result that will inform real-time decisions a lot more quickly,” he says.
The CRC will develop next-generation data storage and large-scale processing software from commercial open-source data management systems, such as Hadoop. Mazumdar says new systems of data mining and machine learning could reduce the time required to analyse high-volume data streams, including satellite imagery. It’s not a question of automating decisions via a machine, but of using data analytics to strip out non-essentials and collate relevant material.
“The human eye and brain are very good at processing complex information from images, but they wouldn’t cope with such a high volume of material. If we can teach a computer to look for certain combinations of things, like the shape of an aircraft, for example, the machine can sort the images in the data pipeline into a smaller, more manageable dataset.”
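The triage Mazumdar describes can be sketched as a simple filter: a detector scores each image, and only images scoring above a threshold are passed on to a human analyst. Everything here is a hypothetical illustration rather than the CRC's software: the `Frame` type, the `triage` function, the 0.8 threshold and the scores themselves (which in practice would come from a trained image classifier) are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Frame:
    frame_id: str
    aircraft_score: float  # hypothetical detector confidence, 0.0 to 1.0


def triage(frames: Iterable[Frame], threshold: float = 0.8) -> List[Frame]:
    """Keep frames scoring at or above the threshold, highest score first."""
    kept = [f for f in frames if f.aircraft_score >= threshold]
    return sorted(kept, key=lambda f: f.aircraft_score, reverse=True)


if __name__ == "__main__":
    # Invented scores standing in for the output of an image classifier.
    stream = [
        Frame("img-001", 0.12),
        Frame("img-002", 0.93),
        Frame("img-003", 0.81),
        Frame("img-004", 0.40),
    ]
    for frame in triage(stream):
        print(frame.frame_id, frame.aircraft_score)
```

The filter only narrows the set; deciding what the remaining images mean stays with the analyst.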
THE RIGHT TO privacy, and the question of who owns data generated by social media, mobile phones, ATMs and iPad apps, are hotly contested topics.
A data management issues paper was developed by the Australian Government Information Management Office (AGIMO) to identify and discuss privacy and security implications around the use of government agency data. The AGIMO estimates that 90% of data in the world today was generated over the past two years alone, and that this amount of data will be 44 times greater by 2020. But who should have the right to use such data and under what circumstances, and what controls are appropriate to place on its use?
The AGIMO argues that private companies such as banks, online retailers, insurance companies and social media sites, including Twitter and Facebook, harvest huge volumes of customer data, which is analysed and used to create new client services. Government departments and agencies could also use data analytics to improve services, but they’re bound by a range of legislative controls relating to privacy, security and public trust. In Australia, they must obtain and use information according to the Privacy Act, the Telecommunications Act, Freedom of Information laws and others.
The Gates Foundation report gives an example of mobile phone data use that generated controversy. When health researchers at Harvard University obtained Kenyan mobile records to track the spread of malaria in 2012, it provoked a storm of protest from people who had unknowingly contributed to the study. The researchers had obtained anonymised records for every call and text message sent by 15 million Kenyan mobile phone subscribers over a year, and used the data to identify regions where malaria infection had originated to target medical aid more effectively.
Despite the humanitarian nature of the research and reassurances that callers could not be identified from data provided, Kenyan media claimed the study had breached privacy. The Gates Foundation report says the incident shows that “even with the best of intentions and adherence to rules”, researchers need to consider privacy issues when collecting data.
Professor Louis De Koker, of Deakin University’s School of Law, was founding director of the Centre for the Study of Economic Crime at the University of Johannesburg in South Africa. He will lead the D2D CRC’s Law and Policy program, which combines senior law and socio-legal researchers from the Deakin School of Law and UNSW Australia Law.
“Big data analysis challenges existing privacy principles because our current framework is built around the notion that you own your data and anyone who wants to use it needs your consent to do so,” he says. “It also assumes that data can be effectively de-identified, whereas data analytics can now enable re-identification.”
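Re-identification typically works by linking records: a dataset with names removed can still carry attributes such as postcode, birth year and sex, and matching those against a second dataset that does include names can single people out. The sketch below is a toy illustration using invented records; the field names and the `reidentify` function are hypothetical, and it does not represent any particular study or system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

QuasiId = Tuple[str, int, str]  # (postcode, birth year, sex)


def quasi_id(row: dict) -> QuasiId:
    """Extract the attributes shared by both datasets."""
    return (row["postcode"], row["birth_year"], row["sex"])


def reidentify(deidentified: List[dict], public: List[dict]) -> Dict[str, str]:
    """Map record_id -> name where the attributes match exactly one person."""
    names_by_key: Dict[QuasiId, List[str]] = defaultdict(list)
    for person in public:
        names_by_key[quasi_id(person)].append(person["name"])

    matches: Dict[str, str] = {}
    for record in deidentified:
        names = names_by_key.get(quasi_id(record), [])
        if len(names) == 1:  # a unique match exposes the "anonymous" record
            matches[record["record_id"]] = names[0]
    return matches


if __name__ == "__main__":
    # Invented data: a "de-identified" release and a public list with names.
    released = [
        {"record_id": "r1", "postcode": "3000", "birth_year": 1980, "sex": "F"},
        {"record_id": "r2", "postcode": "5000", "birth_year": 1975, "sex": "M"},
    ]
    public_list = [
        {"name": "Person A", "postcode": "3000", "birth_year": 1980, "sex": "F"},
        {"name": "Person B", "postcode": "5000", "birth_year": 1975, "sex": "M"},
        {"name": "Person C", "postcode": "5000", "birth_year": 1975, "sex": "M"},
    ]
    print(reidentify(released, public_list))
```

In this toy case only the first record is exposed, because two people on the public list share the second record's attributes. The more attributes a dataset carries, the fewer people share any one combination.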
“In many cases, the data you generate from, for example, activity on social media sites or Google searches, may be analysed and produce more data and a deeper understanding about you or about communities of which you are a member. In addition, that data would often be stored in another country. What are the laws that apply to that secondary data, and how do you enforce any breaches of rights that you may have in relation to that data? How do we harness such data to improve national security while protecting Australians from abuse of their data? Those are the kinds of policy questions that need to be looked at.”
He says privacy concerns reflect both a substantial increase in the volume of data being collected and social fears about being spied on.
“Unlike the days when you filled in a form and physically handed it to someone, people often don’t know what kind of data is being collected, how and when it’s being collected, who is using it to draw conclusions about you or which decisions by government or private companies relating to you are affected,” he says.
“These days, we take the presence of surveillance cameras very much for granted because they’re everywhere – they’re in shops, airports and at ATMs. But what are the implications when we combine these data sources with sophisticated data analysis? How can we harness the benefits of such data and protect society against abuse?”
In 2013, Shoalhaven City Council installed CCTV cameras in the NSW South Coast town of Nowra as part of a crime prevention program. The surveillance cameras were installed in public places, including shops, parking lots and parks. But a resident challenged their use and argued before the Administrative Decisions Tribunal that it was not the council’s role to collect evidence for the purpose of prosecuting crime. The tribunal upheld the resident’s complaint, ruling that council signage near the cameras did not adequately inform people about privacy implications. It also ruled that the council had not established that filming people was “reasonably necessary” to prevent crime.
De Koker says there’s also debate around the adequacy of the protection afforded by giving people notice and gaining their consent to collect and use data, for example when customers sign online agreements relating to social media, software downloads and apps. A report to President Barack Obama in May 2014 on big data and privacy by the President’s Council of Advisors on Science and Technology stated that “each individual app, program or web service” is legally required to ask people to give consent for data collection practices. “Only in some fantasy world do users actually read these notices and understand their implications before clicking to indicate their consent,” the report says.
Mazumdar and De Koker say the D2D CRC will explore opportunities and challenges posed by high-volume data harvesting and analytics in consultation with legal and national security experts.
“All governments are grappling with this issue … There is a lot of good that can come from big data analysis, but we need to balance our expectations and concerns,” De Koker says.
THE ADVISORY COUNCIL’S report to President Obama suggests future generations, who will have grown up with digital technologies, “may see little threat in scenarios that individuals today would find threatening”. It describes a future in which “digital assistants”, in the form of data collection cameras, film a woman packing her suitcase for a business trip. The bag is placed outside for pick-up, with her digital assistants sending the delivery instructions. The suitcase won’t be stolen because a streetlight camera is watching it and every item inside it has a tiny electronic tag that can be tracked and found within minutes.
Her world possibly “seems creepy to us”, but she has “accepted a different balance among the public goods of convenience, privacy and security than most people would today,” the report says.
“In the modern world, one of the biggest threats to both national security and personal privacy is a person sitting in a room with a laptop,” says Mazumdar. “That threat will only grow as the world becomes increasingly reliant on digital technology. We’re already detecting sophisticated ways to hide data and connections online. We need to improve our national capacity to detect and respond to that hidden information, but also to ensure control of that capacity, to respect and protect the rights of users online.”