Cyber Security
Cloud-based computing – Data collection and forensic investigation challenges
Published 3 years ago by GFiuui45fg
The year is 2007, the iPhone 1 has just been released, and many of us are still bound to desktop computers on office desks.
Over the next 13 years, staying connected with others and working while “on the go” gets easier, with smartphones and laptops becoming the norm. Not long after COVID-19 prompted a series of stay-at-home orders, the share of people working from home in the USA grew from 22% to 42%. Today, organizations are trying to understand what the ‘new normal’ for communications and data security will be.
To support this shift towards remote working, businesses adopted new technologies for their remote workforce. Most of us are familiar with video conferencing; however, several other cloud-based collaborative technologies have risen to the fore, such as Slack, Microsoft Teams and Telegram. These technologies reduce the barrier of distance and enable large groups of people to collaborate seamlessly and efficiently from anywhere.
In addition to these applications, the increasing popularity of services such as Google Drive, OneDrive and Dropbox allows the average computer user to store vast amounts of information remotely.
Challenges and Solutions
While cloud-based technologies such as these are designed to facilitate communication and provide businesses with a competitive advantage, they pose unique challenges in the field of data collection and forensic investigations. These challenges will keep increasing as more of our workforce operates virtually.
Specifically, these challenges are:
1. The information security challenge, resulting from enhanced security access requirements by providers.
A renewed focus on information security has made it harder to access data held in cloud-based storage platforms, so information security teams must now be much more involved in facilitating data collection.
The continued emergence of ransomware and other security attacks (e.g., malware attacks increased by 358% in 2020) has required information security teams to become increasingly vigilant about how they secure their organization’s data. The flow-on effect is a greater prevalence of enhanced security access controls, for example, restricting the geographic regions that can access data and requiring two-factor authentication as standard.
Issues concerning access can greatly delay a data collection exercise and even derail a forensic collection entirely if information security risks cannot be addressed. This also applies to situations where clandestine data collection is required, but two-factor authentication is in place. This means that a suitable strategy explaining the need for the cooperation of the custodian may need to be developed. It is much more common in the current environment for information security controls to add substantially to the time required to perform data collection.
2. The document family challenge, arising from changes made in the way data is structured and presented to an end-user.
It is not uncommon for modern cloud-based applications to segregate communication content, such as messages and attachments, making it more difficult to collect and subsequently piece together message content.
Prior to the advent of cloud-based collaboration tools, along with technologies such as O365, the collection and analysis of electronic mail was a relatively simple affair. An email, along with its associated attachments, was “wrapped up” together and stored in a container file, so the whole family could be collected in one pass.
This is no longer necessarily the case, and it is not uncommon for cloud-based collaboration tools to store messages and attachments separately, linking them together virtually. Forensic tools are only just now coming to terms with this new way of structuring data, meaning that while attachments and messages can be collected, marrying those two sources together accurately has proven time-consuming and technically challenging.
Data stored in this fashion must be collected such that those virtual relationships remain intact, which can mean an increased data collection timeframe is required.
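As a rough illustration of keeping those virtual relationships intact, the sketch below rejoins separately collected messages and attachments into document families. The field names (`id`, `message_id`, `file_id`) and the sample records are hypothetical; each platform uses its own identifiers, so this is a minimal sketch rather than a description of any particular tool.

```python
from collections import defaultdict


def build_families(messages, attachments):
    """Group attachments under their parent message.

    Both inputs are lists of dicts. The keys "id" and "message_id"
    are assumptions made for this example only.
    """
    by_parent = defaultdict(list)
    for att in attachments:
        by_parent[att["message_id"]].append(att)

    known_ids = {msg["id"] for msg in messages}
    families = [
        {"message": msg, "attachments": by_parent.get(msg["id"], [])}
        for msg in messages
    ]
    # Attachments whose parent message was not collected are reported
    # separately instead of being silently dropped.
    orphans = [att for att in attachments if att["message_id"] not in known_ids]
    return families, orphans


# Hypothetical sample data
messages = [
    {"id": "m1", "text": "Draft attached"},
    {"id": "m2", "text": "Thanks"},
]
attachments = [
    {"file_id": "f1", "message_id": "m1", "name": "draft.docx"},
    {"file_id": "f2", "message_id": "m9", "name": "stray.pdf"},
]
families, orphans = build_families(messages, attachments)
```

Reporting orphaned attachments, rather than discarding them, gives the investigator a list of relationships that could not be reconstructed and may need a follow-up collection.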
3. The data format challenge, arising from the increasingly disparate ways in which applications store their message content.
Data stored within cloud-based collaboration tools often requires custom parsers to be written to make sense of anything collected from these applications.
Cloud-based business applications are designed to meet the needs of business, and as such, vendors will ensure that any data stored by those applications is stored in a format primarily designed to support that activity. A common format for data export and storage from these platforms is the JSON (JavaScript Object Notation) format, which uses human-readable text to store data and other objects.
While the file format itself is well documented, a vendor can tailor how their data is stored, meaning that there is no standard way of decoding data stored within a JSON file. This means that while a forensic investigator can easily collect JSON data, doing something meaningful with it is a different matter. The speed at which cloud-based technologies have exploded onto the market has meant that forensic tools designed to parse this information are only just catching up.
When attempting to collect data from these platforms, it is important to consider not only how the data is collected, but how the data is to be processed to ensure that an appropriate parser exists or determine if a bespoke solution needs to be developed to address the gap in forensic tool capabilities.
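To make the parser gap concrete, the sketch below maps two differently structured JSON exports onto one common record shape. Both layouts, the names `vendor_a` and `vendor_b`, and all field names are invented for illustration; they do not describe any real platform's export format.

```python
import json
from datetime import datetime, timezone


def parse_vendor_a(raw):
    # Hypothetical layout:
    # {"messages": [{"user": ..., "ts": "<epoch seconds>", "text": ...}]}
    for item in json.loads(raw)["messages"]:
        yield {
            "sender": item["user"],
            "sent_at": datetime.fromtimestamp(float(item["ts"]), tz=timezone.utc),
            "body": item["text"],
        }


def parse_vendor_b(raw):
    # Hypothetical layout:
    # [{"from": {"name": ...}, "created": "<ISO 8601>", "content": ...}]
    for item in json.loads(raw):
        yield {
            "sender": item["from"]["name"],
            "sent_at": datetime.fromisoformat(item["created"]),
            "body": item["content"],
        }


PARSERS = {"vendor_a": parse_vendor_a, "vendor_b": parse_vendor_b}


def normalize(source, raw):
    """Return a list of common-format records, or fail loudly if no parser exists."""
    try:
        parser = PARSERS[source]
    except KeyError:
        raise ValueError(f"no parser available for {source!r}") from None
    return list(parser(raw))


export_a = '{"messages": [{"user": "alice", "ts": "1600000000.5", "text": "hi"}]}'
export_b = '[{"from": {"name": "bob"}, "created": "2020-09-13T12:26:41+00:00", "content": "hello"}]'
records = normalize("vendor_a", export_a) + normalize("vendor_b", export_b)
```

Failing loudly on an unknown source is a deliberate choice here: it surfaces the need for a bespoke parser before processing begins rather than after data has been silently skipped.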
4. The data volume challenge, resulting from more people working remotely and the rise in popularity of collaborative tools.
With chat threads consisting of thousands of participants and messages numbering in the tens of millions, the ability to collect and meaningfully cull and search this type of data becomes problematic. Document “versioning” is also a substantial contributor to the data volume issue. There could conceivably be thousands, or even tens of thousands, of slightly different versions of just a single document.
Our ability to store vast amounts of data contributes to this challenge, and distinguishing what could be collected from what needs to be collected is of paramount importance. Is it sufficient to collect the last version of a document, or are all document versions required? If the latter, then the impact on the time required to collect data can be dramatic, turning a process that would otherwise finish in one day into one that might still be ongoing after several weeks.
There is no substitute for testing in these circumstances. Performing a collection on a smaller sub-set of data using both methodologies will ensure a full understanding of the impact of a decision that could have dramatic ramifications.
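One simple way to frame such a test is to compare the item count and byte volume of the two scopes on a sample listing. The sketch below assumes a version listing has already been obtained as a list of dicts; the keys `doc_id`, `version` and `size_bytes` and the sample figures are hypothetical.

```python
def compare_scopes(versions):
    """Summarize 'all versions' versus 'latest version only' for a sample.

    versions: list of dicts with the assumed keys
    "doc_id", "version" and "size_bytes".
    """
    latest = {}
    for v in versions:
        current = latest.get(v["doc_id"])
        if current is None or v["version"] > current["version"]:
            latest[v["doc_id"]] = v

    return {
        "all_versions": {
            "items": len(versions),
            "bytes": sum(v["size_bytes"] for v in versions),
        },
        "latest_only": {
            "items": len(latest),
            "bytes": sum(v["size_bytes"] for v in latest.values()),
        },
    }


# Hypothetical sample listing
sample = [
    {"doc_id": "d1", "version": 1, "size_bytes": 1000},
    {"doc_id": "d1", "version": 2, "size_bytes": 1200},
    {"doc_id": "d1", "version": 3, "size_bytes": 1100},
    {"doc_id": "d2", "version": 1, "size_bytes": 500},
]
summary = compare_scopes(sample)
```

The ratio between the two totals on a representative sample gives a first estimate of how much longer a full-version collection is likely to take, although actual timing also depends on platform throttling and network conditions.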
5. The behavioral challenge, arising from changes in how we communicate our thoughts and feelings to others digitally.
Rather than collecting our conversations into discrete email messages, in the modern business environment, people communicate more and more via a stream of chat messages. Bringing these chat messages together into a coherent message thread is a complicated technical challenge.
With conversations moving from discrete email messages to a steady stream of “chats,” the line between one conversation and the next becomes increasingly difficult to determine. In addition, cloud-based messaging applications commonly store each “chat” item as an individual discrete message. This presents two issues: determining where conversations start and end, and stitching together individual messages into a single meaningful document that retains context and meaning.
This has necessitated arbitrary decisions being made at the point of collection, for example, segregating chat streams into 24-hour chunks. It has also necessitated the development of both commercial and bespoke tools to enable the collection of such data so that it can be pieced together into a meaningful form.
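The 24-hour approach can be sketched as follows: each chat item is assigned to a bucket by channel and UTC calendar day, and every bucket is rendered as one transcript. The message keys (`channel`, `sent_at`, `sender`, `text`) and the sample data are assumptions for illustration, and the choice of UTC as the day boundary is itself one of the arbitrary decisions described above.

```python
from collections import defaultdict
from datetime import datetime, timezone


def chunk_by_day(messages):
    """Group chat items into one transcript per channel per UTC day.

    Each message is a dict with the assumed keys "channel",
    "sent_at" (timezone-aware datetime), "sender" and "text".
    """
    buckets = defaultdict(list)
    for msg in messages:
        day = msg["sent_at"].astimezone(timezone.utc).date()
        buckets[(msg["channel"], day)].append(msg)

    documents = {}
    for key, items in buckets.items():
        # Order items chronologically so the transcript reads as a thread.
        items.sort(key=lambda m: m["sent_at"])
        documents[key] = "\n".join(
            f'{m["sent_at"].isoformat()} {m["sender"]}: {m["text"]}' for m in items
        )
    return documents


# Hypothetical sample data
utc = timezone.utc
chat = [
    {"channel": "general", "sent_at": datetime(2021, 3, 1, 23, 50, tzinfo=utc),
     "sender": "alice", "text": "Ready?"},
    {"channel": "general", "sent_at": datetime(2021, 3, 1, 9, 0, tzinfo=utc),
     "sender": "bob", "text": "Morning"},
    {"channel": "general", "sent_at": datetime(2021, 3, 2, 0, 5, tzinfo=utc),
     "sender": "bob", "text": "Yes"},
]
docs = chunk_by_day(chat)
```

In this sample, the question sent at 23:50 and its reply at 00:05 land in different documents, which shows how a fixed daily boundary can split a single conversation.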
6. The data privacy challenge, resulting from a continued focus by governments on further protecting the privacy of their citizens.
Data privacy is now being treated as a fundamental right of the citizenry, as demonstrated through GDPR and other similar legal frameworks regarding data access and transfer.
Cloud-based computing, by its very design, crosses international boundaries. Any data collection exercise must consider the impact of various local laws on the ability to collect data, what data can be collected, what needs to be done with the data when it is collected, and who needs to be notified when the data is collected.
Of all the challenges outlined above, this one is especially critical because of the potential legal impact on all parties involved should any laws be breached in the process of collecting data.
Collecting Data and Forensics
Cloud-based technologies, and our usage of them, were already on the increase before the global pandemic sharply accelerated their uptake. As a result, the impact on the industry’s ability to collect data from these sources and incorporate it into forensic investigations was more dramatic and swifter than we expected. The industry is rising to the challenge, and collection technologies and methodologies are adapting to the new “normal.” However, there is still a gap between the technology used to collect and analyze data and the expectations of business and legal teams, who are more accustomed to traditional communication mediums such as email. There therefore needs to be continued acknowledgment of these challenges and of how they will affect any ongoing or future data collection and forensic investigations, along with any subsequent legal proceedings.