Extracting useful healthcare knowledge from Big Data can be viewed as a processing pipeline made up of multiple distinct stages, all of which are needed to make full use of the data. Each stage faces specific challenges, as follows:
- Data aggregation challenges:
Big Data research projects usually involve multiple organizations, different geographic locations and large numbers of researchers, so exchanging data between groups is very difficult. In addition, patient data security, confidentiality and privacy must be ensured, in line with the privacy protections mandated for the general public by the privacy commissioner. Together, these requirements create many barriers to health Big Data aggregation. With large datasets, it is all too easy to unveil significant value by making information transparent, and that same transparency can expose individuals. Thus, our ability to protect individual privacy in the era of Big Data is limited.
- Data maintenance challenges:
Since Big Data involves large collections of datasets, it is very difficult to store and maintain the data efficiently on a single hard drive using traditional data management systems such as relational databases. Managing the data is also a heavy IT burden (in cost and time) for small organizations or labs.
- Data integration challenges:
This stage involves integrating and transforming data into an appropriate format for subsequent analysis. However, Big Data in healthcare are extremely large, distributed, unstructured and heterogeneous, making integration and transformation all the more problematic. Integrating unstructured data is a major challenge for BDA. Even structured EHR data raise many integration issues.
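The transformation step can be illustrated with a minimal sketch. It assumes two hypothetical sources that describe the same kind of patient record with different field names, date formats and units; none of the field names below come from a real EHR standard, and a real integration pipeline would need far more validation than is shown here.

```python
from datetime import date, datetime

# Hypothetical records from two sources (field names are assumptions,
# not taken from any real EHR schema).
source_a = {"patient_id": "A-001", "dob": "1980-02-14", "weight_kg": 72.5}
source_b = {"pid": "B-417", "birth_date": "03/09/1975", "weight_lb": 165.0}

LB_TO_KG = 0.45359237  # definition of the international pound in kilograms


def normalize_a(rec: dict) -> dict:
    """Map a source-A record (ISO dates, kilograms) to the common format."""
    return {
        "patient_id": rec["patient_id"],
        "birth_date": date.fromisoformat(rec["dob"]),
        "weight_kg": float(rec["weight_kg"]),
    }


def normalize_b(rec: dict) -> dict:
    """Map a source-B record (MM/DD/YYYY dates, pounds) to the common format."""
    return {
        "patient_id": rec["pid"],
        "birth_date": datetime.strptime(rec["birth_date"], "%m/%d/%Y").date(),
        "weight_kg": round(rec["weight_lb"] * LB_TO_KG, 2),
    }


# After normalization, both records share one schema and one set of units.
unified = [normalize_a(source_a), normalize_b(source_b)]
for row in unified:
    print(row)
```

Even in this toy case, each source needs its own mapping function, which hints at why integration effort grows with the number and heterogeneity of sources.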
- Data analytic challenges:
- Complexity of the analysis – For some analysis algorithms, the computing time increases dramatically even with small increases in data volume. For example, the Bayesian network is a popular approach for modeling knowledge in computational biology and bioinformatics. However, the computing time for finding the best network increases exponentially as the number of records rises.
- Parallelization of the computing model – Computationally intensive problems can be parallelized so that the work is distributed as tasks over many computers. However, if the analytic algorithm cannot be parallelized, massively parallel processing (MPP) tools will have great difficulty performing the computation efficiently.
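One way to get a feel for why searching for the best Bayesian network becomes intractable is to count the candidate structures. The sketch below uses Robinson's recurrence for the number of labeled directed acyclic graphs (DAGs) on n nodes. Note the assumption: this count depends on the number of variables (nodes) in the network rather than the number of records, so it illustrates the size of the search space and is not a model of the running time of any particular learning algorithm. The function name `count_dags` is our own.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (Robinson's recurrence).

    a(0) = 1
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k)
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


# Print the number of candidate structures for small networks.
for n in range(1, 9):
    print(f"{n} variables: {count_dags(n)} possible DAGs")
```

Three variables already admit 25 structures and five admit 29,281, so an exhaustive search is only feasible for very small networks; practical methods rely on heuristics or restrictions on the structure.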
- Pattern interpretation/application challenges:
The ability to analyze Big Data is of limited value if decision makers cannot understand the discovered patterns. Unfortunately, due to the complex nature of analytics in healthcare, presenting the results, visualizing the data, and having non-technical domain experts interpret them remain major challenges.