Systematic Review of Public Health Data Lake Architectures Supporting Real-Time Analytics and Decision-Making
Abstract
Public health decision-making increasingly relies on timely access to large, heterogeneous datasets, including electronic health records, laboratory results, syndromic surveillance, environmental monitoring, and social determinants of health. Traditional relational databases often struggle to handle the volume, velocity, and variety of modern public health data, which limits the ability to perform real-time analytics for outbreak detection, resource allocation, and policy planning. Data lakes have emerged as a scalable solution, providing centralized repositories capable of storing structured, semi-structured, and unstructured data while supporting advanced analytics and machine learning applications. This systematic review examines the current state of public health data lake architectures, focusing on their design, operational features, and capacity to support real-time analytics and evidence-based decision-making. A comprehensive literature search was conducted across scientific databases and grey literature to identify studies reporting on public health data lake implementations, integration frameworks, and analytic capabilities. Key dimensions analyzed include data ingestion mechanisms, storage models, interoperability standards, metadata management, governance frameworks, security and privacy measures, and analytic tools. Findings indicate that successful public health data lakes integrate multi-source datasets using standardized schemas and ontologies, enabling seamless data harmonization and real-time access. Advanced processing pipelines, including stream processing and event-driven architectures, facilitate continuous data updates and near real-time insights. Governance and security frameworks are critical for ensuring data quality, interoperability, and compliance with privacy regulations, particularly for sensitive data such as patient-level health records.
Additionally, the integration of machine learning and visualization tools enhances predictive modeling, anomaly detection, and operational decision support. This review highlights best practices in the design and deployment of public health data lakes, emphasizing the importance of scalability, flexibility, and governance. By consolidating diverse datasets into a unified, analyzable repository, public health data lakes enable timely, evidence-based decision-making, strengthen outbreak detection and response capabilities, and support resource optimization. The findings underscore the potential of data lake architectures as foundational infrastructure for modern, data-driven public health systems.
How to Cite This Article
Chiamaka Grace Ohanebo (2022). Systematic Review of Public Health Data Lake Architectures Supporting Real-Time Analytics and Decision-Making. International Journal of Multidisciplinary Evolutionary Research (IJMER), 3(2), 221-231. DOI: https://doi.org/10.54660/IJMER.2022.3.2.221-231