Research at the Hasso Plattner Institute for Digital Health at Mount Sinai

Our research combines biomedical and data sciences to develop digital health solutions that help patients and provide scientists with new data technologies. In addition, we develop advanced data engineering and machine-learning approaches that use data in the Mount Sinai Data Warehouse (MSDW).

Research Projects

Mount Sinai COVID Informatics Center: In response to the SARS-CoV-2 global pandemic and health crisis, faculty and staff from the Hasso Plattner Institute at Mount Sinai joined forces with data scientists, engineers, clinical physicians, and researchers to form the Mount Sinai COVID Informatics Center (MSCIC). Our cross-departmental and cross-institutional center strives to use data, information, and technology to prevent, mitigate, and recover from public health emergencies such as COVID-19. The MSCIC Informatics Crisis Response Platform will include two components: (1) a Critical Informatics Consultation Service to answer pressing clinical questions for Mount Sinai clinicians and researchers within 24 hours, and (2) a Rapid Clinical Intervention Toolkit that facilitates the practice of evidence-based medicine in the Mount Sinai Health System by using electronic medical records to feed insights from data science into the daily workflow. The MSCIC anticipates conceptualizing and developing additional projects in line with its mission. These efforts are already leading to high-impact publications, such as "Blood Thinners May Improve Survival Among Hospitalized COVID-19 Patients," and to grants that support this critical programmatic growth.

Digital Health Discovery Program: Chronic, complex diseases, such as type 2 diabetes or cardiovascular disease, affect millions of individuals, lead to loss of life and health, and cost billions each year. Using mobile and wearable sensor technologies, along with health record data, the DigiMe platform will help investigators at Mount Sinai, as well as patients, learn more about health and wellness by contributing to valuable research through digital clinical trials.

The DigiMe platform will use mobile and web applications and wearable technology (such as the Apple Watch) to facilitate continuous monitoring of symptoms and biometric measurements outside of periodic traditional medical visits, identify transition points of the disease, build predictive models, provide guidance about management, and help avoid unnecessary health care use. The rapid advancement of mobile smart platforms; biometric monitoring; wearable and implantable sensors; environmental sensors streaming data indexed by GPS coordinates; and cloud platforms for large-scale data management, analytics, and diagnostics creates an unprecedented and scalable opportunity to improve disease management and foster research and clinical trials, all of which can take place on the DigiMe platform to advance understanding of complex diseases.

Our first study, recruiting now, focuses on pain, stress, and sleep and the interplay of these symptoms in predicting health events. We are working with researchers in many departments to launch their digital clinical trials on the platform in the coming months.

AIR.MS – AI-Ready Mount Sinai Platform: The institute is coordinating development of a multi-modal health data platform named AI-Ready Mount Sinai. This platform aims to link patient data from various clinical departments across the Mount Sinai Health System. These data are currently heavily siloed and often hard to access, resulting in significant missed opportunities to engage in multi-modal biomedical research.

The AIR.MS platform will solve this problem by creating an innovative environment in which highly skilled data scientists can access clinical data such as EHR, imaging, omics, and sensor data. This unified data source will accelerate the advancement of health-care-driven, AI-based solutions.

This platform will be equipped with state-of-the-art computational frameworks and architectures to enable rapid scientific discovery as well as translational applications into clinical settings. This process will be challenging but promises to greatly accelerate the application of clinical data in developing the next generation of health care systems.

Engineering and Research Training

In support of the collaboration between the Mount Sinai Health System and the Hasso Plattner Institute for Digital Engineering in Potsdam, Germany, the institute has developed several interdisciplinary research and educational programs. Master’s-level students apply advanced data engineering and machine-learning approaches to interrogate MSDW data in team research projects, supervised jointly by investigators from the Icahn School of Medicine at Mount Sinai and the Hasso Plattner Institute in Potsdam. Projects and papers include:

FIBER: Enabling Flexible Retrieval of Electronic Health Records Data for Clinical Predictive Modeling: The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from data scientists and engineers, especially for the data extraction and preprocessing steps required prior to modeling, including complex database queries. A handful of software libraries exist that can reduce this complexity by building upon data standards. However, a gap remains concerning electronic health records (EHRs) stored in star-schema data warehouses, an approach often adopted in practice. In this paper, we introduce FlexIBle EHR Retrieval (FIBER), a Python-based library built on top of a star-schema clinical data warehouse that enables flexible generation of modeling-ready cohorts as data frames. To illustrate its capabilities, we applied FIBER to a clinical predictive modeling task in the EHR data warehouse of a large health system. By doing so, we show that FIBER reduces time-to-modeling, helping to streamline the clinical modeling process.
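To make the star-schema idea concrete, the following is a minimal sketch of how a modeling-ready cohort might be pulled from a fact table joined to its dimension tables. It does not use or reproduce the FIBER API; the table names (fact_diagnosis, dim_patient, dim_condition), their columns, the function names, and the sample rows are all hypothetical and exist only for illustration, with an in-memory SQLite database standing in for a clinical data warehouse.

```python
import sqlite3


def build_cohort(conn, condition_code):
    """Return one dict per patient who has the given diagnosis code.

    The fact table holds one row per recorded diagnosis; the dimension
    tables describe the patient and the condition (hypothetical schema).
    """
    query = """
        SELECT p.patient_id, p.birth_year, COUNT(*) AS n_diagnoses
        FROM fact_diagnosis AS f
        JOIN dim_patient AS p ON p.patient_key = f.patient_key
        JOIN dim_condition AS c ON c.condition_key = f.condition_key
        WHERE c.code = ?
        GROUP BY p.patient_id, p.birth_year
        ORDER BY p.patient_id
    """
    cur = conn.execute(query, (condition_code,))
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def demo_connection():
    """Create a tiny in-memory star schema with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_patient (
            patient_key INTEGER PRIMARY KEY, patient_id TEXT, birth_year INTEGER);
        CREATE TABLE dim_condition (
            condition_key INTEGER PRIMARY KEY, code TEXT);
        CREATE TABLE fact_diagnosis (
            patient_key INTEGER, condition_key INTEGER);
        INSERT INTO dim_patient VALUES (1, 'A', 1950), (2, 'B', 1980);
        INSERT INTO dim_condition VALUES (10, 'I10'), (11, 'E11');
        INSERT INTO fact_diagnosis VALUES (1, 10), (1, 10), (2, 11);
    """)
    return conn


if __name__ == "__main__":
    for row in build_cohort(demo_connection(), "I10"):
        print(row)
```

The resulting list of dicts can be passed directly to a data frame constructor. A library in this space would typically hide the joins behind a higher-level interface so that modelers do not need to write them by hand.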

Natural-Language Processing on Clinical Notes for Phenotyping Depression: Current clinical handbooks describe mental disorders mainly based on symptoms. However, many patients with different disorders share the same biological underpinnings, and patients of the same diagnostic category can react very differently to the same treatment. Clinical notes of psychiatric patients are a rich resource for gaining a better understanding of mental disorders and developing better phenotypes. In this master's project, we use natural language processing on clinical notes of EHRs to develop meaningful language-based representations of patients with depression. With unsupervised machine learning methods, we aim to find categories that are closer to underlying biological or neurological mechanisms as well as subcategories that could inform treatment decisions. Supervisors: Hanna Drimalla, PhD, Alex Charney, MD, PhD, and Erwin Bottinger, MD.

Prediction of Hypertension Onset by Leveraging EHR Data with Machine Learning: Hypertension is one of the most prevalent medical conditions worldwide. It is also one of the main risk factors for a broad spectrum of cardiovascular diseases. Since high blood pressure can often be prevented by lifestyle interventions, early diagnosis is essential. In this project, we apply state-of-the-art machine learning models to longitudinal clinical information from the MSDW. We use different machine-learning approaches such as LightGBM, random forests, and neural networks to predict, six months in advance, whether a patient will develop hypertension. We also investigate which clinical parameters play major roles in differentiating between hypertensive (cases) and normotensive (controls) patients. We identified the hypertensive patients from the MSDW data, using an already validated phenotyping algorithm. In this presentation, we show the results of each approach in different-sized cohorts and discuss the clinical parameters found to be most significant for predicting hypertension based on our analysis. Supervisors: Suparno Datta, M.Sc, PhD candidate, Ariane Sasso, M.Sc, PhD candidate, Girish Nadkarni, MD, and Erwin Bottinger, MD.
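Predicting an event six months in advance implies that the model may only see data recorded at least six months before the reference date (for cases, the onset date). The sketch below illustrates that filtering step only. It is not the project's actual pipeline: the record layout, the function names, and the approximation of six months as 183 days are assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: "six months" is approximated as 183 days.
PREDICTION_GAP = timedelta(days=183)


def observation_cutoff(reference_date):
    """Latest date whose records may be used as model input."""
    return reference_date - PREDICTION_GAP


def usable_records(records, reference_date):
    """Keep only records dated on or before the observation cutoff.

    Each record is a dict with at least a "date" key (hypothetical layout).
    Anything recorded inside the gap is dropped to avoid label leakage.
    """
    cutoff = observation_cutoff(reference_date)
    return [r for r in records if r["date"] <= cutoff]


if __name__ == "__main__":
    onset = date(2020, 7, 1)
    history = [
        {"date": date(2019, 6, 15), "systolic": 128},
        {"date": date(2020, 3, 1), "systolic": 141},  # inside the gap, dropped
    ]
    print(observation_cutoff(onset))
    print(usable_records(history, onset))
```

For controls, who have no onset date, a reference date has to be chosen by some other rule (for example, a sampled encounter date); that choice is outside the scope of this sketch.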

Phenotyping and Subgroup Identification in a Non-Specific Back Pain Patient Cohort: One of the most common health problems affecting humans is non-specific back pain, which is the leading cause of absence from work and disability worldwide. Since no underlying pathology can be identified, this condition holds a huge potential for data-driven knowledge discovery approaches. At the same time, the large quantities of real-world medical information stored in EHR databases promise to reveal new insights into the epidemiology and pathophysiology of, and therapy for, a variety of diseases.

In this master's project based on the MSDW, we aim to identify clinically relevant yet undiscovered subgroups of non-specific back pain patients and define a robust phenotyping algorithm for future EHR-based research. With no such algorithm available, we develop criteria to define different subsets of back pain patients from the data contained in EHRs, and we compare the epidemiological features of the resulting datasets with those of previously described cohorts. Lastly, we describe how we feed the data from all patients with non-specific back pain into an unsupervised clustering approach, yielding the first clinical data-driven subclassification of non-specific back pain. Supervisors: Jan-Philipp Sachs, MD, PhD candidate, Riccardo Miotto, PhD, and Erwin Bottinger, MD.

Process Mining in Personalized Medicine: Business process technology aims to formally describe and analyze real-world business process management problems. Using process models, we can make these approaches accessible to multiple stakeholders. We use process mining to discover and model underlying processes from historical data, such as electronic health records. Based on these analyses, we consider additional aspects such as performance analysis, bottlenecks, predictions about the outcome of a process, and the use of resources. We analyzed the patient flows of a lower back pain sub-cohort, containing about 85,000 patients, from the Mount Sinai Data Warehouse. Furthermore, we compared what we learned with existing clinical guidelines. We will present the results of our project by focusing on data extraction, transformation, and process modeling. Moreover, we will address the challenges of applying process mining to big data in health care. Supervisors: Simon Remy, M.Sc, PhD candidate, Jan-Philipp Sachs, MD, PhD candidate, Riccardo Miotto, PhD, Ben Glicksberg, PhD, and Mathias Weske, PhD.
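A common first step in process discovery is to build a directly-follows graph: for each case (here, a patient), order the events by time and count how often one activity is immediately followed by another. The sketch below shows that step under simplifying assumptions. The event tuple layout, the activity names, and the function name are hypothetical, and the project's own tooling and data model are not described in the text above.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count directly-follows pairs across all cases.

    events: iterable of (case_id, timestamp, activity) tuples.
    Returns a Counter mapping (activity_a, activity_b) to the number of
    times activity_b immediately followed activity_a within a case.
    """
    traces = defaultdict(list)
    for case_id, timestamp, activity in events:
        traces[case_id].append((timestamp, activity))

    counts = Counter()
    for steps in traces.values():
        # Order each case's events chronologically before pairing them.
        steps.sort(key=lambda step: step[0])
        for (_, current), (_, following) in zip(steps, steps[1:]):
            counts[(current, following)] += 1
    return counts


if __name__ == "__main__":
    log = [
        ("p1", 1, "visit"), ("p1", 2, "imaging"), ("p1", 3, "physio"),
        ("p2", 2, "physio"), ("p2", 1, "visit"),
        ("p3", 1, "visit"), ("p3", 2, "imaging"),
    ]
    for pair, n in directly_follows(log).most_common():
        print(pair, n)
```

The edge counts can then be compared against the pathways a clinical guideline recommends, or filtered by frequency to keep the discovered model readable on large cohorts.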