DATA SCIENCE AND EVALUATION
The program focuses on maximizing the value of data for population and health research in Africa. It builds platforms for Africa-led data sharing and data custody, and applies state-of-the-art big data analytics and artificial intelligence to advance health and wellbeing on the continent.
OVERVIEW
Our work in this area builds on advances in platform development to create robust data systems through which data are shared, governed, and analyzed with novel methods. The Data Science program uses internally and externally generated “big data” to explore patterns and make predictions, applying data science, artificial intelligence tools, and modelling approaches to inform population health.
Units and working groups
- Data Platforms and Systems. The team creates platforms and systems that support the data value chain. Current and planned platforms include:
  - Data Science and Sharing Platform (DASSA). A data sharing platform whose interfaces host stories on data sharing and information on the legal policies and frameworks for data protection in various African countries. It also provides modules for data sharing and collates data from multiple sources, including research datasets generated internally at the Center.
  - No-Code Machine Learning Platform. This platform lets researchers who are not necessarily data science professionals deploy machine learning algorithms without writing code. Through its graphical user interface, users work with research datasets uploaded to the platform and receive real-time predictive analytics with accompanying interpretations.
- Data Governance. The data governance team works closely with the data synergy team to develop a data governance framework for APHRC. The team is also creating a curriculum covering data governance, data anonymization, privacy-preserving technologies, and responsible data use, and works closely with the RRCS to deliver this training.
- Data Harmonization and FAIR. A team of data documentation specialists and data scientists creates data pipelines for various use cases and supports on-premise and cloud-based data analysis through a federated approach. The team uses the Observational Medical Outcomes Partnership (OMOP) Common Data Model, a standardized data model for health data with internationally recognized vocabularies. The model is used to harmonize data generated internally and externally through the Center’s partnership projects across Africa and beyond. In addition, metadata for APHRC research datasets is indexed and made machine-searchable using standards such as Schema.org, increasing the visibility of the data and enabling global data sharing.
- Data Analytics and Evaluation. A team of experienced data scientists, statisticians, and mathematical modelers provides analytical support for “big data”-driven projects. The team uses machine learning techniques and newer tools such as generative artificial intelligence to develop robust outputs that inform decision-making and improve lives through research.
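Making dataset metadata "machine-searchable using Schema.org" usually means publishing a Schema.org `Dataset` record, serialized as JSON-LD, alongside each dataset's landing page. The following is a minimal sketch under that assumption: the function `build_dataset_jsonld` and every field value are hypothetical illustrations, not APHRC's actual metadata pipeline. Only the `@context`, `@type`, and property names (`name`, `description`, `keywords`, `spatialCoverage`, `temporalCoverage`, `creator`) come from the Schema.org vocabulary.

```python
import json


def build_dataset_jsonld(name, description, keywords, countries, temporal_coverage):
    """Return a Schema.org Dataset record as a JSON-serializable dict.

    Hypothetical helper: a real record would be filled from the
    center's own metadata catalogue.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": list(keywords),
        # One Place node per country the dataset covers
        "spatialCoverage": [{"@type": "Place", "name": c} for c in countries],
        # ISO 8601 interval, e.g. "2015/2020"
        "temporalCoverage": temporal_coverage,
        "creator": {
            "@type": "Organization",
            "name": "African Population and Health Research Center",
        },
    }


# Hypothetical example dataset; all values are illustrative only
record = build_dataset_jsonld(
    name="Example HDSS household survey",
    description="Hypothetical household-level demographic and health indicators from a surveillance site.",
    keywords=["HDSS", "population health"],
    countries=["Kenya"],
    temporal_coverage="2015/2020",
)

# This string is what a landing page would embed inside a
# <script type="application/ld+json"> tag so crawlers can index it.
jsonld = json.dumps(record, indent=2)
print(jsonld)
```

Keeping the record as a plain dictionary until the final `json.dumps` call makes it easy to validate or extend fields (for example, adding a `license`) before publishing.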
INSPIRE Network
The Implementation Network for Sharing Population Information from Research Entities (INSPIRE) is hosted by the Data Science Program. INSPIRE was established in 2019 as a network of Health and Demographic Surveillance Sites (HDSS) in East Africa. The network has since expanded and now includes about 20 HDSS sites in Eastern Africa (Ethiopia, Kenya, Tanzania, Uganda), Western Africa (Senegal, Burkina Faso), and Southern Africa (Malawi). The INSPIRE secretariat:
- Convenes an annual general meeting to discuss value addition and collaboration among HDSS
- Runs periodic hybrid training in data harmonization for data managers at member sites
- Promotes federated data-sharing models for collaborative and joint analyses
- Addresses recurrent challenges faced by HDSS, such as record linkage
- Provides a platform for joint grant applications across network members
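One common federated data-sharing pattern is for each site to compute aggregate statistics locally and share only those aggregates, so individual records never leave the site. The following is a minimal sketch of that pattern, assuming a pooled crude mortality rate as the quantity of interest. The names `SiteSummary`, `local_summary`, and `pooled_rate`, and the toy records, are hypothetical; this is not INSPIRE's actual analysis pipeline.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregates a site shares; no individual-level records are included."""
    site: str
    events: int          # number of deaths observed at the site
    person_years: float  # total person-time at risk


def local_summary(site, records):
    """Run at each site: reduce individual records to aggregates.

    `records` is a hypothetical list of (died, years_observed) tuples.
    """
    events = sum(1 for died, _ in records if died)
    person_years = sum(years for _, years in records)
    return SiteSummary(site, events, person_years)


def pooled_rate(summaries, per=1000):
    """Run centrally: combine site aggregates into a crude rate per `per` person-years."""
    total_events = sum(s.events for s in summaries)
    total_py = sum(s.person_years for s in summaries)
    return per * total_events / total_py


# Toy data for two hypothetical sites
site_a = local_summary("Site A", [(True, 0.5), (False, 1.0), (False, 1.0)])
site_b = local_summary("Site B", [(False, 1.0), (True, 0.5)])
print(pooled_rate([site_a, site_b]))  # 2 deaths over 4.0 person-years
```

Summing events and person-time before dividing gives the same crude rate as pooling the raw records, which is why this simple statistic can be computed without centralizing data. More complex models (for example, regression) need iterative federated algorithms rather than a single round of aggregates.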