Affiliations
Friedrich Schiller University Jena
Projects
- iAnswer – Unveiling iDiv Knowledge: Can Large Language Models and Knowledge Graphs combined provide answers to people’s questions?
- EcoWeaver – Mapping Evidence to Theory in Ecology – From Ecological Knowledge to Restoration Action (https://ecoweaver.hi-knowledge.org/)
Research Focus
In the iAnswer project, we aim to integrate the large body of knowledge iDiv has amassed over its lifetime and make it available to the public, supporting the generation of new insights for biodiversity research. To this end, we are creating Knowledge Graphs (KGs) for data sources from iDiv’s PlantHub and for plant phenomics and genomics data from IPB Halle. We develop and apply new KG modeling and construction techniques to produce higher-quality KGs that align with the FAIR and CARE principles.
Simplifying the interaction between users and KGs has been a central research question for decades: the steep learning curve of SPARQL, the standard query language for RDF-based KGs, remains one of the largest hurdles to the wide adoption of KGs in specialized domains. iAnswer addresses this issue, as its main goal is to develop a question answering (QA) system that lets users with different levels of domain expertise interact with the KGs in natural language. The central research question for this QA system is how information from a KG can be fed to a large language model (LLM) to enhance its QA capabilities. This opens up the investigation of retrieval-augmented generation (RAG) techniques, retrieval methods that leverage schema-level information, purpose-driven KG modeling for QA applications, searchable natural language embeddings inside the KG, and many other novel research ideas.
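To make the idea of feeding KG information to an LLM more concrete, the following is a minimal sketch of one possible retrieval step, not the iAnswer implementation. It assumes a toy in-memory list of triples with made-up `ex:` identifiers, a naive word-overlap score in place of a real retriever, and hypothetical helper names (`tokens`, `retrieve`, `build_prompt`). The call to an actual LLM is deliberately left out.

```python
import re

# Toy triples with made-up identifiers (assumption for illustration only;
# not taken from PlantHub, IPB Halle, or any other real dataset).
TRIPLES = [
    ("ex:PlantA", "ex:hasTrait", "ex:LeafArea"),
    ("ex:PlantA", "ex:observedAt", "ex:SiteX"),
    ("ex:PlantB", "ex:hasTrait", "ex:SeedMass"),
]


def tokens(text):
    """Return lowercase word tokens, splitting camelCase and dropping the 'ex' prefix."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return {t for t in re.findall(r"[a-z]+", spaced.lower()) if t != "ex"}


def retrieve(question, triples, k=2):
    """Rank triples by word overlap with the question and keep the top k matches."""
    q = tokens(question)
    scored = [(len(q & tokens(" ".join(t))), t) for t in triples]
    matching = [(score, t) for score, t in scored if score > 0]
    # sorted() is stable, so triples with equal scores keep their original order.
    matching = sorted(matching, key=lambda pair: -pair[0])
    return [t for _, t in matching[:k]]


def build_prompt(question, facts):
    """Assemble a prompt that places the retrieved facts before the question."""
    fact_lines = "\n".join(f"- {s} {p} {o}" for s, p, o in facts)
    return (
        "Answer the question using only the facts below.\n"
        f"Facts:\n{fact_lines}\n"
        f"Question: {question}\n"
    )


question = "Which traits does plant A have?"
facts = retrieve(question, TRIPLES, k=2)
prompt = build_prompt(question, facts)
print(prompt)
```

In a real system, the overlap score would likely be replaced by a SPARQL query or an embedding-based search, and the resulting prompt would be passed to an LLM; both choices are among the open questions named above.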
Education
- M.Sc. in Computer Science at Uni Jena (2022–2025)
- B.Sc. in Computer Science at Uni Jena (2017–2021)
Selected Publications
- Al Mustafa, T., 2025. From metadata to meaning: a semantic units knowledge graph for the biodiversity exploratories. Jena. https://doi.org/10.22032/dbt.67767
- Marc Felix Brinner, Tarek Al Mustafa, and Sina Zarrieß. 2025. Enhancing Domain-Specific Encoder Models with LLM-Generated Data: How to Leverage Ontologies, and How to Do Without Them. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 22740–22754, Suzhou, China. Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.findings-emnlp.1238