RecSys.Scifi

about

In this tutorial we survey state-of-the-art recommender systems in scientific fields, explain what Named Entity Recognition and Named Entity Linking are in the context of research literature, and demonstrate how to create a dataset for recommending drugs and diseases from research literature related to COVID-19. Our goal is to spread the use of the LIBRETTI methodology in order to support the development of recommender algorithms in scientific fields.

Recommender systems (RS) have been successfully explored in a vast number of domains, e.g. movies and TV shows, music, and e-commerce. In these domains, many datasets are freely available for testing and evaluating new recommender algorithms, for example the MovieLens and Netflix datasets for movies, Spotify for music, and Amazon for e-commerce. This availability translates into a large number of algorithms applied to these fields.

In scientific fields, such as Health and Chemistry, standard, open-access datasets with information about user preferences are scarce.

Three aspects matter when building a recommender system for a scientific field. First, the application domain, i.e. “what the recommended item is”. Second, the end users: researchers, pharmacists, clinicians or policy makers. Third, the availability of data. The third aspect is the main obstacle: if we wish to develop an algorithm for recommending scientific items, we do not have access to datasets with information about the past preferences of a group of users. Given this limitation, we developed a methodology called LIBRETTI (LIterature Based RecommEndaTion of scienTific Items), whose goal is the creation of <user, item, rating> datasets related to scientific fields.

The datasets are created from the major knowledge resource that science has: the scientific literature.

We consider the users as the authors of the publications, the items as the scientific entities (for example chemical compounds or diseases), and the ratings as the number of publications an author wrote about an entity.
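The mapping above can be sketched in a few lines of Python. This is only a minimal illustration, not the LIBRETTI implementation: the `publications` records, the author and entity names, and the `build_ratings` helper are all hypothetical, and the sketch assumes that the entities in each publication have already been recognised and linked.

```python
from collections import Counter

# Hypothetical input (illustrative data only): each publication lists its
# authors and the scientific entities already recognised in its text.
publications = [
    {"authors": ["author_a", "author_b"], "entities": ["entity_x", "entity_y"]},
    {"authors": ["author_a"], "entities": ["entity_x"]},
    {"authors": ["author_b"], "entities": ["entity_y", "entity_y"]},
]


def build_ratings(publications):
    """Return sorted (user, item, rating) triples, where the rating is the
    number of publications by that author that mention that entity."""
    counts = Counter()
    for pub in publications:
        # set() makes an entity mentioned several times in one publication
        # count that publication only once.
        for author in set(pub["authors"]):
            for entity in set(pub["entities"]):
                counts[(author, entity)] += 1
    return sorted((user, item, rating) for (user, item), rating in counts.items())


ratings = build_ratings(publications)
for user, item, rating in ratings:
    print(user, item, rating)
```

Counting each publication at most once per author and entity is one reading of "the number of publications an author wrote about an entity"; counting every mention instead would be a different design choice.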

The output of this tutorial will be an open source recommendation dataset for diseases and drugs related to COVID-19, which may be used for training new recommendation algorithms. You can find an example of a dataset created using LIBRETTI.

Program

Aug. 15, 2021 4:00 AM - 7:00 AM (Singapore Time)

Part I

Introduction

Introduction to recommender systems
Scientific recommender systems
Introduction to Named Entity Recognition (NER) and Named Entity Linking (NEL)
LIBRETTI pipeline

Part II

Creating a recommendation dataset through scientific literature

Retrieve the research articles related to COVID-19
Named Entity Recognition (NER) and Linking (NEL)
Creating the recommendation dataset

Part III

Final discussion

Open discussion
END!

Aug 14-18, 2021

Schedule: Aug. 15, 2021 4:00 AM - 7:00 AM (Singapore Time)

schedule

Singapore-time

Sunday, Aug 15th

Slides

Download all resources from the GitHub repository

Our Amazing Team

BioTM, LASIGE, Faculdade de Ciências da Universidade de Lisboa, Portugal ― Biomedical Text Mining Team

Márcia Barros

Francisco Couto

Matilde Pato

Pedro Ruas

acknowledgements

Contact Us