Welcome To RecSys.Scifi!
Recommender Systems Datasets in Scientific Fields
Tutorial at ACM Conference on Knowledge Discovery and Data Mining (KDD)

Aug 14-18, 2021
Schedule: Aug. 15, 2021, 4:00 AM - 7:00 AM (Singapore Time)

about

In this tutorial we will survey state-of-the-art Recommender Systems in Scientific Fields, explain what Named Entity Recognition/Linking in research literature is, and demonstrate how to create a dataset for recommending drugs and diseases from research literature related to COVID-19. Our goal is to spread the use of the LIBRETTI methodology and thereby support the development of recommender algorithms in scientific fields.

Recommender systems (RS) have been successfully explored in a vast number of domains, e.g. movies and TV shows, music, or e-commerce. In these domains a large number of datasets are freely available for testing and evaluating new recommender algorithms, for example the MovieLens and Netflix datasets for movies, Spotify for music, and Amazon for e-commerce. This availability has led to a large number of algorithms being applied to these fields.

In scientific fields, such as Health and Chemistry, standard and open access datasets with the information about the preferences of the users are scarce.

Building such datasets raises three questions. First, it is important to understand the application domain, i.e. “what the recommended item is”. Second, we must identify who the end users are: researchers, pharmacists, clinicians or policy makers. Third, we must consider the availability of data: if we wish to develop an algorithm for recommending scientific items, we do not have access to datasets with information about the past preferences of a group of users. Given this limitation, we developed a methodology called LIBRETTI (LIterature Based RecommEndaTion of scienTific Items), whose goal is the creation of <user, item, rating> datasets related to scientific fields.

The datasets are created based on the major resource of knowledge that Science has: scientific literature.

We consider the users as the authors of the publications, the items as the scientific entities (for example chemical compounds or diseases), and the ratings as the number of publications an author wrote about an entity.
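
The mapping described above can be illustrated with a minimal Python sketch. It is not the LIBRETTI implementation: the publication records, the author and entity names, and the `build_ratings` helper are all hypothetical, and the sketch assumes that entity recognition has already been run on each publication.

```python
from collections import Counter

# Hypothetical input: each publication lists its authors and the scientific
# entities already recognised in its text (all names are placeholders).
publications = [
    {"authors": ["author_a", "author_b"], "entities": ["drug_x", "disease_y"]},
    {"authors": ["author_a"], "entities": ["disease_y"]},
    {"authors": ["author_b"], "entities": ["drug_x", "drug_x"]},
]


def build_ratings(pubs):
    """Return a Counter mapping (author, entity) to a publication count."""
    ratings = Counter()
    for pub in pubs:
        # set() so that an entity mentioned several times in one
        # publication still counts as a single publication.
        for entity in set(pub["entities"]):
            for author in set(pub["authors"]):
                ratings[(author, entity)] += 1
    return ratings


ratings = build_ratings(publications)

# Print the dataset as <user, item, rating> rows.
for (user, item), rating in sorted(ratings.items()):
    print(user, item, rating)
```

Each printed row is one <user, item, rating> triple. Here `author_a` gets a rating of 2 for `disease_y`, because two of that author's publications mention it.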


The output of this tutorial will be an open-source recommendation dataset for diseases and drugs related to COVID-19, which may be used for training new recommendation algorithms. You can find an example of a dataset created using LIBRETTI.

schedule

Singapore-time

Sunday, Aug 15th

  • 4:00 - 5:10 - Part I: Introduction

  • 5:10 - 5:20 - Coffee Break

  • 5:20 - 6:50 - Part II: Creating recommendation dataset through scientific literature

  • 6:50 - 7:00 - Part III: Discussion

Program

Aug. 15, 2021 4:00 AM - 7:00 AM (Singapore Time)

Slides

Download all resources on the GitHub repository

Márcia Barros

Biomedical Sciences

Francisco Couto

Informatics

Matilde Pato

Biomedical Engineering

Pedro Ruas

Bioinformatics

acknowledgements

This work was supported by the Fundação para a Ciência e Tecnologia (FCT), under LASIGE Strategic Project UIDB/00408/2020, FCT funded project PTDC/CCI-BIO/28685/2017 and PhD Scholarship SFRH/BD/128840/2017.

Contact Us