Name: Enhancing Named Entity Recognition (NER) in Biomedical Texts: BIOBERT on CORD19 Dataset
Start: 2024-08-08T09:30:00+0530
End: 2024-08-08T11:30:00+0530

Thursday August 8, 2024 9:30am - 11:30am IST

Virtual Room D

Authors - Saripudi Suneetha, Jarubula Ramu, Neerukonda Kanthi Priyadarsini, Thulasi Bikku
Abstract - The CORD-19 dataset and BioBERT-NER (a biomedical Bidirectional Encoder Representations from Transformers model for Named Entity Recognition) are valuable resources for natural language processing and biomedical research. CORD-19 provides scientific articles on COVID-19 and related historical coronavirus research, and its extensive metadata and structured full-text publications support the development of text mining and information retrieval systems. This work applies the BioBERT model, an adaptation of BERT tailored to biomedical text, to the CORD-19 data for named entity recognition (NER). NER recognises textual entities, including general named entities (people, places, things, etc.) and biomedical entities (genes, proteins, illnesses, etc.), and places them into predetermined categories. Since its release, the CORD-19 dataset has been used as the foundation for several text analysis and discovery algorithms focused on COVID-19. In this study, we present a comprehensive approach utilizing the BioBERT model for NER on the CORD-19 dataset, which contains a vast collection of scholarly articles related to COVID-19. The workflow begins with data preprocessing, including handling missing values, dropping low-frequency tags, and tokenizing the text using the BioBERT tokenizer. The tokenized sequences are then encoded into numerical representations using BioBERT's vocabulary. A custom NER model is constructed using PyTorch, with the pre-trained BioBERT weights loaded for transfer learning. This article provides an in-depth account of creating the dataset, focusing on the difficulties encountered and the significant choices made during its creation. This research aims to facilitate collaboration among the scientific computing community, biomedical professionals, and policymakers in pursuing efficient therapies and management strategies for COVID-19.
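The preprocessing steps named in the abstract (handling missing values, dropping low-frequency tags, and encoding labels numerically) can be illustrated with a minimal sketch. It is not the authors' code. The toy rows, the tag names, the `min_count` threshold, and the helper functions are all hypothetical, and "dropping" a rare tag is interpreted here as remapping it to the outside tag `O`, which is only one possible reading.

```python
from collections import Counter

# Hypothetical toy data: (token, tag) rows; None marks a missing value.
ROWS = [
    ("Remdesivir", "B-CHEM"),
    ("inhibits", "O"),
    ("SARS-CoV-2", "B-VIRUS"),
    ("replication", None),   # missing tag
    (None, "O"),             # missing token
    ("in", "O"),
    ("Vero", "B-CELL"),
    ("cells", "I-CELL"),
    ("and", "O"),
    ("remdesivir", "B-CHEM"),
    ("blocks", "O"),
    ("SARS-CoV-2", "B-VIRUS"),
]


def clean_rows(rows, default_tag="O"):
    """Skip rows with a missing token; fill a missing tag with default_tag."""
    cleaned = []
    for token, tag in rows:
        if token is None:
            continue
        cleaned.append((token, tag if tag is not None else default_tag))
    return cleaned


def drop_rare_tags(rows, min_count=2, default_tag="O"):
    """Remap tags seen fewer than min_count times to default_tag (assumed policy)."""
    counts = Counter(tag for _, tag in rows)
    return [
        (token, tag if counts[tag] >= min_count else default_tag)
        for token, tag in rows
    ]


def build_tag_index(rows):
    """Assign each remaining tag a stable integer id (sorted for determinism)."""
    tags = sorted({tag for _, tag in rows})
    return {tag: i for i, tag in enumerate(tags)}


cleaned = clean_rows(ROWS)
filtered = drop_rare_tags(cleaned, min_count=2)
tag2id = build_tag_index(filtered)
label_ids = [tag2id[tag] for _, tag in filtered]
```

In a full pipeline these label ids would be aligned with the subword tokens produced by the BioBERT tokenizer before training. That step is not shown here.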

Paper Presenter

Thulasi Bikku

India

Virtual Room D Goa, India

Virtual Room 4D, Virtual Room D

Host Organization Global Knowledge Research Foundation

9th International Conference on ICT for Sustainable Development

