How Effective is Natural Language Processing in Clinical Trials?



Igor Kruglyak, senior advisor at the global IT service provider Avenga, and Michael DePalma, founder and president of digital specialists Pensare, LLC, examine the use of natural language processing (NLP) to accelerate investigator recruitment.

NLP has become one of the fastest-adopted business technologies in the world, only two years after Google first released its pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. On release, BERT achieved state-of-the-art results on 11 NLP tasks, and it has a deeper sense of language context than any language model developed before it. In the last three years, NLP has made more progress than any other subfield of AI, and estimates predict that the worldwide NLP market will reach $43 billion by 2025, compared to $11 billion in 2019.

Simply put, NLP enables computers to analyse written or spoken human language, to extract its meaning and to obtain insights from these data. The pharmaceutical industry has started utilising the technology, for example, to analyse medical data, to ensure pharmacovigilance and to enhance medical care with health assistants.

Patient Recruitment in Clinical Trials

Historically, one of the most acute issues hampering the success of clinical trials has been inefficient recruitment. Globally, 86% of clinical trials fail to recruit patients on time. Although the reasons for such high failure rates are diverse and complex, insufficient resources and the time-consuming nature of the process are considered among the most significant factors.

According to the Tufts Center for the Study of Drug Development (Tufts CSDD), a company’s ability to quickly identify clinical investigators, often among doctors and healthcare influencers, is tightly connected with successful patient recruitment. One study concluded that 1 in 10 investigative sites failed to enrol a single patient in a given clinical trial, and fewer than 60% met or exceeded their target enrolment levels. Therefore, finding reputable investigators who can source eligible patients to participate is crucial for the success of clinical research as a whole. But how can this process be improved?

Practical Steps of the NLP-Featured Approach to Investigator Recruitment

The healthcare sector has always been of particular interest to data scientists. Many consider it a near-perfect domain to showcase NLP’s value. By various estimates, 80% of medical data (i.e., from medical records, imaging devices, sensors, wearables, health documents, and articles) remains unlabelled and untapped after it is created. However, once sorted, labelled, and cleaned, all this unstructured data has enormous potential to disrupt clinical research.

Modern NLP techniques help to process and analyse clinical documentation, extract the required information, and automate much of the work that researchers previously had to do themselves. Some of the techniques that have proved to be especially effective and time-saving are:

  • Named entity recognition identifies patterns, doctors’ names, phone numbers, locations, drug components and other entities and objects that may be of interest. For example, it can locate the most frequently mentioned doctors’ names and the attributes of certain specified parameters.
  • Semantic parsing produces precise meaning representations from unstructured clinical trial data. Broadly speaking, it converts natural language utterances into logical forms. Applied in practice, it helps to classify investigators and patients and label the relationships between them.
  • Topic modelling helps to conduct topic segmentation and recognition. It allows researchers to automatically determine which topics are discussed and which text segments concern a specific case.
  • Keyword extraction aids with the extraction of essential information from unstructured articles and publications. It saves considerable time for the professionals conducting the trials.
  • Text summarisation is employed to analyse clinical trial data and summarise it according to different abstracts or a particular query.
  • Relationship extraction is a technique that extracts semantic relationships between two or more entities, for example, between article authors, doctors, clinics, diagnoses and medications. Different relationships can be extracted depending on the researcher’s goals.
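
As a rough illustration of one of these techniques, keyword extraction, the sketch below scores the terms in a handful of abstracts with TF-IDF (term frequency multiplied by inverse document frequency). The article does not say which method is used in practice, so TF-IDF is an assumption here, and the sample abstracts, the stop-word list and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


# Made-up abstracts, for illustration only.
abstracts = [
    "Metformin dosing in type two diabetes: a randomised trial of metformin.",
    "Statin therapy in a randomised trial of cardiovascular outcomes.",
    "Asthma inhaler adherence in children: an observational study.",
]

for keywords in top_keywords(abstracts):
    print(keywords)
```

Terms that appear in many abstracts, such as "trial", score low, while terms concentrated in a single abstract rise to the top, which is why this kind of scoring is often used as a first pass before heavier NLP models.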

Following topic modelling and relationship extraction, impact factor algorithms can be used to measure the relative importance of authors who have researched specific topics. By analysing the links between articles, a numerical weight can be assigned to every article in a set of articles on a specific topic. In this way, it is possible to measure a publication’s relevance. Moreover, this technique gauges the importance of every scientific article, and of every doctor who has published one, by measuring how often the publication is cited in other articles.
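
The article does not name a specific impact factor algorithm. One well-known way to weight items by analysing the links between them is PageRank, so the sketch below applies a simplified PageRank to a small, made-up citation graph and then credits each article's score to its authors. The article IDs, author names and the equal split of credit between co-authors are assumptions made for illustration.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Simplified PageRank. `citations` maps an article ID to the IDs it cites."""
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    nodes = sorted(nodes)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Articles that cite nothing spread their rank evenly over all articles.
        dangling = sum(rank[node] for node in nodes if not citations.get(node))
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every citing article passes an equal share of its rank to each reference.
        for source, cited in citations.items():
            if cited:
                share = damping * rank[source] / len(cited)
                for target in cited:
                    new_rank[target] += share
        rank = new_rank
    return rank


def author_scores(article_rank, authors):
    """Credit each article's weight to its authors, split equally (an assumption)."""
    scores = {}
    for article, names in authors.items():
        for name in names:
            scores[name] = scores.get(name, 0.0) + article_rank.get(article, 0.0) / len(names)
    return scores


# Hypothetical data: article D cites A and C, articles A and B cite C.
citations = {"A": ["C"], "B": ["C"], "C": [], "D": ["A", "C"]}
authors = {"A": ["Author 1"], "B": ["Author 2"], "C": ["Author 3"], "D": ["Author 1"]}

article_rank = pagerank(citations)
investigator_rank = author_scores(article_rank, authors)
```

In this toy graph, article C is cited by every other article, so it receives the largest weight, and its author ranks above an author with two less-cited papers.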


When designing clinical trials, researchers can use these NLP techniques to screen articles published by investigators and find those authors/investigators with substantial experience in specific disease states. This can be achieved by mapping the connections between authors and their relative weight within a specific dataset. For instance, given the correlation between the number of trials an investigator has held and enrolment rates, it makes sense to filter for, and then include, the investigators in a particular dataset who have prior experience of participating in clinical research.
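
A minimal sketch of that final filtering step might look like the following. It assumes each candidate already has a publication weight on the topic and a count of prior trials. The field names, thresholds and sample records are hypothetical and not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Investigator:
    name: str
    publication_weight: float  # e.g. summed article weights on the topic
    prior_trials: int          # number of trials previously taken part in


def shortlist(candidates, min_trials=1, top_n=10):
    """Keep investigators with prior trial experience, ranked by publication weight."""
    experienced = [c for c in candidates if c.prior_trials >= min_trials]
    experienced.sort(key=lambda c: (-c.publication_weight, -c.prior_trials, c.name))
    return experienced[:top_n]


# Made-up candidates, for illustration only.
candidates = [
    Investigator("Investigator A", 0.50, 0),
    Investigator("Investigator B", 0.35, 4),
    Investigator("Investigator C", 0.15, 2),
]

selected = shortlist(candidates, min_trials=1, top_n=2)
```

Here the most published candidate is dropped because they have no trial experience, which reflects the point above that publication weight alone is not enough.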

This article was written by Igor Kruglyak and Michael DePalma.

