
Calmcode Shorts

gliner.py

Finding named entities in text is a common task in natural language processing. The most common way to do this is with a model that has been trained on a labelled dataset, but this approach assumes that you already have labelled data.

In the long run, it seems best to build up a large labelled dataset for these kinds of tasks. But when you are just getting started, it might be easier to use a zero-shot approach. Such an approach is bound to make plenty of mistakes, but it can still get you started by speeding up the manual annotation of some data.

This is where the gliner project can really help out.

from gliner import GLiNER

# Download and load a small pre-trained GLiNER model from Hugging Face.
model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

text = """Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University, 
has recently published a paper in the prestigious journal "Nature Neuroscience". 
His research focuses on a rare genetic mutation, found in less than 0.01% of 
the population, that appears to prevent the development of Alzheimer's disease. 
Collaborating with researchers at the University of California, San Francisco, 
the team is now working to understand the mechanism by which this mutation confers 
its protective effect. Funded by the National Institutes of Health, their research 
could potentially open new avenues for Alzheimer's treatment."""

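# Zero-shot: the labels are plain strings you choose yourself, no training needed.
labels = ["person", "medical institute", "disease"]
entities = model.predict_entities(text, labels)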

for entity in entities:
    print(entity["text"], "=>", entity["label"])

This code will output the following:

Dr. Paul Hammond => person
Johns Hopkins University => medical institute
Alzheimer's disease => disease
University of California, San Francisco => medical institute
National Institutes of Health => medical institute
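The loop above only prints the "text" and "label" keys, but to turn predictions into annotations you also need to know where each entity sits in the text. The sketch below assumes each entity dict also carries "start" and "end" character offsets (as the dicts from GLiNER's predict_entities do in the versions we have seen) and converts them into a simple pre-annotation record. The helper name to_preannotation, the sample predictions, and the record layout are all hypothetical; adapt them to whatever your annotation tool expects.

```python
import json

# Hypothetical predictions, shaped like the dicts GLiNER's predict_entities
# returns (assumption: each has "start", "end", "text", "label" and "score").
text = "Dr. Paul Hammond works at Johns Hopkins University."
entities = [
    {"start": 0, "end": 16, "text": "Dr. Paul Hammond", "label": "person", "score": 0.93},
    {"start": 26, "end": 50, "text": "Johns Hopkins University", "label": "medical institute", "score": 0.71},
]


def to_preannotation(text, entities):
    """Turn entity dicts into a record that a human can review and correct.

    The {"text": ..., "spans": [...]} layout is a hypothetical example format.
    """
    spans = []
    for ent in entities:
        # Sanity check: the character offsets should point at the entity text.
        if text[ent["start"]:ent["end"]] != ent["text"]:
            raise ValueError(f"offsets do not match entity text: {ent}")
        spans.append({"start": ent["start"], "end": ent["end"], "label": ent["label"]})
    return {"text": text, "spans": spans}


record = to_preannotation(text, entities)
print(json.dumps(record))
```

Writing one such record per line gives you a JSONL file, which many annotation tools can import in some form, although the exact schema differs from tool to tool.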

As you use this library, you will notice that it does make plenty of mistakes. Still, it can be a good starting point when you annotate some data manually.
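One way to make those mistakes cheaper to fix is to use the model's confidence. Assuming each prediction carries a "score" key, you can pre-accept the confident spans and send the rest to a human for review. This is a minimal sketch: the helper split_by_confidence and the 0.7 cut-off are hypothetical choices, not part of gliner. In the gliner versions we are aware of, predict_entities also accepts a threshold keyword, so you can drop low-scoring spans at prediction time instead.

```python
# Hypothetical predictions; the "score" key is assumed to be present,
# as in the dicts GLiNER's predict_entities returns.
entities = [
    {"text": "Dr. Paul Hammond", "label": "person", "score": 0.93},
    {"text": "Alzheimer's disease", "label": "disease", "score": 0.81},
    {"text": "National Institutes of Health", "label": "medical institute", "score": 0.42},
]


def split_by_confidence(entities, threshold=0.7):
    """Split predictions into ones to pre-accept and ones to check by hand.

    The 0.7 default is an arbitrary example value, not a gliner default.
    """
    confident = [e for e in entities if e["score"] >= threshold]
    review = [e for e in entities if e["score"] < threshold]
    return confident, review


confident, review = split_by_confidence(entities)
for e in review:
    print("check:", e["text"], "=>", e["label"], f"({e['score']:.2f})")
```

A good cut-off depends on your labels and texts, so it is worth eyeballing a few scores before picking one.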

If you're eager to explore, you may also want to try the other pre-trained models for this library by browsing the repositories on Hugging Face.
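When you try a different checkpoint (for example a larger sibling such as urchade/gliner_medium-v2.1, assuming that repository is available to you), it helps to run both models on the same text and diff their spans rather than reading two printouts side by side. The sketch below assumes the predictions carry "start", "end" and "label" keys; the helper names and the sample predictions are hypothetical, chosen to show a typical boundary disagreement.

```python
def spans(entities):
    """Reduce entity dicts to hashable (start, end, label) tuples."""
    return {(e["start"], e["end"], e["label"]) for e in entities}


def compare(preds_a, preds_b):
    """Return spans found by both models and spans found by only one of them."""
    a, b = spans(preds_a), spans(preds_b)
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical predictions from two checkpoints on
# "Dr. Paul Hammond works at Johns Hopkins University."
small = [
    {"start": 0, "end": 16, "label": "person"},  # "Dr. Paul Hammond"
    {"start": 26, "end": 50, "label": "medical institute"},
]
medium = [
    {"start": 4, "end": 16, "label": "person"},  # "Paul Hammond", no title
    {"start": 26, "end": 50, "label": "medical institute"},
]

result = compare(small, medium)
for key, found in result.items():
    print(key, sorted(found))
```

The spans in "only_a" and "only_b" are usually the interesting ones to look at when deciding which checkpoint suits your data.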

