
AI Project: Named Entity Recognition (NER) with Hugging Face


Named Entity Recognition (NER) is the process of finding and classifying “entities” in text, such as people’s names, company names, or locations. It is a core task in Natural Language Processing (NLP), and the Hugging Face transformers library has become a popular and effective tool for it.

This is how a search engine can tell that “Apple” in “Apple is releasing a new phone” is an organization, not a fruit.

Step 1: Installation

pip install transformers torch

Step 2: The Code

We use the token-classification pipeline. The name comes from how it works: the model assigns a label to every token (a word or a piece of a word) in the input.

from transformers import pipeline

# 1. Load the pipeline
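# We'll use a popular BERT model fine-tuned for NER
ner_pipeline = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    # Merge word pieces and multi-word names ("New" + "York") into one entity.
    # aggregation_strategy replaces the deprecated grouped_entities=True flag.
    aggregation_strategy="simple",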
)

# 2. Define your text
text = "My name is Alice, I live in New York, and I work for Google."

# 3. Run the NER model!
results = ner_pipeline(text)

# 4. Print the results
print("--- Entities Found ---")
for entity in results:
    print(f"Text: {entity['word']}")
    print(f"Type: {entity['entity_group']} ({entity['score']:.4f})")
    print("-----")
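
To make the grouping step less of a black box, here is a minimal sketch of the idea behind it. It assumes BIO-style labels (B-PER, I-PER, O, and so on), which is the scheme dslim/bert-base-NER uses, and it works on hand-written tokens and tags rather than real model output. The function name merge_bio_tags is hypothetical, and this is a simplified illustration, not the library's actual implementation.

```python
def merge_bio_tags(tokens, tags):
    """Merge word-level BIO tags into (text, type) entity spans.

    Hypothetical helper for illustration only; the real pipeline also
    handles sub-word pieces and confidence scores.
    """
    entities = []
    current_words, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, label = "O", None
        else:
            prefix, label = tag.split("-", 1)  # "B-PER" -> ("B", "PER")

        # An "I-" tag of the same type extends the entity being built.
        continues = prefix == "I" and label == current_type
        if not continues:
            # Anything else closes the entity in progress, if there is one.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None

        if label is not None:
            current_words.append(token)
            current_type = label

    # Flush the last entity if the sentence ends inside one.
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


# Hand-written example input (not produced by a model).
tokens = ["Alice", "lives", "in", "New", "York"]
tags = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
print(merge_bio_tags(tokens, tags))
```

Here "New" (B-LOC) and "York" (I-LOC) end up as the single entity "New York", which is the same effect the aggregation option has on the pipeline's output.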

Step 3: The Result

The pipeline returns a list of the entities it found, each with its type and a confidence score. The output should look similar to this (exact scores may vary):

--- Entities Found ---
Text: Alice
Type: PER (0.9982)
-----
Text: New York
Type: LOC (0.9990)
-----
Text: Google
Type: ORG (0.9987)
-----

The model correctly identified Alice as a PERson, New York as a LOCation, and Google as an ORGanization.
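
In practice you will often want to post-process this list, for example to drop low-confidence entities and collect the rest by type. The sketch below shows one way to do that. It runs on a hand-written list that mirrors the shape of the output above (the keys word, entity_group, and score), so it does not need the model. The function name group_by_type and the min_score parameter are assumptions made for illustration, not part of the transformers API.

```python
from collections import defaultdict


def group_by_type(results, min_score=0.9):
    """Collect entity texts by type, skipping low-confidence entries.

    Hypothetical helper; `results` is expected to look like the list
    returned by the pipeline when entities are aggregated.
    """
    grouped = defaultdict(list)
    for entity in results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample that copies the values printed above.
sample_results = [
    {"entity_group": "PER", "score": 0.9982, "word": "Alice"},
    {"entity_group": "LOC", "score": 0.9990, "word": "New York"},
    {"entity_group": "ORG", "score": 0.9987, "word": "Google"},
]

print(group_by_type(sample_results))
# {'PER': ['Alice'], 'LOC': ['New York'], 'ORG': ['Google']}
```

The right threshold depends on your data; 0.9 is only a placeholder here.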


Key Takeaways

  • Named Entity Recognition (NER) finds and classifies entities like names and locations in text.
  • Search engines rely on NER to disambiguate entities, for example reading ‘Apple’ as an organization rather than a fruit.
  • Hugging Face’s token-classification pipeline performs NER by assigning a label to each token.
  • The model provides a list of identified entities and their types, like Alice as a PERson and New York as a LOCation.
