
This is one of the most powerful concepts in modern AI, and the Hugging Face ecosystem makes it easy to put into practice. An “Embedding” is a way to turn a sentence into a list of numbers (a “vector”) that represents its meaning.
Why is this useful? You can compare two vectors to see how semantically similar two sentences are. This is the magic behind “semantic search,” RAG (retrieval-augmented generation), and finding “related documents.”
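To make “comparing vectors” concrete, here is a tiny sketch of cosine similarity, the comparison we will use later, on two made-up 3-dimensional vectors (real embeddings have hundreds of dimensions):

import math
# Two toy "embedding" vectors, invented for illustration only
a = [0.9, 0.1, 0.0]
b = [0.8, 0.2, 0.1]
dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(x * x for x in b))
# Cosine similarity: close to 1.0 = same direction (similar meaning), near 0 = unrelated
print(dot / (norm_a * norm_b))  # ~0.98 for these two toy vectors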
Step 1: Install
We need a special library, sentence-transformers, which is built on top of Hugging Face Transformers.
pip install sentence-transformers
Step 2: The Code
We will load a pre-trained model and use it to “encode” sentences.
from sentence_transformers import SentenceTransformer, util
import torch # We'll use torch to calculate similarity
# 1. Load a pre-trained model
# 'all-MiniLM-L6-v2' is a popular, fast, and good model
model = SentenceTransformer('all-MiniLM-L6-v2')
# 2. Sentences to encode
sentences = [
    "A man is eating a piece of bread.",
    "A person is consuming food.",
    "The cat is playing with a ball.",
    "A programmer is writing Python code."
]
# 3. Generate the embeddings!
embeddings = model.encode(sentences)
print(f"Shape of one embedding: {embeddings[0].shape}")
# Output: (384,) -> Each sentence is now a 384-dimensional vector
Step 3: Compare Similarity
Now, let’s see which sentences are “closest” in meaning. We’ll compare our first sentence to all the others.
# Convert embeddings to PyTorch tensors for similarity calculation
emb_tensors = torch.tensor(embeddings)
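# (Tip: model.encode(sentences, convert_to_tensor=True) returns tensors directly and lets you skip this step.)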
# 4. Calculate "Cosine Similarity" between sentence 0 and all others
# This returns a matrix of scores (ranging from -1.0 to 1.0; higher means more similar)
cosine_scores = util.cos_sim(emb_tensors[0], emb_tensors)
print("\nSimilarity of 'A man is eating a piece of bread.' to:")
print(f"- '{sentences[1]}': {cosine_scores[0][1]:.4f}")
print(f"- '{sentences[2]}': {cosine_scores[0][2]:.4f}")
print(f"- '{sentences[3]}': {cosine_scores[0][3]:.4f}")Output:
Similarity of 'A man is eating a piece of bread.' to:
- 'A person is consuming food.': 0.7554
- 'The cat is playing with a ball.': 0.0763
- 'A programmer is writing Python code.': -0.0121
The model correctly recognizes that “eating bread” is very similar to “consuming food,” while “a cat playing” and “writing code” score near zero, meaning they are essentially unrelated.
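These pairwise scores are the building block of the “semantic search” mentioned earlier. As a short sketch (reusing the model and sentences from above, with an invented example query), the library’s util.semantic_search helper embeds a query and returns the closest matches:

# A minimal semantic-search sketch, reusing `model`, `util`, and `sentences` from above
query = "What is the man having for lunch?"  # hypothetical query
query_embedding = model.encode(query, convert_to_tensor=True)
corpus_embeddings = model.encode(sentences, convert_to_tensor=True)
# Retrieve the top 2 most similar sentences to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:  # hits[0] holds the results for the first (and only) query
    print(f"{sentences[hit['corpus_id']]} (score: {hit['score']:.4f})")

This loop of embedding the query, embedding the documents, and returning the closest matches is essentially what the retrieval step of a RAG pipeline does.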
Key Takeaways
- An ‘Embedding’ converts a sentence into a vector that represents its meaning.
- Comparing these vectors (here, with cosine similarity) tells you how semantically similar two sentences are.
- To use this technique, install the ‘sentence-transformers’ library, which is built on top of Hugging Face.
- Then load a pre-trained model to encode sentences and compare their meanings.
- The model identifies that ‘eating bread’ is similar to ‘consuming food’, but not to ‘a cat playing’ or ‘writing code’.