
The pipeline() function from our Hugging Face intro is convenient, but it’s a black box. For advanced work (like fine-tuning or inspecting raw model outputs), you need to use its two core components manually:
- The Tokenizer: Turns human text into numbers (tokens) the model understands.
- The Model: The actual AI “brain” that does the math.
Step 1: The Tokenizer
A tokenizer breaks “Hello, world!” into ['Hello', ',', 'world', '!'] and then converts those pieces into numbers.
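You can see both halves of that process directly. This is a small sketch that loads the same checkpoint used later in this post and inspects the intermediate string pieces before they become numbers (the uncased model lowercases its input):

```python
from transformers import AutoTokenizer

# Same checkpoint used in the steps below
tokenizer = AutoTokenizer.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

text = "Hello, world!"
tokens = tokenizer.tokenize(text)              # the string pieces
ids = tokenizer.convert_tokens_to_ids(tokens)  # the numbers the model sees
print(tokens)                 # ['hello', ',', 'world', '!']
print(ids)
print(tokenizer.decode(ids))  # round-trips back to readable text
```

Note that `tokenizer.decode()` lets you go the other way, which is handy for debugging what the model actually received.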
from transformers import AutoTokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
text = "I love Python Pro Hub!"
tokens = tokenizer(text, return_tensors="pt") # "pt" = PyTorch Tensors
print(tokens)
# Output:
# {'input_ids': tensor([[ 101, 1045, 2293, 10...]]),
# 'attention_mask': tensor([[1, 1, 1, 1, ...]])}
Step 2: The Model
Now we load the actual model brain.
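The loaded model also carries a config that tells you what its outputs mean. A quick sketch, assuming the same checkpoint as above:

```python
from transformers import AutoModelForSequenceClassification

# Same checkpoint as the tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

# The config maps output positions to human-readable labels
print(model.config.id2label)    # {0: 'NEGATIVE', 1: 'POSITIVE'}
print(model.config.num_labels)  # 2
```

Knowing this mapping is what lets you turn raw scores into a label in Step 3.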
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(model_name)
Step 3: Put Them Together
We feed the numeric tokens from the tokenizer into the model.
# Unpack the tokens dictionary into the model
outputs = model(**tokens)
print(outputs.logits)
# Output: tensor([[-2.8943, 3.0475]], grad_fn=<AddmmBackward0>)
This raw output (the logits) is what pipeline() works with internally: one unnormalized score per class, and the position of the higher score is the model’s predicted label. With the tokenizer and model in hand, you have full control to fine-tune and build custom AI applications.
