
Sometimes the data you need isn’t in a tidy CSV file; it’s stuck on a website. Web scraping is the process of using code to automatically read and extract that data, and BeautifulSoup is a popular tool for the job thanks to its ease of use and flexibility.
We need two libraries:
- requests: fetches the raw HTML of the page (just like your browser does).
- BeautifulSoup (bs4): parses that HTML so we can search and extract content.
Step 1: Installation
pip install requests beautifulsoup4

Step 2: Fetch the Page
Let’s practice on a site built for exactly this purpose: http://quotes.toscrape.com. It’s safe and legal to scrape.
import requests
from bs4 import BeautifulSoup
url = "http://quotes.toscrape.com"
response = requests.get(url)
# Check if it worked
if response.status_code == 200:
    print("Successfully fetched the page!")
else:
    print("Failed to fetch the page.")

Step 3: Parse the HTML
Now we feed the page content into BeautifulSoup, which parses the raw HTML into a searchable tree of tags.
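To see what the parser does in isolation, here is a minimal sketch that runs find on an inline HTML string (the snippet below is made up for illustration, not taken from the live page):

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document so we can experiment without a network request.
html = '<div><span class="text">Hello, world!</span><span class="note">ignored</span></div>'

soup = BeautifulSoup(html, 'html.parser')

# find returns the FIRST matching tag; the class_ argument (with a trailing
# underscore) avoids clashing with Python's reserved word `class`.
span = soup.find('span', class_='text')
print(span.text)  # Hello, world!
```

The same find call works identically on a full page fetched with requests; only the input string changes.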
soup = BeautifulSoup(response.text, 'html.parser')
# Let's find the first quote on the page.
# (We know it's in a <span> with class="text" because we inspected the page in our browser first!)
quote = soup.find('span', class_='text')
print(quote.text)
# Output: "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."

Step 4: Get All of Them
We can use find_all to get a list of every quote on the page.
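Before running this on the live page, here is how find_all behaves on a miniature inline document (the HTML below is made up, loosely modeled on the structure of quotes.toscrape.com):

```python
from bs4 import BeautifulSoup

# Made-up sample with two quote blocks.
html = """
<div class="quote"><span class="text">Quote one.</span><small class="author">Author A</small></div>
<div class="quote"><span class="text">Quote two.</span><small class="author">Author B</small></div>
"""
soup = BeautifulSoup(html, 'html.parser')

# find_all returns a list of every matching tag, in document order.
texts = [s.text for s in soup.find_all('span', class_='text')]
print(texts)  # ['Quote one.', 'Quote two.']

# Searching inside each block pairs every quote with its author.
for block in soup.find_all('div', class_='quote'):
    print(block.find('span', class_='text').text, '-',
          block.find('small', class_='author').text)
```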
all_quotes = soup.find_all('span', class_='text')
for q in all_quotes:
    print(q.text)
    print("---")

A Warning on Ethics
Always check a website’s robots.txt file (e.g., google.com/robots.txt) to see which paths, if any, it allows scrapers to access. Scrape considerately: aggressive request rates can overwhelm small sites. Ethics matter in every form of web scraping.
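One way to honor those rules in code is Python’s built-in urllib.robotparser plus a delay between requests. Here is a sketch using a made-up robots.txt; against a real site you would instead load the live file with rp.set_url("http://example.com/robots.txt") and rp.read():

```python
import time
from urllib.robotparser import RobotFileParser

# A sample robots.txt (hypothetical rules, for illustration only).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(SAMPLE_ROBOTS.splitlines())

# Before requesting a page, ask whether our crawler may fetch it.
print(rp.can_fetch("*", "http://example.com/quotes"))       # True: not disallowed
print(rp.can_fetch("*", "http://example.com/private/x"))    # False: disallowed

# Honor the site's requested delay between requests (fall back to 1 second).
delay = rp.crawl_delay("*") or 1
time.sleep(delay)  # pause before making the next request
```

Pausing between requests keeps your scraper from hammering a small server, and checking can_fetch first keeps you inside the rules the site has published.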




