Python Amazon Web Scraper

Amazon price web scraping with python, requests and bs4. Ask Question Asked 3 years ago. Active 8 months ago. Viewed 5k times 2. I have a question about web. Amazon price web scraping with python, requests and bs4. Ask Question Asked 3 years ago. Active 8 months ago. Viewed 5k times 2. I have a question about web scraping the price of an amazon article. I tried to get the price of an article and it works unfortunately not always. I get randomly the status code 503 (server unavailable).

  1. Python Amazon Web Scraper Download
  2. Python Best Web Scraper
  3. Create A Web Scraper Python


Programmatic and scalable web scraping is hard to do. There's a lot of build and maintenance involved that has nothing to do with the actual scraping task.

Serverless computing makes it quite a lot easier.

All you need to worry about is scraping the website in a friendly manner.

Python Amazon Web Scraper Download

Let's jump into creating a serverless web scraper with Python and hosting it on AWS Lambda by using Chalice to do all the heavy lifting for us.

We are going to create a small scraper that returns today's #1 product on ProductHunt. (BTW: this is 100% doable using the ProductHunt API)

Here is the repo for the project:

1. Set up Dev environment 🛠

Set up your python virtual environment and install chalice

mkivrtualenv serverless_scraping

pip install chalice

Make sure to install this dependency via pip, we will be using this to make our scraping life easier:

pip install requests_html

2. Set up Chalice with AWS 🔑

You need to authenticate your machine with your AWS account. If you've done this before, move on to the next step.

First, you need access keys, to get these: Go to the Security Credentials page from the dropdown in the top right corner

Expand the 'Access Keys' dropdown and click 'Create New Access Key'


Make sure to note these down somewhere as you won't be able to see these again.

Now you need to save these keys into a AWS config file:

Create the AWS config folder

mkdir ~/.aws

Create and open a new file

nano ~/.aws/config

Paste this in, making sure to replace the keys and region with your own keys and region

3. Create Scraping Script 🕸

Create the chalice project

chalice new-project producthunt-scraper

Inside your chalice project there will be an file, replace the file with the following code:

Let's break down the code a little.

All serverless functions in chalice are regular python functions, they need an @app decorator applied to them for the function to be called. In this example, we are using @app.route as we want our function to be called when it's requested via HTTP but you could do @app.schedule to run the function on a schedule or the other methods which you can find out about here:

The main part of our function uses the requests_html package to do all the heavy lifting for parsing the HTML document and pulling out the elements based on class names and HTML tags.

Finally, we either return an object which contains the top product or an error.

4. Deploy 🚀

You can test this scraper out locally via the chalice local command, this will create a local server reachable via http://localhost:8000 that you can test your endpoints with.

Once you're ready to go, use this command: chalice deploy

Chalice will now take care of everything, including creating the AWS lambda function in the console and packaging up all the dependencies for AWS lambda to use.

5. Done 🎉

The deploy command should have spat out a URL, this is the public reachable URL for your serverless function and for our ProductHunt scraper.

You've just created your first serverless scraper!

Python Best Web Scraper

More Reading 📖

Check out the Chalice docs here -

Create A Web Scraper Python

Check out the AWS Lambda docs here -

Comments are closed.