Beware! This post is more than 3 years old and may be outdated or incorrect. Please check elsewhere for accurate information!
Scraping websites is getting more difficult with the growing popularity of services like CloudFlare and Incapsula. In this post I will show you how to obtain a huge number of free proxies and use them in PHP. This method actually works and is not in breach of any terms of service.
This is based on the SurfEasy "VPN" proxy solution offered by Opera and its parent, "Golden Brick equity fund". Opera bought the Canadian company "SurfEasy" in 2015 and has recently started offering its users a free, in-browser VPN. It's not actually a VPN but a simple HTTP proxy that tunnels CONNECT requests, similar to some I have made in the past.
In this post I will show you how to use their service in your PHP scripts, for scraping websites and similar tasks.
First of all, you're going to need an Ubuntu 17.04 server with a public IPv4 address.
The reason we need Ubuntu 17.04 is that it comes with curl 7.52.1 by default, which we will use to connect to the proxy. Earlier versions of curl do not support HTTPS proxies with SNI TLS certificates. You will also need Python 3 with the requests module for the oprahProxy.py script.
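If you are not sure which curl your server has, a quick check helps. Below is a minimal Python 3 sketch (Python is already needed for oprahProxy.py) that parses the output of curl --version and compares it against 7.52.0. That cut-off is my assumption: 7.52.0 is the curl release that added HTTPS proxy support, while this post only names 7.52.1. Note that it checks the command-line curl; PHP's curl extension reports its own libcurl version through curl_version(), which may differ.

```python
import re
import shutil
import subprocess

# Assumed minimum: curl 7.52.0 is the first release with HTTPS proxy support.
MIN_CURL = (7, 52, 0)


def parse_curl_version(output):
    """Return (major, minor, patch) from the first line of `curl --version`."""
    match = re.match(r"curl (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognised curl --version output")
    return tuple(int(part) for part in match.groups())


def curl_supports_https_proxy(output):
    """True if the reported curl version is at least MIN_CURL."""
    return parse_curl_version(output) >= MIN_CURL


if __name__ == "__main__":
    if shutil.which("curl") is None:
        print("curl is not installed")
    else:
        out = subprocess.run(["curl", "--version"], stdout=subprocess.PIPE,
                             universal_newlines=True).stdout
        try:
            ok = curl_supports_https_proxy(out)
            print("HTTPS proxy support:", "yes" if ok else "no, upgrade curl")
        except ValueError as err:
            print(err)
```

Version tuples are compared element by element, so 7.47.0 fails the check while 7.52.1 and any 8.x release pass.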
Download the following file to the server (you can use the command below):
wget "https://milankragujevic.com/uploads/oprahProxy.py.txt" -O oprahProxy.py
Now, run it with the output redirected to oprah.txt:
python oprahProxy.py >oprah.txt 2>&1
When it's done running, open the oprah.txt file with cat oprah.txt and you will find the following important lines. (Make sure to back up the file; if you don't, you will lose access to your proxies!)
2017-04-17 03:44:02,909 INFO Pick a proxy from the list above and use these credentials:
2017-04-17 03:44:02,910 INFO Username: [40 character username, redacted]
2017-04-17 03:44:02,910 INFO Password: [64 character password, redacted]
The important lines are the ones containing our Username and Password. Make sure they come after the text "Pick a proxy..." because the script prints out two sets of credentials.
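Because the file contains two sets of credentials, picking the right pair by hand is easy to get wrong. Here is a small Python sketch that skips everything before the "Pick a proxy" line and returns the first Username and Password after it. It assumes the log lines look exactly like the excerpt above (INFO Username: followed by the value) and that neither value contains whitespace; both are assumptions based on that excerpt, not on the script's source.

```python
import re


def extract_credentials(log_text):
    """Return (username, password) printed after the 'Pick a proxy' line.

    oprahProxy.py prints two sets of credentials, so anything before the
    marker line is ignored.
    """
    marker = log_text.find("Pick a proxy")
    if marker == -1:
        raise ValueError("'Pick a proxy' line not found")
    tail = log_text[marker:]
    # First Username/Password lines that follow the marker.
    user = re.search(r"INFO Username: (\S+)", tail)
    password = re.search(r"INFO Password: (\S+)", tail)
    if user is None or password is None:
        raise ValueError("username or password missing after the marker")
    return user.group(1), password.group(1)
```

Usage: with open("oprah.txt") as f: username, password = extract_credentials(f.read())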
Download my PHP script, for example with:
wget "https://milankragujevic.com/uploads/oprah.php.txt" -O "oprah.php"
Open the script in an editor, for example nano:
nano oprah.php
Modify the script so that the credentials in it match the username and password you obtained. The key lines are 45 to 48.
Enter the URL and the credentials, and choose a proxy from the table by its ID.
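The PHP script isn't reproduced here, so as a rough illustration of what fetching a page through one of these proxies involves, here is a Python sketch that shells out to the curl command line. The https:// scheme in the proxy URL makes curl speak TLS to the proxy itself, and --proxy-user sends the credentials to the proxy rather than to the target site. The host proxy.example.net and port 443 used below are hypothetical placeholders for whichever proxy you pick from the table. In libcurl terms (which PHP's curl extension wraps) the same settings are CURLOPT_PROXY with an https:// proxy URL and CURLOPT_PROXYUSERPWD; this is a sketch, not necessarily how oprah.php is written.

```python
import subprocess


def build_curl_command(url, proxy_host, proxy_port, username, password):
    """Return a curl argument list that fetches `url` through an HTTPS proxy."""
    return [
        "curl",
        "--silent",
        "--show-error",
        # The https:// scheme tells curl to talk TLS to the proxy itself,
        # which needs curl 7.52.0 or later.
        "--proxy", "https://{}:{}".format(proxy_host, proxy_port),
        # Credentials from oprah.txt, sent to the proxy (not to the target site).
        "--proxy-user", "{}:{}".format(username, password),
        url,
    ]


def fetch_via_proxy(url, proxy_host, proxy_port, username, password):
    """Run curl and return the page body as fetched through the proxy."""
    cmd = build_curl_command(url, proxy_host, proxy_port, username, password)
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout
```

Usage: fetch_via_proxy("https://example.com/", "proxy.example.net", 443, username, password). Passing the password as a command-line argument makes it visible to other local users in the process list; on a shared machine, curl's --config option can read the same settings from a file instead.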
Run the script in the browser and it will print out the page as seen through the proxy.
This post was last updated on April 16th, 2017.