Automating website scraping

For many years I have had a personal quest to develop a simple way to automate the collection of information from websites. In 2017, I played around with a Python/Selenium-based solution; see the blog post Python/Selenium example for details.

While the Python/Selenium-based solution was interesting, it did not meet all my requirements for ease of use, low development time, and flexibility. While browsing the web I ran across Microsoft Power Automate, which looked promising. I was pleasantly surprised to see that a desktop version of the software was included in my Microsoft 365 subscription. I downloaded and installed the product and attempted to automate the collection of information.

The use case was to get the account balance from my brokerage account and the current Dow and S&P numbers, then write the information to a CSV file along with the current date. The steps are as follows:

01. Click on “Create a new flow” (aka a program unit)
02. Set up variables for user names and passwords by clicking on “Variables” on the left and dragging the function “Set variable” into the body of the flow
03. Launch a new web browser instance with “Launch new name-of-browser”. I was not able to get the Firefox plugins to work; however, Microsoft Edge worked well
04. Populate the user name and password with “Populate text field on web page”. The product displays a UI when you are on the web page, so you simply highlight the field and assign the variable set in step 02 above
05. Usually you have to click on a login button of some sort. This is accomplished with the “Press button on web page” function
06. I coded a “Wait” for 30 seconds to let the web page complete the login
07. Use the “Extract data from web page” function to scrape the required data and store it in a variable
08. The “Get current date and time” function retrieves the current date into a variable
09. Write the data into a file with the “Write text to file” function
10. Close the web browser with the “Close web browser” function

Repeat the above steps for each web page you need to extract the information from. The individual functions can be moved up or down in the flow and can be copied to repeat a function.
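For comparison, here is a rough Python/Selenium sketch of the same login-and-scrape sequence, in the spirit of my 2017 experiment. The URL, element IDs, and CSS selector below are hypothetical placeholders, not the actual sites used in the flow.

```python
# Rough Python/Selenium equivalent of the flow above.
# The URL, element IDs, and selector are hypothetical placeholders.
import csv
from datetime import date

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

USERNAME = "my-user"      # in the flow these come from "Set variable"
PASSWORD = "my-password"

driver = webdriver.Edge()                       # "Launch new Microsoft Edge"
driver.get("https://example-broker.com/login")  # hypothetical login page

# "Populate text field on web page"
driver.find_element(By.ID, "username").send_keys(USERNAME)
driver.find_element(By.ID, "password").send_keys(PASSWORD)

# "Press button on web page"
driver.find_element(By.ID, "login-button").click()

# Instead of a fixed 30-second wait, wait up to 30 seconds for the
# balance element to appear, then read its text ("Extract data from web page")
balance = WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".account-balance"))
).text

# "Get current date and time" + "Write text to file"
with open("balances.csv", "a", newline="") as f:
    csv.writer(f).writerow([date.today().isoformat(), balance])

driver.quit()                                   # "Close web browser"
```

The Power Automate flow accomplishes all of this without writing any code, which for me is the main attraction.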

One issue I ran into was that the data is written to the file in columnar format. I am sure that I will be able to change this as I get more familiar with the product. In the interim, I created a few lines of code in Python and executed it from the flow with the “Run DOS command” function to format the data into a CSV file. Now when I want the information from these three websites, I run the flow and open the CSV file in Excel.
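A minimal sketch of what that reformatting script might look like, assuming the flow writes each scraped value on its own line; the file names and layout are assumptions for illustration.

```python
# Reads the flow's columnar output (one value per line) and rewrites it
# as a single CSV row prefixed with today's date.
# "raw.txt" and "balances.csv" are hypothetical file names.
import csv
from datetime import date

with open("raw.txt") as f:
    values = [line.strip() for line in f if line.strip()]

with open("balances.csv", "a", newline="") as out:
    csv.writer(out).writerow([date.today().isoformat()] + values)
```

The flow then invokes the script from the “Run DOS command” step (for example, something like `python format_csv.py`), and the resulting file opens cleanly in Excel.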

The tool also has debugging options, such as running the flow from a given step or stopping at a step. The product is intuitive and easy to learn for anyone with a few basic coding skills.

Complete documentation is here.

Author: Dean Capps

Database consultant at Amazon Web Services.