Python/Selenium example

My daughter asked me to create a process that would log on to her high school grade website, collect her current grades and calculate the current grade-point average. I used this as an opportunity to become familiar with Selenium. You can find more information about Selenium here:

http://www.seleniumhq.org/

I coded this in Python and the below blog posting contains the relevant parts of the script with explanation. I was running this on Kali Linux as root.

Assuming that you already have Python installed on your system, you can get Selenium as follows:

install selenium

After you have installed Selenium, you have to install the correct driver for the browser that you intend to use. For this example, I was using the Firefox browser which needs the geckodriver. Additional information on this can be found at:
https://developer.mozilla.org/en-US/docs/Mozilla/QA/Marionette/WebDriver

The geckodriver location has to be added to the PATH variable. As an example:

export PATH=$PATH:/path/to/geckodriver

At the top of the program, after the usual imports for Python, add in the Selenium commands:

#!/usr/bin/python
#
import sys
import time
import string

from decimal import *

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

This routine will set up the username and password and then open the browser:

def read_web_page():
    user_name = 'user name'
    pass_word = 'password'
        
    #
    ## Open the browser
    #
    driver = webdriver.Firefox()

Navigate to the web page with:

    driver.get("https:website name.jsp?status=login")

Confirm that you at the correct web page with:

    assert "Campus Parent Portal Login" in driver.title
    temp = driver.current_url
    print 'Now on page ' + temp

Now that we are on the login page, find the elements username and password and supply the information from the variables created previously:

    elem = driver.find_element_by_name("username")
    elem.send_keys(user_name)

    elem = driver.find_element_by_name("password")
    elem.send_keys(pass_word)

Click on the sign in button and wait for 30 seconds for the website to respond:

    driver.find_element_by_css_selector("input[src*=btn_sign_in]").click()
    time.sleep(30)

Confirm that we are at the grades web page:

    #
    ## click on grades
    #
    temp = driver.current_url
    print 'Now on page ' + temp

This web page had multiple frames. The following command switches to the frame detail page, finds the element grades and clicks on it. After the click is complete wait for 20 seconds for the website to respond:

    driver.switch_to_frame("frameDetail")
    driver.find_element_by_link_text("Grades").click()
    time.sleep(20)

After the website has responded (i.e. the grades are being displayed), grab the contents of the web page (i.e. page source) and convert all characters to “printable characters”:

    #
    ## get the grades
    #
    temp = driver.page_source
    printable = set(string.printable)
    temp2 = filter(lambda x: x in printable, temp)

Write the page source to a file as text:

    grades_file = open('/dean/python/grades_file_raw.txt','w')
    grades_file.write(temp2)
    grades_file.close()

Sign out of the web site:

    #
    ## Sign out
    #
    #driver.switch_to_frame("frameDetail")
    driver.find_element_by_link_text("Sign Out").click()
    time.sleep(10)

This particular website had an alert window that pops up with the question “Do you really want to log off?”. The below command switch to the alert window and accept the alert indicating consent to log off:

    alert = driver.switch_to_alert()
    alert.accept()
    time.sleep(10)
    driver.quit()

All of the above processing involves the selenium driver. Now that the information is available as text in a file, I was able to parse it with the regular Python commands. Some of these are shown below as examples:

    
    grades_file_raw = open('/dean/python/grades_file_raw.txt','r')
    grades_file_fin = open('/dean/python/grades_file_fin.txt','w')
    prev_teacher = ""
    for raw_line in grades_file_raw:
        printable = set(string.printable)
        temp = filter(lambda x: x in printable, raw_line)
        raw_line = temp

        if  'div id="studentName"' in raw_line: 
             student = cut_str(2, '>', '<', raw_line) 
             out_line = student grades_file_fin.write('%-60s\n' %(out_line)) 
             out_line = '-' * len(student) grades_file_fin.write('%-60s\n' %(out_line)) 

Additional code removed for brevity.

Author: Dean Capps

Database consultant at Amazon Web Services.