Concept in Python for gathering targeted stock data from a given trade platform. April 2025.

Christopher Clayton

04/26/2025

Because I now research stocks and make order decisions on a regular basis, I thought about creating a web scraper to pull stock data out of a trade platform's web interface.

I could not yet get the below to work, primarily because of two issues: handling the two-factor login when using a non-GUI method, and figuring out how to pass dummy browser identification information. The website I was targeting completely changes its URLs and creates new pop-up boxes during its two-factor procedure.

However, this is the strategy, with comments, that I had for a method using Python with the BeautifulSoup library. I also left in, commented out, a GUI-based method using the webbot library (a function-streamlining wrapper around the Selenium library). That route requires either an older version of Selenium or explicitly disabling a function reference in webbot's library that later versions of Selenium have deprecated.

The remaining problem I didn't reconcile on the BeautifulSoup side of that choice is how to trigger the equivalent of onclick events with that library. This would be necessary both to set asynchronous filters on a trade platform before conducting the scrapes and to traverse paginated content. A GUI-based library sidesteps this by opening the entire website in an instance of a WebDriver-enabled browser and simulating mouse clicks on button elements; BeautifulSoup only parses HTML that has already been fetched.
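As a rough illustration of that difference (separate from the script below, using a placeholder URL and borrowing the viewMatches button id that appears later): Selenium actually fires the click in a live browser, and the resulting HTML can then be handed to BeautifulSoup, which by itself only parses whatever static markup it is given.

from selenium import webdriver

from selenium.webdriver.common.by import By

from bs4 import BeautifulSoup

driver = webdriver.Chrome() # Assumes chromedriver is on the PATH

driver.get('https://example.com/screener') # Placeholder URL

driver.find_element(By.ID, 'viewMatches').click() # Selenium simulates the onclick event in the live browser

soup = BeautifulSoup(driver.page_source, 'html.parser') # BeautifulSoup then parses whatever HTML resulted

print(len(soup.find_all('a')))

driver.quit()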

I largely anonymized the targeted example site and its HTML element references, given that most sites do not allow web scraping. Generally, though, the intent of the framework is to log into such a website, go to its stock listing page (assumed to be a single page load with asynchronous pagination and a separate page load for each stock symbol), and gather all targeted stock page links, i.e. set asynchronous filters and then traverse and collect all of the stock page links under those filter criteria. Key data such as current stock price and dividend yield would then be collected by traversing the collected page links and appending the data to an array, and finally copying the data to a CSV file.

To get familiar with the BeautifulSoup library, I primarily used Bright Data's tutorial - https://brightdata.com/blog/how-tos/web-scraping-with-python. The rest largely comes down to analyzing the structure of the desired target website.
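The core pattern from that sort of tutorial is short enough to show here. This is a generic sketch against a placeholder URL, not the trade platform itself:

import requests

from bs4 import BeautifulSoup

page = requests.get('https://example.com/stocks', headers={'User-Agent': 'Mozilla/5.0'})

soup = BeautifulSoup(page.text, 'html.parser')

# Pull out every link and every table cell, the two element types the script below cares about

links = [a.get('href') for a in soup.find_all('a', href=True)]

cells = [td.text.strip() for td in soup.find_all('td')]

print(len(links), len(cells))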

Frankly, for my purposes, I find that I can mass-open stock sub-pages (Ctrl + click in Firefox) on the type of stock trading platform I'm using, then traverse and close tabs as needed on review while writing notes (Ctrl + Tab and Ctrl + W to move through tabs left to right and to close a tab, respectively). However, for many hundreds or thousands of data points, as in mass-scale investment diversification analysis, scraping would definitely need to be considered.

Further, in my strategy for deciding which trades to execute, I look at the current price versus the absolute high and low over a five-year history. If I were serious about scraping, I'd want to capture those two data points for each stock as well, at minimum, which I didn't outline in this example. From there, a macro in MS Excel or a similar program could traverse the scraped data set for the points of interest based on desired parameters.
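For what it's worth, the post-scrape calculation I'd want there is simple. Assuming the scrape produced a current price plus the five-year high and low (made-up numbers below), the point of interest is where the price sits within that range:

current_price = 42.10 # Hypothetical scraped values

five_year_low = 28.50

five_year_high = 61.75

# Position of the current price within the five-year range (0% = at the low, 100% = at the high)

range_position = (current_price - five_year_low) / (five_year_high - five_year_low) * 100

print(f"{range_position:.1f}% of the way from the 5-year low to the 5-year high")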

# D:\ScrapeIntlStockData\StockScraperTrade Websitev1.py

# Ultimately, the goal of this code is to log in to Trade Website and then use the following link, after it has been set up with various filters, to sequentially check every possible link representing a unique stock sub-page.

# https://client.Trade Website.com/research/stocks/

# Trade Website's international screener requires looking at each sub-page explicitly to find information such as current dividend yield based on current absolute payout divided by current price, 52-week price range, price changes over other units of time, etc.

# Upon each sub-page load, the point is then to extract this type of data unique to these sub-pages, including the stock symbols, and print them sequentially to an array.

# This should then result in a doubly-nested loop until all links on the main page and sub-pages are looked through for key data.

# Based on Trade Website's formatting, probably want to gather up ALL <td> elements on a particular search results page, iterate through results and only do something useful with the data points that start with https (open them). Or even go a step further and collect all the <a href> elements because the links all look wrapped that way.
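# As a sketch of that idea on the BeautifulSoup side (not wired into the rest of this script, and assuming a parsed soupMainParse object like the one commented out further down):

# stock_links = [a['href'] for a in soupMainParse.find_all('a', href=True) if a['href'].startswith('https')]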

# The <tr detail="adr"> tag on each sub-page contains the stock symbol and current price in other sub-tags.

# The <td> tags on the sub-pages list information such as the current dividend percentage, and <tr> tags act as the descriptors.

# All of this sub-page information is ultimately inside <div id="outer-section-body">, but it's buried in such deep layers of different tags that inspecting different frames locationally helps to find it all, because Firefox will open it up from the root div to the leaf divs.
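# Sketch of scoping the search to that container with BeautifulSoup (again assuming a parsed soupSubParse object for a sub-page; not wired into the rest of this script):

# body_div = soupSubParse.find('div', id='outer-section-body')

# symbol_row = body_div.find('tr', attrs={'detail': 'adr'}) if body_div else None

# data_cells = body_div.find_all('td') if body_div else []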

# My problem now is whether there's a way to run a script against a page that's already open in a given browser. I don't know what else could be done, because opening a web page with Python would still mean needing to log in.

# Not only is BeautifulSoup needed, but also probably Selenium to handle element click events, which will be needed to tailor a web page like this for the asynchronous filters that have to be set up (pandas would only help with the tabular data afterward, not with clicks).

# Further, need a way to log in to a relevant account via non-GUI methods.

# <div id="user-name"> and <div id="password"> with <button id="btnLogin">

# The other problem is whether it asks for two-factor authentication after logging in via script, since the script would present as an unrecognized device.

# On Trade Website's dedicated login page, it's actually <input id="loginIdInput"> or a placeholder of "Login ID" and then <input id="passwordInput"> with a placeholder of "Password"

# Then the same <button id="btnLogin"> with a placeholder or label of "Log in"

import requests

from bs4 import BeautifulSoup

# import pandas as pd # Not using in favor of webbot for element clicks

#import selenium

from selenium import webdriver

from selenium.webdriver.common.by import By # Needed for the By.TAG_NAME / By.CSS_SELECTOR locators used further down

import csv

#import webbot

from webbot import Browser

# Webbot absolutely has to use a previous version of Selenium (before 4.0) and an older urllib3.

# pip install --upgrade urllib3==1.26.16

# pip install selenium==3.141.0

# The chromedriver executable also needs to be pathed, separately from where the Chrome browser itself is installed.

options = webdriver.ChromeOptions() # Works if wanting to use Selenium/webbot

#options.binary_location = "C:/ProgramData/Microsoft/Windows/Start Menu/Programs" # Not needed

#chrome_driver_binary = "D:/ScrapeIntlStockData/chromedriver.exe" # Not needed

#driver = webdriver.Chrome(chrome_driver_binary, chrome_options=options) # Not needed

#driver = webdriver.Chrome("D:/ScrapeIntlStockData/chromedriver.exe", chrome_options=options) # Not needed

driver = webdriver.Chrome(r"D:\ScrapeIntlStockData\chromedriver.exe", chrome_options=options) # Works if wanting to use Selenium/webbot; the first argument is the chromedriver executable (Chrome itself is found via its normal install path)

# from chromedriver_py import binary_path # This should not necessarily be needed due to webbot importing all of Selenium's driver features within webbot.py.

# svc = webdriver.ChromeService(executable_path=binary_path)

# driver = webdriver.Chrome(service=svc)

# Because even webbot requires manipulating a web browser directly to get at the JavaScript, I'm abandoning that method, but then another way has to be considered in place of 'click' events.

# Not even "type" for simulating typing into a web form can then be used, because it works far differently from "type" in base Python.

# I really don't see how scraping via a no-GUI approach can be squared with GUI-based automated typing and clicking, but I also don't know how typing- and clicking-like actions can be done without a GUI.
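# One possible way to square that circle (a sketch only, not used below): Selenium can run Chrome in headless mode, so clicks and typing still happen in a real browser engine, just without a visible window.

# headless_options = webdriver.ChromeOptions()

# headless_options.add_argument('--headless')

# headless_options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)')

# headless_driver = webdriver.Chrome(r"D:\ScrapeIntlStockData\chromedriver.exe", chrome_options=headless_options)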

MozillaHeader = {

'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'

}

web = Browser()

# May need to add conditionals and explicit wait periods to account for possible page redirects for logging in and two-factor authorization, if I can find a way to complete two-factor within a script, since that entails an additional input that inherently does not originate from the script.

# Likely not workable via non-GUI methods because of an instant access denied. Would need to mess around with passing different user agents.
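# A sketch of what one of those explicit waits could look like with Selenium (the imports and the use of the securityCode id from the 2FA page are assumptions here; not enabled below):

# from selenium.webdriver.support.ui import WebDriverWait

# from selenium.webdriver.support import expected_conditions as EC

# WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.ID, 'securityCode'))) # Block for up to 60 seconds until the 2FA code box appears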

#response = requests.post(

# 'https://www.Trade Website.com/home',

# data={

# "Login ID": "SMK",

# "Password": "######"

# }

#

#)

#

#print(response.content)

#web.go_to('https://www.Trade Website.com/client-home')

#web.type('SMK' , into='Login ID', id='LoginIDInput')

#web.type('#####' , into='Password' , id='passwordInput') # specific selection

#web.click('Log in' , id='btnLogin')
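# For reference, the plain-Selenium equivalent of the webbot login above would look roughly like this (a sketch using the element ids noted earlier; not enabled here):

# driver.get('https://www.Trade Website.com/client-home')

# driver.find_element(By.ID, 'loginIdInput').send_keys('SMK')

# driver.find_element(By.ID, 'passwordInput').send_keys('#####')

# driver.find_element(By.ID, 'btnLogin').click()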

# Trade Website will redirect to https:// gateway

# Want to click on "text me"

# web.click(classname='col-3 col-m-full col-s-full authenticator-option' , id='otp_sms')

# Which phone number option to choose when the selection window loads. The input groups are all the same, so the one you want has to be specified by number. For me, it's the first of the two options that appear (number one).

# Then send the code. Also, rather odd that once an option is clicked it gets called a dirty class rather than pristine...

# web.click(classname='button ng-untouched ng-pristine ng-valid' , number=1)

# web.click('Send Code', id='Continue')

# Yet another page change to the submit code page; https://sws-gateway-nr.Trade Website.com/ui/host/#/otp/code

# Need a Selenium-specific way to handle giving control of the GUI to the user temporarily if the GUI route is taken

# two_fa_code = input("Enter the 2FA code: ")

# web.type(two_fa_code , into='AccessCode' , id='securityCode')

# web.click('Log In', id='continueButton')

# Still a question of whether this will all trigger a two-factor authorization situation and how to handle that. Plus, it will likely redirect to a login page and then back to the intended page, so I wonder if I should force my own redirect in case the code starts running before the original redirect loads.

# Anyway, next before parsing the page, filters have to be set for the links that need to be harvested.

# Collating all the links into a pseudo-array is done at once so there are no double-nesting issues from opening a link and then needing to go back to the main page all over again, which would necessitate resetting all the filters.

# Trade Website search results are paginated, and I'm not going to get deep into that now, but it seems like it's not a big deal to add more logic to go to the next pagination and repeat the same activity. Filters should be maintained, but it may need to be a doubly-nested loop to get through all possible paginations.

# As far as minimally what to click on, it's all organized by <div class="regionName"> and the regions are in <a> wrappers.

# But, in terms of what should be clickable, it seems like that would be the <li class="region"> tags, which subsume the divs and set regioncode="..." parameters.

# Trade Website region codes: AUS, USA, CA, Africa, JP (in that order) before getting to specific country filters. Clicking on a region adds all those countries to the current filter which starts off empty; zero search results.

# No seeming need for pandas if the webbot library can be used.

# May have to figure out GUI method of data scraping if Beautiful Soup cannot access the Selenium-derived page opening

# If it is all based on GUI where the login info is already set up in Chrome from a prior manual login, may as well go straight to this step instead because it should log in anyway.

web.go_to('https://client.Trade Website.com/research/')

# mainPage = requests.get('https://client.Trade Website.com/research/stocks/' , headers=MozillaHeader)

web.click(tag='li' , classname='region' , id='region', number=6)

# Set rating filter criteria and refresh

# The rating buttons themselves are JavaScript scripts in <a href> tags subsuming <span class="displayText"> tags for the actual A, B, C, etc. ratings.

# I have to assume the click event targets an area and thus even if the JavaScript wrapper itself isn't targeted, the text element within it being clicked should still activate the button's functionality.

web.click(tag='span' , classname='displayText' , text='AA')

web.click(tag='span' , classname='displayText' , text='BB')

web.click(tag='button' , id='viewMatches')

# First scrape to capture all data in 'a' elements and variable set-up to hopefully capture all desired sub-links in an array

csv_file = open('StockData.csv', 'w', newline='') # Open the output file first; the file name here is just a placeholder

writer = csv.writer(csv_file)

subElementsDataCounter = 0

# a_elements = []

# sub_elements = []

# soupMainParse = BeautifulSoup(mainPage.text, 'html.parser')

# a_elements = soupMainParse.find_all('a')

aelements = driver.find_elements(By.TAG_NAME, "a") # Assumes 'driver' is the same browser session webbot is controlling, which would still need to be reconciled

subelements = []

link = 0

# Pagination logic for multi-page situations after an initial page is scraped, borrowing heavily from BrightData.com but also using webbot for some purposes

# next_page_element = soup.find('a', class_='next')

# web.click(tag='a' , classname='next')

# Go to next pagination

nextpageelements = driver.find_elements(By.CSS_SELECTOR, "a.next") # find_elements returns an empty list when no next-page link exists

# Continue taking <a> data from each pagination until no more paginations exist

while nextpageelements:

    web.click(tag='a' , classname='next')

    # This code assumes more data from <a> tags will be added as new entries to the existing Python list and not overwrite existing elements

    aelements = aelements + driver.find_elements(By.TAG_NAME, "a")

    # next_page_element = soup.find('a', class_='next')

    # web.click(tag='a' , classname='next')

    nextpageelements = driver.find_elements(By.CSS_SELECTOR, "a.next")

#next_page_relative_url = next_page_element.find('a', href=True)['href']

# get the new page

#page = requests.get(base_url + next_page_relative_url, headers=MozillaHeader)

# parse the new page

# soup = BeautifulSoup(page.text, 'html.parser')

# Send the data to a CSV file.

# Only traverse all the links once all the pagination and collection of links is complete

# Collect the URL from each harvested <a> element, keeping only actual https links per the note near the top (collecting hrefs as strings also avoids stale-element problems once navigation starts)

sublinks = [a.get_attribute('href') for a in aelements if a.get_attribute('href') and a.get_attribute('href').startswith('https')]

for link in sublinks:

    # subPage = requests.get(link)

    # soupSubParse = BeautifulSoup(subPage.text, 'html.parser')

    web.go_to(link)

    # sub_elements.append(soupSubParse.find_all('td')) # Not sure what the exact logic should be, but each set of td sub-elements taken from a particular sub-page needs to be appended as one entry in the subelements list, keeping it in step with the sub-link loop.

    subelements.append(driver.find_elements(By.TAG_NAME, "td"))

    subElementsDataCounter = subElementsDataCounter + 1

# Each entry in subelements holds one sub-page's set of <td> elements; write their text contents as one CSV row per sub-page

for data in subelements:

    writer.writerow([td.text for td in data])

csv_file.close()
