grandcanyon
I am trying to grab one URL from the performance log in headless Chrome. The problem is that sometimes I get just the URL, and other times I get additional characters before or after it, so the URL doesn't work. I don't know why it works sometimes and not other times. Is the browser_log variable getting data added to it while my regex is parsing the URL?
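On the last question, a hedged answer: probably not. In Selenium's Python bindings, get_log returns a list that is built once per call. Converting that list with str() creates a separate, immutable string, so the regex works on a fixed snapshot of the log. Here is a minimal sketch that uses a plain list as a hypothetical stand-in for the returned log:

```python
# Hypothetical stand-in for the list returned by driver.get_log('performance')
browser_log = [{"message": "first entry"}]

# str() builds a new, immutable string from the list's current contents
log_text = str(browser_log)

# Changing the list afterwards does not change the string already built
browser_log.append({"message": "added later"})
print("added later" in log_text)
```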
Code:
import re, json
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

# Run Chrome headless and record the DevTools "performance" log (network events)
options = Options()
options.add_argument("--headless")
options.set_capability("goog:loggingPrefs", {'performance': 'ALL'})
service = Service(executable_path="/home/alarm/project_pychrome/chromedriver")
driver = webdriver.Chrome(service=service, options=options)

driver.get(url)  # url is defined elsewhere (its value is not shown)
browser_log = driver.get_log('performance')  # list of dicts, each with a JSON "message" string

# Intended to pull the .m3u8 URL out of the stringified log (raw string avoids escape warnings)
regex = r'(?=gin\",\"url\":\")*?https:\/\/.*?m3u8?.*?(?=\"},\"requestId)'
url_hls = re.findall(regex, str(browser_log), re.DOTALL)
link = url_hls[0]
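A likely explanation, offered as a sketch rather than a definitive diagnosis: a lazy .*? keeps each match as short as possible from where it starts, but re.findall still starts at the leftmost "https://" that can lead to a match. With re.DOTALL, a match can therefore begin at an earlier, unrelated URL and run across other log fields until it reaches "m3u8". Whether that happens depends on which requests Chrome logged before the playlist, and that order may change between runs. The leading (?=gin...)*? group is a lookahead repeated zero or more times. It matches zero times and adds nothing. A lookbehind (?<=...) may have been what was intended.

The self-contained example below uses two made-up log entries (the example.com URLs and the helpers make_entry and find_m3u8_urls are all hypothetical). The entries are shaped like Chrome's compact Network.requestWillBeSent messages. The example shows three things: the original pattern over-matching, a tightened pattern that cannot cross a double quote, and a JSON-parsing alternative that reads params.request.url directly instead of using a regex.

```python
import json
import re


def make_entry(method, url):
    """Hypothetical log entry: a dict whose "message" is compact JSON."""
    inner = {"message": {"method": method,
                         "params": {"request": {"url": url}, "requestId": "1"}}}
    return {"level": "INFO",
            "message": json.dumps(inner, separators=(",", ":")),
            "timestamp": 0}


# Made-up stand-in for driver.get_log('performance')
browser_log = [
    make_entry("Network.requestWillBeSent", "https://example.com/page"),
    make_entry("Network.requestWillBeSent",
               "https://example.com/live/index.m3u8?token=abc"),
]

# 1) The question's pattern: the match starts at the FIRST https:// in the
#    log and runs across the JSON until it reaches m3u8 and the lookahead.
regex = r'(?=gin\",\"url\":\")*?https:\/\/.*?m3u8?.*?(?=\"},\"requestId)'
regex_hits = re.findall(regex, str(browser_log), re.DOTALL)

# 2) Tightened pattern: [^"] cannot cross a JSON string boundary,
#    so a match cannot spill over from an earlier URL.
tight_hits = re.findall(r'https://[^"]*?\.m3u8?[^"]*', str(browser_log))


# 3) No regex: parse each entry's JSON and read the request URL directly.
def find_m3u8_urls(entries):
    """Return request URLs containing '.m3u8' from performance-log entries."""
    urls = []
    for entry in entries:
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.requestWillBeSent":
            continue
        url = message.get("params", {}).get("request", {}).get("url", "")
        if ".m3u8" in url:
            urls.append(url)
    return urls


print(regex_hits[0])
print(tight_hits[0])
print(find_m3u8_urls(browser_log)[0])
```

With a real driver, find_m3u8_urls(driver.get_log('performance')) would be the drop-in call, assuming the entries have this layout. Other events, such as Network.responseReceived, carry the URL under params.response.url instead. Parsing the JSON avoids depending on field order and on how Python's str() renders the list.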