Each event in my work diary has its own details page that I need to scrape. My script successfully navigates the work diary, extracts the detail page URLs, and writes them to a CSV file. The CSV file is then read into an array, but when I try to navigate to the detail page URLs, MsChed reports a 404 error.
Example URL: https://portals.securitas.uk.com/Employ ... d=46998903
I think this could be a character issue with the URL string, but I can't figure out how to fix it. I already tried putting double quotes around the variable that contains the URL in ChromeNavigate, but that did not help. Any ideas?
Code:
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Log In
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
ChromeStart>session_id
WindowAction>1,data:*
ChromeNavigate>session_id,url,https://portals.securitas.uk.com/Employee/Account/Login?ReturnUrl=%2FEmployee%2F
Let>EmailSelector={"#Email"}
ChromeFindElements>session_id,css selector,EmailSelector,EmailField
ChromeSetElementValue>session_id,EmailField_1,%Email%
Let>PasswordSelector={"#Password"}
ChromeFindElements>session_id,css selector,PasswordSelector,PasswordField
ChromeSetElementValue>session_id,PasswordField_1,%Password%
Let>LoginBtnSelector={"#btnLogin"}
ChromeFindElements>session_id,css selector,LoginBtnSelector,LoginBtn
ChromeElementAction>session_id,LoginBtn_1,click
Label>WaitLogin
Let>DiarySelector={"body > div.container > main > div.PortalCenterContainer > div:nth-child(2)"}
ChromeFindElements>session_id,css selector,DiarySelector,DiaryBtn
If>DiaryBtn_count<1
Wait>1
Goto>WaitLogin
Endif
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Navigate to Diary
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Let>DiarySelector={"body > div.container > main > div.PortalCenterContainer > div:nth-child(2)"}
ChromeFindElements>session_id,css selector,DiarySelector,DiaryBtn
ChromeElementAction>session_id,DiaryBtn_1,click
Label>WaitDiaryLoad
Let>NextWeekArrowSelector={"#DiaryDateForm > div > div > span.icon-arrow-right2"}
ChromeFindElements>session_id,css selector,NextWeekArrowSelector,NextWeekArrow
If>NextWeekArrow_count<1
Wait>1
Goto>WaitDiaryLoad
Endif
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Delete Temporary Files
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
IfFileExists>%BMP_DIR%\event-urls.csv
DeleteFile>%BMP_DIR%\event-urls.csv
Endif
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Extract Event URLs for the Next 4 Weeks to Array
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Let>week=1
Label>ExtractWeekEvents
ChromeGetInfo>session_id,source,theSource
// Extract Events
Let>pattern=(?<=DutyConfirmed" onclick="location.href=').+?(?=')
RegEx>pattern,%theSource%,0,event_array,event_matches,0
Let>k=0
While>k<event_matches
Let>k=k+1
Let>url=https://portals.securitas.uk.com
Let>eventurl=event_array_%k%
Concat>url,eventurl
WriteLn>%BMP_DIR%\event-urls.csv,WrLnRes,%url%
EndWhile
If>week<4
Let>NextWeekArrowSelector={"#DiaryDateForm > div > div > span.icon-arrow-right2"}
ChromeFindElements>session_id,css selector,NextWeekArrowSelector,NextWeekArrow
ChromeElementAction>session_id,NextWeekArrow_1,click
Let>week=week+1
Goto>ExtractWeekEvents
Endif
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Explode Event URL csv file to Array
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
CSVFileToArray>%BMP_DIR%\event-urls.csv,url_array
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Extract Event Details & Write to CSV
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// CSVFileToArray rows are zero-based, so start at row 0
Let>k=0
While>k<url_array_count
Let>jobdetails=url_array_%k%_0
//Throws 404 Error Despite Valid URL
ChromeNavigate>session_id,url,"%jobdetails%"
Let>k=k+1
EndWhile
/*
(Google Calendar CSV Headers, First 2 Required, Rest Optional)
Subject
The name of the event, required.
Example: Final exam
Start Date
The first day of the event, required.
Example: 05/30/2020
Start Time
The time the event begins.
Example: 10:00 AM
End Date
The last day of the event.
Example: 05/30/2020
End Time
The time the event ends.
Example: 1:00 PM
All Day Event
Whether the event is an all-day event. Enter True if it's an all-day event, and False if it isn't.
Example: False
Description
Description or notes about the event.
Example: 50 multiple choice questions and two essay questions
Location
The location for the event.
Example: "Columbia, Schermerhorn 614"
Private
Whether the event should be marked private. Enter True if the event is private, and False if it isn't.
Example: True
*/
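
One way to test the character theory, offered as a sketch rather than a confirmed fix: the links are scraped from the raw page source, so any & in a query string may have been captured as the HTML entity &amp;, which the server would not recognise. The loop below assumes that is the cause. It decodes the entity with StringReplace, shows the exact string and its length so that stray characters become visible, and passes the URL without surrounding quotes, since as far as I know Macro Scheduler sends quotation marks through literally. The &amp; assumption, the zero-based start index, and the variable name urllen are my own and are not confirmed by the site or the script above.

```
// Sketch only: assumes the scraped hrefs contain "&amp;" where the real URL has "&"
// Rows from CSVFileToArray are assumed to be zero-based, so start at 0
Let>k=0
While>k<url_array_count
Let>jobdetails=url_array_%k%_0
// Decode the HTML entity (the assumed cause of the 404)
StringReplace>jobdetails,&amp;,&,jobdetails
// Show the exact string and its length to expose hidden characters
// (remove these two lines once the URLs look right)
Length>jobdetails,urllen
MessageModal>[%jobdetails%] length: %urllen%
// Pass the URL without quotes around the variable
ChromeNavigate>session_id,url,%jobdetails%
Let>k=k+1
EndWhile
```

If the string shown between the square brackets matches the address that works when pasted into Chrome by hand, the entity theory is wrong and the difference lies elsewhere, for example in the session or in how the page expects to be reached.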