A web-scraping program that opens Google, searches for "Python", collects the links of all search results on the first 4 result pages,
and appends each link to a new list.
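The core data flow is gathering the href of every result on each page into one flat list, and it can be sketched without a browser. In this minimal sketch the per-page hrefs are plain, hypothetical lists standing in for what `get_attribute('href')` returns. `None` entries model anchors without an `href`. Duplicates are also skipped, which the original program does not do. The helper name `collect_links` is hypothetical.

```python
from typing import Iterable, List, Optional


def collect_links(pages: Iterable[Iterable[Optional[str]]]) -> List[str]:
    """Flatten per-page href lists into one list, skipping None and repeats."""
    pages_links: List[str] = []
    seen = set()
    for hrefs in pages:  # one iterable of hrefs per result page
        for link in hrefs:
            if link and link not in seen:  # ignore missing hrefs and duplicates
                seen.add(link)
                pages_links.append(link)
    return pages_links


# Hypothetical hrefs for two result pages
example_pages = [
    ["https://www.python.org/", None, "https://docs.python.org/3/"],
    ["https://docs.python.org/3/", "https://pypi.org/"],
]
print(collect_links(example_pages))
# ['https://www.python.org/', 'https://docs.python.org/3/', 'https://pypi.org/']
```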
Requirements to run the program below:
- The Selenium package for Python
- A WebDriver for your browser (this program uses ChromeDriver)
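Before running the program, a quick check can confirm that the Selenium package is installed. This is a minimal sketch that uses only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`, Python 3.8+). It does not check for ChromeDriver, whose major version still has to match your installed Chrome. The helper name `selenium_version` is hypothetical.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def selenium_version():
    """Return the installed Selenium version string, or None if it is missing."""
    if importlib.util.find_spec("selenium") is None:
        return None
    try:
        return version("selenium")
    except PackageNotFoundError:  # importable but without package metadata
        return None


installed = selenium_version()
if installed is None:
    print("Selenium is not installed; run: pip install selenium")
else:
    print("Selenium", installed, "is installed")
```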
Steps for web scraping:
- Import webdriver from selenium
- Create a driver object by passing the path of your WebDriver
- Open the web page with the driver's get method
- Find the Google search box with the find_element method and enter the keyword you want to search for
- Finally, find all the links in the Google search results.
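The locating steps rely on XPath expressions. As a browser-free illustration, the sketch below swaps Selenium for the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, and runs the program's two locators against a small page. The markup is hypothetical and made up for the example. Real Google HTML is not well-formed XML and uses different, frequently changing class names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a results page
SAMPLE_PAGE = """
<html><body>
  <form><input name="q" value=""/></form>
  <div class="r"><a href="https://www.python.org/">Python.org</a></div>
  <div class="r"><a href="https://docs.python.org/3/">Docs</a></div>
  <div class="ad"><a href="https://example.com/ad">Ad</a></div>
</body></html>
"""

root = ET.fromstring(SAMPLE_PAGE.strip())

# ElementTree's ".//tag[@attr='value']" syntax mirrors the program's
# '//input[@name="q"]' and '//div[@class="r"]/a' locators.
search_box = root.find(".//input[@name='q']")
result_anchors = root.findall(".//div[@class='r']/a")
links = [a.get("href") for a in result_anchors]

print(search_box is not None)  # True
print(links)  # ['https://www.python.org/', 'https://docs.python.org/3/']
```

Only anchors inside `div` elements with `class="r"` match, so the `class="ad"` link is left out.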
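from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(service=Service("./chromedriver"))  # path for ChromeDriver (Selenium 4 passes it via a Service object)
driver.implicitly_wait(10)  # wait up to 10 seconds for elements to appear
driver.get('https://www.google.com')  # opens a Google page in Chrome (get() returns None, so nothing is assigned)
search_box = driver.find_element(By.XPATH, '//input[@name="q"]')  # find the Google search box (find_element_by_xpath was removed in Selenium 4.3)
search_box.send_keys('Python')  # enter the keyword "Python"
search_box.send_keys(Keys.ENTER)  # press Enter to submit the search
## Getting links from the first 4 pages of Google search results
## Google's class names (e.g. 'AaVjTc', 'r') change often; update the XPaths if nothing is found
pages_links = []  # list that collects the links from all four pages
for page in range(1, 5):
    if page > 1:
        # Click the numbered page link at the bottom (assumes the link text is the page number)
        driver.find_element(By.XPATH, f"//table[@class='AaVjTc']//a[text()='{page}']").click()
    value = driver.find_elements(By.XPATH, '//div[@class="r"]/a')  # gets the result links of this page
    ## Loop for getting the href value of all the links in the search result
    for each_val in value:
        link = each_val.get_attribute('href')
        if link:  # skip anchors without an href
            pages_links.append(link)
print("Complete links of four pages", "\n".join(pages_links))
driver.quit()  # close the browser window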
Complete links of four pages https://developers.google.com/edu/python
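Clicking numbered pagination links depends on Google's page markup, which changes often. An alternative is to build the URL of each of the first 4 pages and open each one with `driver.get` instead of clicking. This sketch assumes that Google's results URL accepts a `q` query and a `start` offset of 10 results per page. That is an observation about its URL format, not a documented API. The helper name `result_page_urls` is hypothetical.

```python
from typing import List
from urllib.parse import urlencode


def result_page_urls(query: str, pages: int = 4, per_page: int = 10) -> List[str]:
    """Build search URLs for the first `pages` result pages (assumed `start` offset)."""
    base = "https://www.google.com/search"
    return [
        f"{base}?{urlencode({'q': query, 'start': page * per_page})}"
        for page in range(pages)
    ]


for url in result_page_urls("Python"):
    print(url)
# https://www.google.com/search?q=Python&start=0
# https://www.google.com/search?q=Python&start=10
# https://www.google.com/search?q=Python&start=20
# https://www.google.com/search?q=Python&start=30
```

Inside the scraping loop, `driver.get(url)` for each URL would take the place of the pagination click. The same `find_elements` call would then collect that page's links.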