Web scraping to a txt file with Selenium

Date: 2020-08-17 13:32:43

Tags: python selenium xpath web-scraping

I want to scrape the IDs from this page, https://www.flashscore.co.uk/football/russia/premier-league/results/ , replace g_1_ with https://www.flashscore.com/match/ , and write the resulting URLs to a txt file.

I used this code:

matches=WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[starts-with(@id,'g_1_')]")))

for match in matches:
    g1 = matches.replace("g_1_", "https://www.flashscore.com/match/")
    print(g1)

But I got this error:

AttributeError: 'list' object has no attribute 'replace'

[Image: the id that I want to scrape]

2 answers:

Answer 0 (score: 2)

This error message...

AttributeError: 'list' object has no attribute 'replace'

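...means that your program called the replace() method on a list. replace() is a string method that replaces a specified phrase with another specified phrase, and list objects do not have it.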
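To see the difference without a browser, here is a minimal sketch that uses plain strings as stand-ins for the scraped id values (the two sample ids are taken from the output shown in the other answer; the variable names are only illustrative):

```python
# Stand-in data: plain strings play the role of the scraped id values.
ids = ["g_1_hWhb9Uyh", "g_1_rLoB6SLA"]
base_url = "https://www.flashscore.com/match/"

# Calling replace() on the list itself fails, because list has no such method.
try:
    ids.replace("g_1_", base_url)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Calling replace() on each string inside the list works.
urls = [item.replace("g_1_", base_url) for item in ids]

print(error_message)
print(urls)
```

The same reasoning applies to a list of WebElements: the list has no replace(), and neither do the elements, so a string has to be pulled out of each element first.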

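You need to call replace() on a string taken from each element in the list. Because the g_1_ prefix is part of each element's id attribute rather than its visible text, that string should be the id.


Solution

Instead of collecting the elements, you can collect the id attribute of each element into a list. Effectively, your code block becomes:

id_values = [my_elem.get_attribute("id") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[starts-with(@id,'g_1_')]")))]
for id_value in id_values:
    g1 = id_value.replace("g_1_", "https://www.flashscore.com/match/")
    print(g1)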

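Answer 1 (score: 1)

First, as noted in the comments, .replace() is a method that applies to strings. You have matches, which is a list (of WebElements), and that is what raises the error 'list' object has no attribute 'replace'. You need to iterate over the list of WebElements with for match in matches:, then use the .get_attribute() method to get each element's id attribute as a string so that you can call replace() on it.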

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


# Initializing the webdriver
options = webdriver.ChromeOptions()

# Uncomment the line below if you'd like to scrape without a new Chrome window every time.
# options.add_argument('headless')

# Change the path to where chromedriver is on your machine.
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe', options=options)
driver.maximize_window()

url = 'https://www.flashscore.co.uk/football/russia/premier-league/results/'
driver.get(url)
matches=WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[starts-with(@id,'g_1_')]")))

for match in matches:
    g1 = match.get_attribute('id')
    g1 = g1.replace("g_1_", "https://www.flashscore.com/match/")
    print(g1)
    
driver.close()

You can also combine this into a one-liner:

g1 = match.get_attribute('id').replace("g_1_", "https://www.flashscore.com/match/")

Output:

https://www.flashscore.com/match/hWhb9Uyh
https://www.flashscore.com/match/rLoB6SLA
https://www.flashscore.com/match/zer38lib
https://www.flashscore.com/match/Eos77864
https://www.flashscore.com/match/4zzK46jN
https://www.flashscore.com/match/tdkfAAMo
https://www.flashscore.com/match/MBpF5nyH
https://www.flashscore.com/match/IwvO3Q5T
https://www.flashscore.com/match/nysS6yGg
https://www.flashscore.com/match/f1pz5Fp6
https://www.flashscore.com/match/jTwq3gFI
https://www.flashscore.com/match/QLhJ8cos
https://www.flashscore.com/match/0voW5eVa
https://www.flashscore.com/match/Yiqv4ZaC
https://www.flashscore.com/match/4CiN7H0m
https://www.flashscore.com/match/Sh1CoRqo
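
The question also asks for the URLs to end up in a txt file, which the code above does not do yet. Below is a minimal sketch of that last step. The helper names ids_to_urls and save_urls and the file name matches.txt are my own choices for illustration, and the ids are hard-coded here so the snippet runs without a browser; in the real script they would come from match.get_attribute('id') in the loop above.

```python
import os
import tempfile

BASE_URL = "https://www.flashscore.com/match/"
ID_PREFIX = "g_1_"


def ids_to_urls(element_ids):
    """Turn id strings such as 'g_1_hWhb9Uyh' into match URLs."""
    return [
        # count=1 so only the leading prefix is replaced
        element_id.replace(ID_PREFIX, BASE_URL, 1)
        for element_id in element_ids
        if element_id.startswith(ID_PREFIX)
    ]


def save_urls(urls, path):
    """Write one URL per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")


# Hard-coded stand-ins for [match.get_attribute('id') for match in matches].
sample_ids = ["g_1_hWhb9Uyh", "g_1_rLoB6SLA"]
urls = ids_to_urls(sample_ids)

# The temp directory is used only to keep the sketch self-contained;
# any writable path such as 'matches.txt' would do.
output_path = os.path.join(tempfile.gettempdir(), "matches.txt")
save_urls(urls, output_path)

# Read the file back to confirm what was written.
with open(output_path, encoding="utf-8") as handle:
    saved_lines = handle.read().splitlines()
print(saved_lines)
```

Keeping the conversion and the file writing in separate functions means the Selenium part only has to hand over a list of id strings.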