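I'm trying to print some information using Selenium and Python, but it only prints a single item instead of everything matched by the CSS selectors. Here is my while loop: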
pageIndex = 1
while True:  # Keep looping through all pages
    # Navigate to the search page
    browser.get("https://www.houz.com/page_num="+ str(pageIndex))
    time.sleep(6)
    links = browser.find_elements_by_css_selector('div > h3 > a')
    for link in links:
        urls = link.text
    jobs = browser.find_elements_by_css_selector('div > div.description')
    for title in jobs:
        jobtitles = title.text
    with open("1Exportdata.csv", "a") as csvFile:
        csvFile.write(url + "," + jobtitle + "\n")
    pageIndex += 1
    if pageIndex == 5010:
        browser.close()
Answer 0 (score: 2)
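Because you are using:
for title in jobs:
    jobtitles = title.text
On the first iteration, jobtitles is the first title.text, but on the second iteration it becomes the second title.text. By the time the loop finishes, it holds only the last title.text.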
For example:
>>> for i in [1, 2, 3]:
...     num = i
...
>>> print(num)
3
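To make the contrast concrete, here is a minimal sketch (the names `last_only` and `seen` are made up for illustration). It shows that work done inside the loop body sees every item, while a variable read after the loop holds only the final one.

```python
items = ["a", "b", "c"]

# Assigning inside the loop and reading afterwards keeps only the last item.
for item in items:
    last_only = item

# Doing the work inside the loop body handles every item.
seen = []
for item in items:
    seen.append(item)

print(last_only)  # c
print(seen)       # ['a', 'b', 'c']
```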
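So you need to put the with open("1Exportdata.csv", "a") as csvFile: block inside the for loop, so that a row is written on every iteration. Since you have two lists, I suggest pairing them with zip, like this: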
pageIndex = 1
while True:  # Keep looping through all pages
    # Navigate to the search page
    browser.get("https://www.houz.com/page_num="+ str(pageIndex))
    time.sleep(6)
    links = browser.find_elements_by_css_selector('div > h3 > a')
    jobs = browser.find_elements_by_css_selector('div > div.description')
    # Pair each link with its description and write one row per pair
    for link, title in zip(links, jobs):
        url = link.text
        jobtitle = title.text
        with open("1Exportdata.csv", "a") as csvFile:
            csvFile.write(url + "," + jobtitle + "\n")
    pageIndex += 1
    if pageIndex == 5010:
        browser.close()
        break  # Leave the loop; the closed browser cannot load another page
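One thing to keep in mind with zip: it stops at the shorter input, so if the two selectors match different numbers of elements on a page, the extra items are dropped silently. A small sketch, using placeholder strings in place of the real link.text and title.text values (the data and variable names here are hypothetical):

```python
# Placeholder data standing in for link.text / title.text values.
link_texts = ["Job A", "Job B", "Job C"]
title_texts = ["Desc A", "Desc B"]  # one fewer match than the links

# zip pairs items by position and stops at the shorter list.
pairs = list(zip(link_texts, title_texts))
print(pairs)  # [('Job A', 'Desc A'), ('Job B', 'Desc B')]

# A cheap sanity check before writing rows out.
if len(link_texts) != len(title_texts):
    print("warning: selector counts differ; some rows will be dropped")
```

If the counts regularly differ, it may be safer to locate each result container first and read the link and description from inside it, though that depends on the page's markup.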
Also, I don't think a while loop is needed here; a for loop over the page numbers is simpler:
for pageIndex in range(1, 5011):
    # Navigate to the search page
    browser.get("https://www.houz.com/page_num="+ str(pageIndex))
    time.sleep(6)
    links = browser.find_elements_by_css_selector('div > h3 > a')
    jobs = browser.find_elements_by_css_selector('div > div.description')
    for link, title in zip(links, jobs):
        url = link.text
        jobtitle = title.text
        with open("1Exportdata.csv", "a") as csvFile:
            csvFile.write(url + "," + jobtitle + "\n")
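A side note on the file writing: joining fields with "," by hand produces a broken row whenever the scraped text itself contains a comma. Below is a minimal sketch of the same append step using the standard csv module instead. The helper name `append_rows`, the sample rows, and the temporary file path are assumptions for illustration and are not part of the original code.

```python
import csv
import os
import tempfile

def append_rows(path, rows):
    """Append (url, jobtitle) pairs to a CSV file, quoting fields as needed."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerows(rows)

# Placeholder rows standing in for the scraped (link.text, title.text) pairs.
rows = [("Job A", "Kitchen, bath and more"), ("Job B", "Plain text")]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
append_rows(demo_path, rows)

# Read the file back to confirm the embedded comma survived as one field.
with open(demo_path, newline="", encoding="utf-8") as f:
    print(list(csv.reader(f)))
```

In the scraping loop, this would amount to collecting the pairs for a page and calling the helper once per page, which also avoids reopening the file for every row.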