Warning: I'm new to Python and to programming.
Goal: scrape all the job links from this page and put them into a txt / csv / json / XML file: https://www.indeed.ca/jobs?q=title%3Aengineer&l=Vancouver%2C+BC
Code:
from selenium import webdriver
import csv
browser = webdriver.Firefox()
browser.get('https://www.indeed.ca/jobs?q=engineer&l=Vancouver%2C+BC&sort=date')
jobs = browser.find_elements_by_partial_link_text('Engineer')
for job in jobs:
    print(job.get_attribute("href"))
with open("output.csv",'w') as resultFile:
    wr = csv.writer(resultFile)
    wr.writerow(jobs)
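Printing the results works fine, but nothing gets stored in the csv file. Also, I plan to scrape more than one page, so what is the best way to modify the csv file so that new links are appended rather than overwriting the existing ones?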
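Answer 0 (score: 1)
The links are not written to the csv because jobs, the argument in wr.writerow(jobs), is not valid input: it is a list of WebElement objects, not strings. You can do this instead: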
with open("output.csv",'w') as resultFile:
    wr = csv.writer(resultFile)
    wr.writerow([j.get_attribute("href") for j in jobs])
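For the second part of the question (several pages, no overwriting), one option is to open the file in append mode. The following is only a minimal sketch: append_links and read_links are hypothetical helper names, not part of the original code, and the sketch assumes the links have already been extracted as plain strings.

```python
import csv


def append_links(path, links):
    """Append one link per row to the CSV at `path` (hypothetical helper)."""
    # Mode "a" appends instead of truncating the file like "w" does.
    # newline="" is what the csv module documentation recommends when
    # opening a file for csv.writer.
    with open(path, "a", newline="") as result_file:
        writer = csv.writer(result_file)
        for link in links:
            writer.writerow([link])  # one-element row, so one link per line


def read_links(path):
    """Read the links back as a flat list of strings (hypothetical helper)."""
    with open(path, newline="") as result_file:
        return [row[0] for row in csv.reader(result_file) if row]
```

In the scraping loop you would then call append_links("output.csv", [j.get_attribute("href") for j in jobs]) once per page. Writing one link per row, rather than all links in a single row, tends to be easier to read back later, but that is a design choice rather than a requirement.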
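Answer 1 (score: 0)
This looks odd: for jobs in jobs:. Are you sure you didn't mean to write for job in jobs:? That could be your problem. The loop rebinds the name jobs on every iteration, so after the loop it no longer refers to the list.
Look at this example: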
>>> numbers = [1,2,3,4]
>>> numbers
[1, 2, 3, 4]
>>> type(numbers)
<type 'list'>
>>> for numbers in numbers:
...     print numbers
...
1
2
3
4
>>> numbers
4
>>> type(numbers)
<type 'int'>
It is not print numbers that turns numbers into an int; the for statement itself does that. Observe:
>>> numbers = [1,2,3,4]
>>> type(numbers)
<class 'list'>
>>> for numbers in numbers:
...     print(":)")
...
:)
:)
:)
:)
>>> type(numbers)
<class 'int'>
>>> numbers
4