I want to scrape a stock data table from a website. In my code I generate an array of stock ticker symbols. The finviz site builds the table for each particular stock from the last part of the URL (e.g. https://finviz.com/quote.ashx?t=MBOT for MBOT). I want to feed my generated array in as the final part of each URL (so if my array is [AAPL, MBOT], first https://finviz.com/quote.ashx?t=AAPL and then https://finviz.com/quote.ashx?t=MBOT), scrape the output table from each URL, and write the scraped data to a CSV file (titled 'output.csv' in this case). Here is my code:
import csv
import urllib.request
from bs4 import BeautifulSoup
twiturl = "https://twitter.com/ACInvestorBlog"
twitpage = urllib.request.urlopen(twiturl)
soup = BeautifulSoup(twitpage,"html.parser")
print(soup.title.text)
tweets = [i.text for i in soup.select('a.twitter-cashtag.pretty-link.js-nav b')]
print(tweets)
url_base = "https://finviz.com/quote.ashx?t="
url_list = [url_base + tckr for tckr in tweets]
fpage = urllib.request.urlopen(url_list)
fsoup = BeautifulSoup(fpage, 'html.parser')
with open('output.csv', 'wt') as file:
    writer = csv.writer(file)
    # write header row
    writer.writerow(map(lambda e: e.text, fsoup.find_all('td', {'class': 'snapshot-td2-cp'})))
    # write body row
    writer.writerow(map(lambda e: e.text, fsoup.find_all('td', {'class': 'snapshot-td2'})))
Here is the error output I get:
"C:\Users\Taylor .DESKTOP-0SBM378\venv\helloworld\Scripts\python.exe" "C:/Users/Taylor .DESKTOP-0SBM378/PycharmProjects/helloworld/helloworld"
Antonio Costa (@ACInvestorBlog) | Twitter
Traceback (most recent call last):
['LINU', 'FOSL', 'LINU', 'PETZ', 'NETE', 'DCIX', 'DCIX', 'KDMN', 'KDMN', 'LINU', 'CNET', 'AMD', 'CNET', 'AMD', 'NETE', 'NETE', 'AAPL', 'PETZ', 'CNET', 'PETZ', 'PETZ', 'MNGA', 'KDMN', 'CNET', 'ITUS', 'CNET']
File "C:/Users/Taylor .DESKTOP-0SBM378/PycharmProjects/helloworld/helloworld", line 17, in <module>
fpage = urllib.request.urlopen(url_list)
File "C:\Users\Taylor .DESKTOP-0SBM378\AppData\Local\Programs\Python\Python36-32\Lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Taylor .DESKTOP-0SBM378\AppData\Local\Programs\Python\Python36-32\Lib\urllib\request.py", line 517, in open
req.timeout = timeout
AttributeError: 'list' object has no attribute 'timeout'
Process finished with exit code 1
Answer 0 (score: 1)
You are passing a list to urllib.request.urlopen() instead of a string, that's all! So you were very close.
To open each of the different URLs, just use a for loop:
for url in url_list:
    fpage = urllib.request.urlopen(url)
    fsoup = BeautifulSoup(fpage, 'html.parser')
    # scrape single page and add data to list

with open('output.csv', 'wt') as file:
    writer = csv.writer(file)
    # write datalist
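Putting those pieces together, a full version might look like the sketch below (untested; it assumes the 'snapshot-td2-cp'/'snapshot-td2' class names from the question still match finviz's markup, and the hard-coded tweets list stands in for the Twitter scrape). It opens each finviz URL in turn, writes the label cells once as a header row, and then writes one row of value cells per ticker:

import csv
import urllib.request
from bs4 import BeautifulSoup

url_base = "https://finviz.com/quote.ashx?t="
tweets = ['AAPL', 'MBOT']  # stand-in for the list scraped from Twitter in the question
url_list = [url_base + tckr for tckr in tweets]

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    header_written = False
    for url in url_list:
        fpage = urllib.request.urlopen(url)
        fsoup = BeautifulSoup(fpage, 'html.parser')
        if not header_written:
            # label cells (class name taken from the question's code)
            writer.writerow([e.text for e in fsoup.find_all('td', {'class': 'snapshot-td2-cp'})])
            header_written = True
        # value cells (class name taken from the question's code)
        writer.writerow([e.text for e in fsoup.find_all('td', {'class': 'snapshot-td2'})])

You may also want to de-duplicate tweets before building url_list (the scraped list contains repeats) and prepend the ticker to each row so the rows stay identifiable. Note that finviz may refuse plain scripted requests; if you get an HTTP error, try passing a urllib.request.Request with a browser-like User-Agent header instead of the bare URL.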
Answer 1 (score: 0)
You are passing a list to the urlopen method. Try the following; it will retrieve the data from the first URL only.
fpage = urllib.request.urlopen(url_list[0])
fsoup = BeautifulSoup(fpage, 'html.parser')
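To cover every ticker rather than just url_list[0], you can wrap the same two lines in a loop, for example (a minimal sketch, reusing the 'snapshot-td2' class name from the question):

for url in url_list:
    fpage = urllib.request.urlopen(url)
    fsoup = BeautifulSoup(fpage, 'html.parser')
    # print the value cells for each page; write them to the CSV as in the question if preferred
    print([e.text for e in fsoup.find_all('td', {'class': 'snapshot-td2'})])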