How to create a [dict] from a list and a csv file

Date: 2018-02-26 08:56:30

Tags: python web-scraping beautifulsoup

I want to create a dictionary so that I can look up a stock's value (in this case Rel Volume) by its symbol. The symbols come from the list `tweets`, which is scraped from Twitter, while Rel Volume comes from a csv file whose contents are scraped from FinViz.com. Here is my code:

import csv
import urllib.request
import urllib.error
from bs4 import BeautifulSoup

write_header = True

twiturl = "https://twitter.com/ACInvestorBlog"
twitpage = urllib.request.urlopen(twiturl)
soup = BeautifulSoup(twitpage,"html.parser")

print(soup.title.text)

tweets = [i.text for i in soup.select('a.twitter-cashtag.pretty-link.js-nav b')]
print(tweets)

url_base = "https://finviz.com/quote.ashx?t="
url_list = [url_base + tckr for tckr in tweets]

with open('_Stocks.csv', 'w', newline='') as file:
    writer = csv.writer(file)

    for url in url_list:
        try:
            fpage = urllib.request.urlopen(url)
            fsoup = BeautifulSoup(fpage, 'html.parser')

            # write header row (once)
            if write_header:
                writer.writerow(map(lambda e : e.text, fsoup.find_all('td', {'class':'snapshot-td2-cp'})))
                write_header = False

            # write body row
            writer.writerow(map(lambda e : e.text, fsoup.find_all('td', {'class':'snapshot-td2'})))
        except urllib.error.HTTPError:
            print("{} - not found".format(url))

with open('_Stocks.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    for line in csv_reader:
        print(line['Rel Volume'])

So I would like to create a dictionary where, for example, 'AKS' maps to '0.64'.

2 Answers

Answer 0 (score: 0)

Dict = {tweet:line['Rel Volume'] for (tweet, line) in zip(tweets, csv_reader)}

This gives the output:

{'AKS': '0.78',
 'CRMD': '7.49',
 'EKSO': '0.57',
 'FORD': '0.25',
 'KDMN': '0.43',
 'LEDS': '7.49',
 'RNN': '0.64',
 'SPX': '0.81',
 'SPY': '0.68',
 'TSLA': '0.78',
 'UVXY': '1.08',
 'VXX': '0.86',
 'X': '0.64'}
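Note that this comprehension pairs entries purely by position, so it assumes `tweets` and the csv rows are in the same order, and it has to run while `csv_reader` is still open inside the `with` block. A minimal self-contained sketch of the same zip pairing, using a hypothetical in-memory csv in place of `_Stocks.csv`:

```python
import csv
import io

# hypothetical stand-in for _Stocks.csv; the real file is scraped from FinViz
csv_text = "Index,Rel Volume\nDJIA,0.64\nS&P500,0.78\n"
tweets = ['AKS', 'X']  # scraped symbols, assumed to be in the same order as the rows

csv_reader = csv.DictReader(io.StringIO(csv_text))
# zip pairs each symbol with a csv row purely by position
pairs = {tweet: line['Rel Volume'] for (tweet, line) in zip(tweets, csv_reader)}
print(pairs)  # {'AKS': '0.64', 'X': '0.78'}
```

If the scraped tweet order ever diverges from the csv row order, the pairing silently produces wrong values, which is why the second answer writes the tckr into the file instead.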

Answer 1 (score: 0)

Instead of creating a dictionary, you can make the first column of the csv file the tckr. Then, when writing the file, write each tckr into the first column; when reading, print the tckr and the value.

Also, it is better to use a set rather than a list to hold the tckrs, since the scraped results contain duplicates.

To do this, you first need a small change to the code. Instead of building the URL list up front, format the URL inside the loop, like:

for tckr in tweets:
    URL = URL_BASE + tckr

This keeps the tckr value available for writing.
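In other words, building the URL inside the loop keeps `tckr` in scope for both the request and the row you write. A tiny sketch of that pattern (the collected `rows` list here is just for illustration):

```python
URL_BASE = "https://finviz.com/quote.ashx?t="
tweets = {'AKS', 'X', 'AKS'}  # a set, so the duplicate 'AKS' collapses

rows = []
for tckr in tweets:
    url = URL_BASE + tckr
    # tckr is still in scope here, so it can be stored next to its row
    rows.append((tckr, url))

print(rows)
```

With the original list-comprehension approach, only the finished URLs survive, and the tckr would have to be parsed back out of each URL.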

Full code:

import csv
import urllib.request
import urllib.error
from bs4 import BeautifulSoup

write_header = True

twiturl = "https://twitter.com/ACInvestorBlog"
twitpage = urllib.request.urlopen(twiturl)
soup = BeautifulSoup(twitpage, "html.parser")

# use a set instead of a list to save the tckrs
tweets = {i.text for i in soup.select('a.twitter-cashtag.pretty-link.js-nav b')}

URL_BASE = "https://finviz.com/quote.ashx?t="

with open('_Stocks.csv', 'w', newline='') as file:
    writer = csv.writer(file)

    # note the change
    for tckr in tweets:
        URL = URL_BASE + tckr
        try:
            fpage = urllib.request.urlopen(URL)
            fsoup = BeautifulSoup(fpage, 'html.parser')

            if write_header:
                # note the change
                writer.writerow(['tckr'] + list(map(lambda e: e.text, fsoup.find_all('td', {'class': 'snapshot-td2-cp'}))))
                write_header = False

            # note the change
            writer.writerow([tckr] + list(map(lambda e: e.text, fsoup.find_all('td', {'class': 'snapshot-td2'}))))
        except urllib.error.HTTPError:
            print("{} - not found".format(URL))

with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    for line in csv_reader:
        print(line['tckr'], line['Rel Volume'])

Output:

https://finviz.com/quote.ashx?t=SPX - not found
TSLA 1.02
CRMD 7.49
EKSO 0.39
AKS 0.64
X 0.78
FORD 0.43
TVIX 1.08
SPY 0.81
VXX 0.68
RNN 0.57
LEDS 0.25
UVXY 0.86
KDMN 1.07

Note the change to the writerow function arguments.
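The change just prepends one extra cell to each row list, since `csv.writer.writerow` takes a single sequence per row. A small illustration with hypothetical cell values, writing to an in-memory buffer instead of `_Stocks.csv`:

```python
import csv
import io

buf = io.StringIO()  # in-memory stand-in for _Stocks.csv
writer = csv.writer(buf)

# header row: the literal 'tckr' label prepended to the scraped header cells
writer.writerow(['tckr'] + ['Index', 'Rel Volume'])
# body row: the ticker itself prepended to the scraped data cells
writer.writerow(['AKS'] + ['DJIA', '0.64'])

print(buf.getvalue())
```

The `list(map(...))` wrapping in the full code is needed because a map object cannot be concatenated to a list with `+`.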

If you still want the values in a dictionary, you can use this:

with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    my_dict = {line['tckr']: line['Rel Volume'] for line in csv_reader}
    print(my_dict)

Output:

{'AKS': '0.64', 'X': '0.78', 'TSLA': '1.02', 'RNN': '0.57', 'EKSO': '0.39', 'LEDS': '0.25', 'FORD': '0.43', 'KDMN': '1.07', 'CRMD': '7.49', 'SPY': '0.81', 'VXX': '0.68', 'UVXY': '0.86', 'TVIX': '1.08'}