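I am trying to convert my all_data tags into a format that I can compare using booleans. I think it involves float and/or int. However, I have some concerns about the output once the site has been scraped: it is a mix of integers, decimals, and percentages. The specific line I want to modify is line 33. I have tried int() and .int. I did not find any questions about this on Stack Overflow, or anything in the Beautiful Soup documentation.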
from BeautifulSoup import BeautifulSoup
import csv
import re
import urllib
import urllib2
from urllib2 import HTTPError
# import modules
symbolfile = open("symbols.txt")
symbolslist = symbolfile.read()
newsymbolslist = symbolslist.split("\n")
i = 0
f = csv.writer(open("pe_ratio.csv","wb"))
# short cut to write
f.writerow(["Name","PE","Revenue % Quarterly","ROA% YOY","Operating Cashflow","Debt to Equity"])
#first write row statement
# define name_company as the following
while i<len(newsymbolslist):
    try:
        page = urllib2.urlopen("http://finance.yahoo.com/q/ks?s="+newsymbolslist[i] +"%20Key%20Statistics").read()
    except urllib2.HTTPError:
        continue
    soup = BeautifulSoup(page)
    name_company = soup.findAll("div", {"class" : "title"})
    for name in name_company: #add multiple iterations?
        all_data = soup.findAll('td', "yfnc_tabledata1")
        stock_name = name.find('h2').string #find company's name in name_company with h2 tag
        try:
            f.writerow([stock_name, all_data[2].getText(),all_data[17].getText(),all_data[13].getText(), all_data[29].getText(),all_data[26].getText()]) #write down PE data
        except (IndexError, HTTPError) as e:
            pass
    i+=1
This is the output in the CSV file.
Agilent Technologies Inc. (A) 25.7 -2.80% 5.60% N/A 51.03
Note that you can load ticker symbols by listing them vertically, one per line, in the symbols.txt file.
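One detail about the one-per-line layout: splitting the file contents on "\n" leaves an empty string behind when the file ends with a newline or contains a blank line, and that empty string would then be requested as a ticker. A minimal sketch of a more forgiving reader follows. The function name `parse_symbols` and the sample contents are made up for illustration and are not part of the original script.

```python
def parse_symbols(text):
    """Return the non-empty, whitespace-stripped lines of a symbols file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Stand-in for the contents of symbols.txt, one ticker per line.
# (Hypothetical sample data; the blank line and trailing newline are deliberate.)
contents = "A\nAA\n\nAAPL\n"
symbols = parse_symbols(contents)
```

With a real file, the same function could be applied to the string returned by `symbolfile.read()`.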
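Answer 0 (score: 1)
If you want to run comparisons on the data (for example, quarterly percent greater than 25), you have to format the text so that it can be converted to a number.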
quarterly_percent = all_data[17].getText()
if quarterly_percent != "N/A":
    # cut off the percent sign and convert to a "python number"
    quarterly_percent = float(quarterly_percent[:-1])
    if quarterly_percent > 25:
        print "its a good one"
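The same idea can be wrapped in a small helper so that every cell goes through one conversion. This is only a sketch under stated assumptions: the name `to_number` is invented for illustration, missing values are assumed to appear as the literal text "N/A", percentages are assumed to carry a trailing "%", and thousands separators are assumed to be commas. The sample values are taken from the CSV row shown in the question.

```python
def to_number(text):
    """Convert a scraped cell such as '25.7', '-2.80%' or 'N/A' to a float.

    Returns None when the cell holds no plain numeric value.
    """
    # Assumption: '%' only ever appears as a suffix, ',' only as a separator.
    cleaned = text.strip().rstrip('%').replace(',', '')
    if cleaned in ('', 'N/A'):
        return None
    try:
        return float(cleaned)
    except ValueError:
        # Anything else (for example '1.2B') is left unconverted in this sketch.
        return None


row = ['25.7', '-2.80%', '5.60%', 'N/A', '51.03']
values = [to_number(cell) for cell in row]

# 'N/A' became None, so check for it before comparing against a threshold.
is_good = values[1] is not None and values[1] > 25
```

Returning None for missing values keeps "no data" distinct from zero, at the cost of one extra check before each comparison.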
Answer 1 (score: 1)
To convert the all_data string values to numbers, try the following:
all_data = soup.findAll('td', "yfnc_tabledata1")
stock_name = name.find('h2').string #find company's name in name_company with h2 tag
clean_data = list()
# strip spaces and percent signs, then convert to float where possible
for x in [data.getText().strip(' %') for data in all_data]:
    try:
        clean_data.append(float(x))
    except ValueError:
        # non-numeric text such as "N/A" is kept as a string
        clean_data.append(x)
try:
    f.writerow([stock_name, clean_data[2], clean_data[17], clean_data[13], clean_data[29], clean_data[26]]) #write down PE data
except (IndexError, HTTPError) as e:
    pass
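Because cells that fail the float conversion (such as "N/A") stay in clean_data as strings, a comparison like `clean_data[17] > 25` needs a type check first. In Python 2, comparing a string with a number does not raise an error and simply gives a meaningless result, while Python 3 raises a TypeError. A rough sketch of a guarded comparison, where `exceeds` is a hypothetical helper and the sample list only stands in for real scraped values:

```python
def exceeds(value, threshold):
    """Return True only when value is a float greater than threshold."""
    return isinstance(value, float) and value > threshold


# Stand-in for clean_data: floats where conversion worked, strings otherwise.
sample = [25.7, -2.8, 5.6, 'N/A', 51.03]

high_pe = exceeds(sample[0], 20)      # numeric, so it is compared normally
cashflow_ok = exceeds(sample[3], 0)   # 'N/A' is rejected instead of compared
```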