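Please see the following link: http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=505200&expandable=0
I need to extract the data from this site and convert it to both vertical and horizontal layouts. My code is: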
from bs4 import BeautifulSoup
from urllib.request import urlopen
import pandas as pd
import csv
from pandas import read_csv
import requests
file_path=r'C:\Users\PreciseT3\Desktop\EicherStockDetails.csv'
stock_ratio_filepath=r'C:\Users\PreciseT3\Desktop\facevalues.csv'
url = 'http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=505200&expandable=0'
eicher_stock_url='http://www.moneycontrol.com/stocks/company_info/print_financials.php?sc_did=EM&type=cons_keyfinratio'
html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
# Collect the text of every table cell, one list per table row
main = []
for tr in soup.find_all('tr'):
    mainSub = []
    for td in tr.find_all('td'):
        mainSub += [td.text]
    main += [mainSub]
# Flag the rows that contain '--' and use them to locate the block of interest
splitter = []
for y in range(len(main)):
    splitter += [any('--' in x for x in main[y])]
split_index = [x for x in range(len(splitter)) if splitter[x] == True]
main_split = main[(split_index[3]+2):(split_index[8]-2)]
# Transpose so that each metric becomes a column; the first tuple holds the column names
main_zip=list(zip(*main_split))
DF = pd.DataFrame(main_zip,columns=[x.replace(' ', '_') for x in main_zip.pop(0)])
# 'w' creates or truncates the file; newline='' avoids blank lines between rows on Windows
with open(stock_ratio_filepath,'w',newline='') as file:
    writer=csv.writer(file)
    writer.writerow(DF)  # iterating a DataFrame yields its column names
    for row_values in main_zip:
        writer.writerow(row_values)
My output CSV file looks like this:
EicherStockDetails.csv
Revenue,Other_Income,Total_Income,Expenditure,Interest,PBDT,Depreciation,PBT,Tax,Net_Profit,Equity,EPS,CEPS,OPM_%,NPM_%
"6,188.03",178.24,"6,366.27","-4,457.55",-1.41,"1,907.31",-137.73,"1,769.58",-539.73,"1,229.85",27.16,453.20,503.53,30.85,19.87
"3,031.22",116.30,"3,147.52","-2,297.66",-1.67,848.19,-50.16,798.03,-239.11,558.92,27.10,206.38,224.75,28.04,18.44
"1,702.47",80.10,"1,782.57","-1,388.74",-0.27,393.56,-30.41,363.15,-84.53,278.62,27.04,103.15,114.29,23.13,16.37
"1,049.26",45.78,"1,095.04",-903.83,-0.26,190.95,-17.15,173.80,-29.04,144.76,27.00,53.62,59.97,18.22,13.80
670.95,75.89,746.84,-589.97,-2.02,154.85,-13.02,141.83,-17.28,124.55,26.99,46.18,50.97,23.38,18.56
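My requirement is:
From the given link, I only need to extract a few columns from the beginning and a few columns from the end (random
I have been working on this for the past three days. Please help me get past it.
Please give me some ideas to learn from. Thank you.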
Answer 0: (score: 0)
You can try pandas to read the HTML table data:
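import pandas as pd
# path is the URL or file path of the HTML page
tables = pd.read_html(io=path)  # read_html returns a list of DataFrames, one per table
df = tables[0]  # pick the table you need; index 0 is only a placeholder
df.to_csv(path_or_buf='outfile.csv')
df = df.transpose()  # transpose() returns a new DataFrame, so keep the result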
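
For the part of the question about keeping only a few columns from the start and a few from the end, one possible approach is positional selection with `DataFrame.iloc`. The sketch below is only an illustration under assumptions: it loads two rows copied from the CSV output shown in the question rather than the live page, and the counts `n_first` and `n_last` are hypothetical values chosen for the example.

```python
import io
import pandas as pd

# Two rows copied from the CSV output shown in the question (not fetched live).
csv_text = '''Revenue,Other_Income,Total_Income,Expenditure,Interest,PBDT,Depreciation,PBT,Tax,Net_Profit,Equity,EPS,CEPS,OPM_%,NPM_%
"6,188.03",178.24,"6,366.27","-4,457.55",-1.41,"1,907.31",-137.73,"1,769.58",-539.73,"1,229.85",27.16,453.20,503.53,30.85,19.87
"3,031.22",116.30,"3,147.52","-2,297.66",-1.67,848.19,-50.16,798.03,-239.11,558.92,27.10,206.38,224.75,28.04,18.44
'''

# thousands=',' asks pandas to parse quoted values such as "6,188.03" as numbers.
df = pd.read_csv(io.StringIO(csv_text), thousands=',')

# Hypothetical counts: how many columns to keep from each end.
n_first, n_last = 3, 2
n_cols = df.shape[1]
positions = list(range(n_first)) + list(range(n_cols - n_last, n_cols))

subset = df.iloc[:, positions]   # horizontal layout: one row per period
vertical = subset.transpose()    # vertical layout: one row per metric

print(subset.to_csv(index=False))
print(vertical.to_csv(header=False))
```

The same `iloc` selection should work on a DataFrame obtained from `pd.read_html`, provided the table has the metrics as columns; if it has them as rows, transpose first and then select.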
References
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_html.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transpose.html