Add columns from another HTML file to the same CSV file

Posted: 2017-01-03 14:20:18

Tags: python csv

Please see the following link: http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=505200&expandable=0

From that site I need to extract the data and transpose it (turn the vertical layout into a horizontal one). My code is:

from bs4 import BeautifulSoup
from urllib.request import urlopen
import pandas as pd
import csv

file_path = r'C:\Users\PreciseT3\Desktop\EicherStockDetails.csv'
stock_ratio_filepath = r'C:\Users\PreciseT3\Desktop\facevalues.csv'
url = 'http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=505200&expandable=0'
eicher_stock_url = 'http://www.moneycontrol.com/stocks/company_info/print_financials.php?sc_did=EM&type=cons_keyfinratio'
html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# Collect the text of every <td> cell, one list per <tr> row
main = []
for tr in soup.find_all('tr'):
    main.append([td.text for td in tr.find_all('td')])

# Flag the rows that contain a '--' placeholder; they act as section separators
splitter = [any('--' in x for x in row) for row in main]
split_index = [i for i, flag in enumerate(splitter) if flag]

# Keep the rows between the 4th and 9th separator rows (two-row margin on each side)
main_split = main[(split_index[3] + 2):(split_index[8] - 2)]

# Transpose: the i-th tuple holds the i-th cell of every row
main_zip = list(zip(*main_split))
# The first transposed tuple is the column of labels; pop it and use it as the header
header = [x.replace(' ', '_') for x in main_zip.pop(0)]
DF = pd.DataFrame(main_zip, columns=header)

# 'w' creates or truncates the file ('r+' fails if it does not exist);
# newline='' stops csv.writer from adding blank lines between rows on Windows
with open(stock_ratio_filepath, 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(DF.columns)
    writer.writerows(main_zip)
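To see what the `zip(*main_split)` and `pop(0)` steps do, here is a tiny self-contained sketch. The rows are made up and only shaped like the BSE table is assumed to be (a label in the first cell, then one value per period), and an in-memory buffer stands in for the output file:

```python
import csv
import io

# Hypothetical rows shaped like the scraped table: label first, then one value per period
main_split = [
    ["Revenue", "6,188.03", "3,031.22"],
    ["Other Income", "178.24", "116.30"],
    ["Net Profit", "1,229.85", "558.92"],
]

# zip(*rows) transposes: the i-th tuple holds the i-th cell of every row
main_zip = list(zip(*main_split))
# The first transposed tuple is the column of labels, so it becomes the header
header = [x.replace(" ", "_") for x in main_zip.pop(0)]

buf = io.StringIO()  # stands in for open(path, "w", newline="")
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(main_zip)  # csv quotes values that contain the comma delimiter
print(buf.getvalue())
```

The values containing a comma come out quoted ("6,188.03"), which is the same shape as the output shown in the question.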

My output CSV file looks like this:

EicherStockDetails.csv

Revenue,Other_Income,Total_Income,Expenditure,Interest,PBDT,Depreciation,PBT,Tax,Net_Profit,Equity,EPS,CEPS,OPM_%,NPM_%
"6,188.03",178.24,"6,366.27","-4,457.55",-1.41,"1,907.31",-137.73,"1,769.58",-539.73,"1,229.85",27.16,453.20,503.53,30.85,19.87
"3,031.22",116.30,"3,147.52","-2,297.66",-1.67,848.19,-50.16,798.03,-239.11,558.92,27.10,206.38,224.75,28.04,18.44
"1,702.47",80.10,"1,782.57","-1,388.74",-0.27,393.56,-30.41,363.15,-84.53,278.62,27.04,103.15,114.29,23.13,16.37
"1,049.26",45.78,"1,095.04",-903.83,-0.26,190.95,-17.15,173.80,-29.04,144.76,27.00,53.62,59.97,18.22,13.80
670.95,75.89,746.84,-589.97,-2.02,154.85,-13.02,141.83,-17.28,124.55,26.99,46.18,50.97,23.38,18.56
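When a file like this is read back, values such as "6,188.03" are quoted strings because they contain the delimiter. A minimal pandas sketch, using a few columns of two rows from the output above as inline text instead of the Desktop path, with `thousands=','` so those values parse as numbers:

```python
import io

import pandas as pd

# A few columns from two rows of the output above, inlined instead of reading the file
csv_text = '''Revenue,Other_Income,Total_Income,Expenditure
"6,188.03",178.24,"6,366.27","-4,457.55"
670.95,75.89,746.84,-589.97
'''

# thousands=',' removes the grouping comma inside quoted values such as "6,188.03"
df = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(df.dtypes)
print(df)
```

Without `thousands=','`, columns such as Revenue would be read as strings (object dtype) because of the embedded commas.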

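My requirements are:

  • Is this a good way to read and write a CSV file?
  • I also need some additional columns (not in sorted order), which should be extracted from here, to be added to the same CSV file given above (EicherStockDetails.csv).
  • From the given link, I only need to extract a few columns from the beginning and a few columns from the end (chosen arbitrarily).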
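For the second and third requirements, one common pandas pattern is to load the existing table, pick the wanted columns of the new table by position with `iloc`, and place them next to the existing ones with `pd.concat(..., axis=1)`. A minimal sketch under stated assumptions: the `Extra_*` columns and their values are hypothetical placeholders for whatever is scraped from the second page, and both tables are assumed to have one row per period in the same order:

```python
import pandas as pd

# Existing data (stand-in for pd.read_csv(file_path, thousands=','))
existing = pd.DataFrame({
    "Revenue": [6188.03, 3031.22],
    "Other_Income": [178.24, 116.30],
    "Total_Income": [6366.27, 3147.52],
    "OPM_%": [30.85, 28.04],
    "NPM_%": [19.87, 18.44],
})

# Hypothetical extra table from the second page, one row per period in the same order
extra = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
    columns=["Extra_1", "Extra_2", "Extra_3", "Extra_4", "Extra_5"],
)

# Keep the first 2 and the last 2 columns by position
n_cols = extra.shape[1]
wanted = list(range(2)) + list(range(n_cols - 2, n_cols))
selected = extra.iloc[:, wanted]

# Put the selected columns to the right of the existing ones (rows aligned by index)
combined = pd.concat([existing, selected], axis=1)

# To overwrite the file instead: combined.to_csv(file_path, index=False)
print(combined.to_csv(index=False))
```

Because `concat` aligns on the row index, the two tables must list the periods in the same order; otherwise, set a shared period column as the index on both before concatenating.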

I have been working on this for the past three days. Please help me get past it.

Please give me some ideas so I can learn. Thanks.

1 answer:

Answer 0 (score: 0)

You can try using pandas to read the HTML table data:

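import pandas as pd
path = 'http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=505200&expandable=0'
# read_html returns a list of DataFrames, one per <table> on the page
df = pd.read_html(path)[0]  # pick the table you need by its index
# transpose() returns a new DataFrame; it does not change df in place
df = df.transpose()
df.to_csv(path_or_buf='outfile.csv')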
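The transpose step can be tried without downloading anything. This sketch assumes one table from `read_html` comes back with the labels in its first column and one column per period; the `Item` and `Period_*` names are hypothetical. Setting the label column as the index before transposing turns the labels into column headers, which matches the layout asked for in the question:

```python
import pandas as pd

# Assumed shape of one table from pd.read_html(path)[i]:
# first column holds the labels, each further column is one period
table = pd.DataFrame({
    "Item": ["Revenue", "Other Income", "Net Profit"],
    "Period_1": [6188.03, 178.24, 1229.85],
    "Period_2": [3031.22, 116.30, 558.92],
})

# Make the labels the index, then transpose so they become the column headers
wide = table.set_index("Item").transpose()
wide.columns = [c.replace(" ", "_") for c in wide.columns]

# to_csv without a path returns the CSV text; pass a filename to write a file instead
print(wide.to_csv())
```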

References:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_html.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transpose.html