How can I scrape the fund prices from the following page:
http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=JAS_U
My attempt below is wrong. How should I fix it?
import pandas as pd
import requests
import re
url = 'http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=JAS_U'
tables = pd.read_html(requests.get(url).text, attrs={"class":re.compile("fundPriceCell\d+")})
Answer 0: (score: 2)
I like lxml for parsing and querying HTML. Here is what I came up with:
import requests
from lxml import etree
url = 'http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=JAS_U'
doc = requests.get(url)
tree = etree.HTML(doc.content)
row_xpath = '//tr[contains(td[1]/@class, "fundPriceCell")]'
rows = tree.xpath(row_xpath)
for row in rows:
    # Each matching row is expected to hold exactly three cells: the date and two prices.
    (date_string, v1, v2) = (td.text for td in row.getchildren())
    print("%s - %s - %s" % (date_string, v1, v2))
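To try the row-selection logic without hitting the live site, here is a minimal offline sketch. It is not the code above: it uses the standard library's `xml.etree.ElementTree` instead of lxml, which only works because the sample fragment is well-formed, and since ElementTree's XPath subset has no `contains()`, the class check is done in Python. The `SAMPLE` markup, its values, and the `price_rows` helper are all hypothetical stand-ins, not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample standing in for the real page's markup.
SAMPLE = """
<table>
  <tr><td class="header">Date</td><td class="header">Bid</td><td class="header">Offer</td></tr>
  <tr><td class="fundPriceCell1">02/01/2015</td><td class="fundPriceCell1">10.10</td><td class="fundPriceCell1">10.63</td></tr>
  <tr><td class="fundPriceCell2">01/01/2015</td><td class="fundPriceCell2">10.05</td><td class="fundPriceCell2">10.58</td></tr>
</table>
"""


def price_rows(markup):
    """Return one tuple of cell texts per row whose first cell is a price cell."""
    root = ET.fromstring(markup)
    rows = []
    for tr in root.iter("tr"):
        cells = tr.findall("td")
        # Same test as the XPath predicate: first cell's class contains "fundPriceCell".
        if cells and "fundPriceCell" in cells[0].get("class", ""):
            rows.append(tuple(td.text for td in cells))
    return rows


rows = price_rows(SAMPLE)
for date_string, v1, v2 in rows:
    print("%s - %s - %s" % (date_string, v1, v2))
```

The header row is skipped because its first cell's class does not contain "fundPriceCell". For the real page, which is unlikely to be well-formed XML, lxml's HTML parser as used above is the safer choice.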
Answer 1: (score: 1)
My solution is similar to yours:
import pandas as pd
import requests
from lxml import etree
url = "http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=JAS_U"
r = requests.get(url)
html = etree.HTML(r.content)
data = html.xpath('//table//table//table//table//td[@class="fundPriceCell1" or @class="fundPriceCell2"]//text()')
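# The XPath returns a flat list of cell texts; only proceed if it splits evenly into rows of three.
if len(data) % 3 == 0:
    df = pd.DataFrame([data[i:i+3] for i in range(0, len(data), 3)], columns = ['date', 'bid', 'ask'])
    df = df.set_index('date')
    # Dates on the page are day/month/year.
    df.index = pd.to_datetime(df.index, format = '%d/%m/%Y')
    df.sort_index(inplace = True)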
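The chunking step can be checked in isolation. The sketch below is a plain-Python variant of it, without pandas: it splits a flat list into rows of three, parses the `%d/%m/%Y` dates, and sorts by date. The `data` values and the `to_records` helper are hypothetical, and converting the prices to `float` assumes the cells hold plain decimal numbers, which the original code does not verify.

```python
from datetime import datetime

# Hypothetical flat list, in the shape the XPath text() query would return.
data = ["02/01/2015", "10.10", "10.63", "01/01/2015", "10.05", "10.58"]


def to_records(flat, width=3):
    """Group a flat list into (date, bid, ask) tuples, sorted by date."""
    if len(flat) % width != 0:
        # Mirrors the len(data) % 3 check, but fails loudly instead of skipping.
        raise ValueError("expected a multiple of %d values, got %d" % (width, len(flat)))
    records = []
    for i in range(0, len(flat), width):
        date_text, bid, ask = flat[i:i + width]
        day = datetime.strptime(date_text, "%d/%m/%Y").date()
        records.append((day, float(bid), float(ask)))
    # Tuples sort on their first element, so this orders by date.
    return sorted(records)


records = to_records(data)
for day, bid, ask in records:
    print(day.isoformat(), bid, ask)
```

Raising on a ragged list, rather than silently doing nothing, makes it obvious when the page layout changes and the XPath starts matching extra or missing cells.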