I'm trying to read data from a website, and I'm fairly new to this. I've looked at some examples but couldn't get any of them to run. The website is:
http://www.ariva.de/adidas-aktie/historische_kurse
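There is a download button that fetches a CSV file; it is marked with a red box at the bottom right of the attached image:
I don't know why I'm getting NaN values. The code is: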
import pandas as pd
import io
import requests
url="http://www.ariva.de/A1EWWW/historische_kurse?boerse_id=6&month=2006-01-31&currency=&clean_split=1&clean_split=0&clean_payout=1&clean_payout=0&clean_bezug=1&clean_bezug=0/wkn_A1EWWW_historic.csv"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')), error_bad_lines=False)
print(c)
Answer 0 (score: 4)
You can use the pandas.read_html() method:
In [261]: df = pd.read_html(url, thousands='.', decimal=',')[3]
In [262]: df
Out[262]:
0 1 2 3 4 5 6 7
0 Datum Erster Hoch Tief Schluss NaN Stücke Volumen
1 310106 37.1782 37.5537 36.6383 36.7215 € 2324069 85,3 M
2 300106 36.3204 37.3745 36.2798 37.191 € 2553488 95,0 M
3 270106 35.6077 36.3887 35.58 36.2414 € 2950272 107 M
4 260106 35.2877 35.548 35.0594 35.4605 € 2147777 76,2 M
5 250106 35.6077 35.6077 35.1255 35.3133 € 1985601 70,1 M
6 240106 35.5266 35.612 35.2813 35.42 € 1435138 50,8 M
7 230106 35.2279 35.6931 35.0145 35.4584 € 1506623 53,4 M
8 200106 35.516 35.516 35.2514 35.3879 € 2251534 79,7 M
9 190106 35.0999 35.58 35.0999 35.4157 € 1425647 50,5 M
10 180106 34.8695 35.2343 34.5707 35.0871 € 2812569 98,7 M
11 170106 35.0145 35.565 35.0145 35.3623 € 2431866 86,0 M
12 160106 35.149 35.4584 34.9783 35.3751 € 747868 26,5 M
13 130106 35.5245 35.5266 35.0295 35.0786 € 2016092 70,7 M
14 120106 35.1383 35.5608 35.0145 35.452 € 941786 33,4 M
15 110106 35.3133 35.4882 34.8396 34.9527 € 1341719 46,9 M
16 100106 35.102 35.3346 35.0359 35.1127 € 1673729 58,8 M
17 090106 35.7976 35.7976 35.0359 35.3005 € 2055502 72,6 M
18 060106 35.7507 35.8467 35.5266 35.6909 € 1532681 54,7 M
19 050106 35.8254 35.8787 35.4989 35.74 € 1653103 59,1 M
20 040106 35.74 35.9534 35.5693 35.8467 € 2760820 99,0 M
21 030106 35.2386 35.8168 35.1618 35.3303 € 2885207 102 M
22 020106 34.4598 35.1084 34.4598 35.0359 € 1254853 44,0 M
In [263]: df.columns = df.iloc[0]
In [264]: df.drop(0, inplace=True)
In [265]: df
Out[265]:
0 Datum Erster Hoch Tief Schluss NaN Stücke Volumen
1 310106 37.1782 37.5537 36.6383 36.7215 € 2324069 85,3 M
2 300106 36.3204 37.3745 36.2798 37.191 € 2553488 95,0 M
3 270106 35.6077 36.3887 35.58 36.2414 € 2950272 107 M
4 260106 35.2877 35.548 35.0594 35.4605 € 2147777 76,2 M
5 250106 35.6077 35.6077 35.1255 35.3133 € 1985601 70,1 M
6 240106 35.5266 35.612 35.2813 35.42 € 1435138 50,8 M
7 230106 35.2279 35.6931 35.0145 35.4584 € 1506623 53,4 M
8 200106 35.516 35.516 35.2514 35.3879 € 2251534 79,7 M
9 190106 35.0999 35.58 35.0999 35.4157 € 1425647 50,5 M
10 180106 34.8695 35.2343 34.5707 35.0871 € 2812569 98,7 M
11 170106 35.0145 35.565 35.0145 35.3623 € 2431866 86,0 M
12 160106 35.149 35.4584 34.9783 35.3751 € 747868 26,5 M
13 130106 35.5245 35.5266 35.0295 35.0786 € 2016092 70,7 M
14 120106 35.1383 35.5608 35.0145 35.452 € 941786 33,4 M
15 110106 35.3133 35.4882 34.8396 34.9527 € 1341719 46,9 M
16 100106 35.102 35.3346 35.0359 35.1127 € 1673729 58,8 M
17 090106 35.7976 35.7976 35.0359 35.3005 € 2055502 72,6 M
18 060106 35.7507 35.8467 35.5266 35.6909 € 1532681 54,7 M
19 050106 35.8254 35.8787 35.4989 35.74 € 1653103 59,1 M
20 040106 35.74 35.9534 35.5693 35.8467 € 2760820 99,0 M
21 030106 35.2386 35.8168 35.1618 35.3303 € 2885207 102 M
22 020106 34.4598 35.1084 34.4598 35.0359 € 1254853 44,0 M
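The Datum and Volumen columns in the result above are still strings. A minimal sketch of how they could be converted follows, using three rows copied from the output. It assumes that Datum is day, month, year with the dots stripped by thousands='.' (31.01.06 became 310106) and that "M" stands for millions; neither is confirmed by the site, so check against the page before relying on it.

```python
import pandas as pd

# Three rows copied from the output above, as strings where read_html left them as strings.
df = pd.DataFrame(
    {
        "Datum": ["310106", "300106", "270106"],
        "Schluss": [36.7215, 37.191, 36.2414],
        "Volumen": ["85,3 M", "95,0 M", "107 M"],
    }
)

# Assumption: Datum is ddmmyy (the dots were removed by thousands='.').
df["Datum"] = pd.to_datetime(df["Datum"], format="%d%m%y")

# Assumption: "M" means millions and the comma is the decimal separator.
df["Volumen"] = (
    df["Volumen"]
    .str.replace(" M", "", regex=False)
    .str.replace(",", ".", regex=False)
    .astype(float)
    * 1_000_000
)

# Index by date in ascending order.
df = df.set_index("Datum").sort_index()
print(df)
```

With a real date index, slicing by period (for example df.loc["2006-01"]) becomes possible.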
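Answer 1 (score: 0)
To download the file with requests, we first need to find the URL that the download button hits (possibly via JS?). You can use the browser inspector or an equivalent tool for that. This is what I found for your case.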
import requests
# stream=True defers the body download so it can be written chunk by chunk
r = requests.get("http://www.ariva.de/quote/historic/historic.csv?secu=291&boerse_id=6&clean_split=1&clean_payout=0&clean_bezug=1&min_time=8.2.2016&max_time=8.2.2017&trenner=%3B&go=Download", stream=True)
with open('out.csv', 'wb') as fd:
    # write the response to disk in 100-byte chunks
    for chunk in r.iter_content(100):
        fd.write(chunk)
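The trenner=%3B parameter in that URL is a URL-encoded semicolon, so the downloaded file is presumably semicolon-separated, and German number formatting would use a comma as the decimal mark. A rough sketch of reading such a file with pandas is below. The inline sample is only an assumed layout built from values in the first answer, not the actual file contents, and the column names and date format are guesses.

```python
import io

import pandas as pd

# Assumed layout only: semicolon-separated, comma as decimal mark.
# Values are taken from the table in the first answer; the real file may differ.
sample = (
    "Datum;Erster;Hoch;Tief;Schluss\n"
    "2006-01-31;37,1782;37,5537;36,6383;36,7215\n"
    "2006-01-30;36,3204;37,3745;36,2798;37,191\n"
)

# sep=';' splits the columns, decimal=',' turns "37,1782" into the float 37.1782.
df = pd.read_csv(io.StringIO(sample), sep=";", decimal=",", parse_dates=["Datum"])
print(df.dtypes)
print(df)
```

For the file saved above, the equivalent call would be pd.read_csv('out.csv', sep=';', decimal=','), adjusted to whatever header the file really has.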