I am trying to load a CSV file into a pandas DataFrame. This is what my file looks like:
Date;Open;High;Low;Last Close;Chg.%;Total Value;Total Volume;
"02/10/2017";"29.345";"29.375";"29.005";"29.105";"-0.33%";"32,283,437";"1,106,900";
"02/13/2017";"29.100";"30.050";"29.100";"29.870";"+2.63%";"51,101,636";"1,715,810";
"02/14/2017";"29.710";"30.150";"29.665";"30.100";"+0.77%";"36,702,427";"1,225,914";
"02/15/2017";"30.190";"30.300";"29.865";"29.950";"-0.50%";"42,224,148";"1,406,422";
"02/16/2017";"29.815";"29.940";"29.585";"29.770";"-0.60%";"37,021,299";"1,245,004";
This is the format I want to end up with:
Open High Low Last Close Total Volume
Date
2012-05-18 42.050 45.0000 38.0000 38.2318 573576400.0
2012-05-21 36.530 36.6600 33.0000 34.0300 168192700.0
2012-05-22 32.610 33.5900 30.9400 31.0000 101786600.0
2012-05-23 31.370 32.5000 31.3600 32.0000 73600000.0
I thought something like this would be enough:
df = pd.read_csv("/home/tomek/Pobrane/historicalData_AT0000652011.csv")
df = df[['Date', 'Open', 'High', 'Low', 'Last Close', 'Total Volume']]
However, I get this error:
"['Date;' 'Open;' 'High;' 'Low;' 'Last Close;' 'Total Volume;'] not in index"
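Assigning to df.columns lets me rename only one column; with more names it says there is no such index, so I think the header is being treated as one big column.
So I think I need to reformat my CSV file, but I'm not sure how to do that so pandas can read it.
Thanks for any suggestions.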
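The suspicion that the header is read as one big column can be checked directly. The following is a minimal sketch on a simplified two-column sample (an assumption for illustration, not the real file), comparing the default comma separator with `sep=';'`:

```python
from io import StringIO

import pandas as pd

# Simplified stand-in for the real file (assumption: two columns only).
sample = 'Date;Open\n"02/10/2017";"29.345"\n"02/13/2017";"29.100"\n'

# The default separator is ',', so every line is read as a single field.
df_default = pd.read_csv(StringIO(sample))
print(df_default.columns.tolist())

# With the semicolon separator the header splits into separate columns.
df_semicolon = pd.read_csv(StringIO(sample), sep=';')
print(df_semicolon.columns.tolist())
```

If the first list contains a single name with semicolons in it, the separator is the problem and the file itself does not need to be reformatted.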
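Answer 0 (score: 2)
I think you only need a few parameters to read_csv:
sep=';' for the separator
index_col=['Date'] and parse_dates=['Date'] for a DatetimeIndex
usecols to keep only the listed columns
thousands=',' to strip the thousands separators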
import pandas as pd
from io import StringIO  # pd.compat.StringIO is not available in recent pandas versions

temp = u"""Date;Open;High;Low;Last Close;Chg.%;Total Value;Total Volume;
"02/10/2017";"29.345";"29.375";"29.005";"29.105";"-0.33%";"32,283,437";"1,106,900";
"02/13/2017";"29.100";"30.050";"29.100";"29.870";"+2.63%";"51,101,636";"1,715,810";
"02/14/2017";"29.710";"30.150";"29.665";"30.100";"+0.77%";"36,702,427";"1,225,914";
"02/15/2017";"30.190";"30.300";"29.865";"29.950";"-0.50%";"42,224,148";"1,406,422";
"02/16/2017";"29.815";"29.940";"29.585";"29.770";"-0.60%";"37,021,299";"1,245,004";"""
# after testing, replace 'StringIO(temp)' with '/home/tomek/Pobrane/historicalData_AT0000652011.csv'
df = pd.read_csv(StringIO(temp),
                 sep=";",                  # fields are separated by semicolons
                 index_col=['Date'],       # use the Date column as the index
                 parse_dates=['Date'],     # parse it into a DatetimeIndex
                 usecols=['Date', 'Open', 'High', 'Low', 'Last Close', 'Total Volume'],
                 thousands=',')            # read "1,106,900" as 1106900
print(df)
Open High Low Last Close Total Volume
Date
2017-02-10 29.345 29.375 29.005 29.105 1106900
2017-02-13 29.100 30.050 29.100 29.870 1715810
2017-02-14 29.710 30.150 29.665 30.100 1225914
2017-02-15 30.190 30.300 29.865 29.950 1406422
2017-02-16 29.815 29.940 29.585 29.770 1245004
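To confirm that each parameter did what the list above describes, the result can be inspected after loading. This is a small sketch that reuses the first two rows of the sample data; the variable names are only illustrative:

```python
from io import StringIO

import pandas as pd

csv_text = '''Date;Open;High;Low;Last Close;Chg.%;Total Value;Total Volume;
"02/10/2017";"29.345";"29.375";"29.005";"29.105";"-0.33%";"32,283,437";"1,106,900";
"02/13/2017";"29.100";"30.050";"29.100";"29.870";"+2.63%";"51,101,636";"1,715,810";
'''

df = pd.read_csv(StringIO(csv_text),
                 sep=';',
                 index_col='Date',
                 parse_dates=['Date'],
                 usecols=['Date', 'Open', 'High', 'Low', 'Last Close', 'Total Volume'],
                 thousands=',')

# parse_dates should have produced a DatetimeIndex rather than plain strings.
print(type(df.index).__name__)
# thousands=',' should have turned "1,106,900" into a number.
print(df.dtypes)
```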
Answer 1 (score: 0)
If the file is not actually comma-separated, you have to pass the separator (";") explicitly. Try the following:
Of course, you can also pass "Date" directly through the index_col parameter, as done here:
In [1]: pd.read_clipboard(sep=';', index_col="Date", thousands=',')[['Open', 'High', 'Low', 'Last Close', 'Total Volume']]
Out[1]:
Open High Low Last Close Total Volume
Date
02/10/2017 29.345 29.375 29.005 29.105 1106900
02/13/2017 29.100 30.050 29.100 29.870 1715810
02/14/2017 29.710 30.150 29.665 30.100 1225914
02/15/2017 30.190 30.300 29.865 29.950 1406422
02/16/2017 29.815 29.940 29.585 29.770 1245004
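Note that in this output the index still holds date strings such as 02/10/2017. If a DatetimeIndex is wanted, one option is to convert the index afterwards. This is a sketch that reads from a string instead of the clipboard and assumes the dates are month/day/year:

```python
from io import StringIO

import pandas as pd

csv_text = '''Date;Open;High;Low;Last Close;Chg.%;Total Value;Total Volume;
"02/10/2017";"29.345";"29.375";"29.005";"29.105";"-0.33%";"32,283,437";"1,106,900";
"02/13/2017";"29.100";"30.050";"29.100";"29.870";"+2.63%";"51,101,636";"1,715,810";
'''

wanted = ['Open', 'High', 'Low', 'Last Close', 'Total Volume']
df = pd.read_csv(StringIO(csv_text), sep=';', index_col='Date', thousands=',')[wanted]

# Convert the string index to datetimes (assumed format: month/day/year).
df.index = pd.to_datetime(df.index, format='%m/%d/%Y')
print(df.index)
```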