In a DataFrame

Time: 2015-05-29 04:12:57

Tags: python numpy pandas

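I have a text file with several lines that I want to read into a pandas DataFrame. Here are a few lines that I copied from the text file and saved into another text file: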

MTU, Time, Power, Cost, Voltage
MTU1,05/11/2015 19:59:06,4.102,0.62,122.4
MTU1,05/11/2015 19:59:05,4.089,0.62,122.3
MTU1,05/11/2015 19:59:04,4.089,0.62,122.3
MTU1,05/11/2015 19:59:06,4.089,0.62,122.3
MTU1,05/11/2015 19:59:04,4.097,0.62,122.4
MTU1,05/11/2015 19:59:03,4.097,0.62,122.4
MTU1,05/11/2015 19:59:02,4.111,0.62,122.5
MTU1,05/11/2015 19:59:03,4.111,0.62,122.5
MTU1,05/11/2015 19:59:02,4.104,0.62,122.5
MTU1,05/11/2015 19:59:01,4.090,0.62,122.4
MTU1,05/11/2015 19:59:00,4.093,0.62,122.4
MTU1,05/11/2015 19:58:59,4.112,0.62,122.5
MTU1,05/11/2015 19:58:58,4.107,0.62,122.6
MTU1,05/11/2015 19:58:57,4.092,0.62,122.7

Now, when I read the text file in with the following:

import pandas as pd

energy = pd.read_csv("energy.txt", sep=",")
# Display the first 5 rows of data.
energy.head()

I get:

MTU Time    Power   Cost    Voltage
0   MTU1    05/11/15 19:59  4.102   0.62    122.4
1   MTU1    05/11/15 19:59  4.089   0.62    122.3
2   MTU1    05/11/15 19:59  4.089   0.62    122.3
3   MTU1    05/11/15 19:59  4.089   0.62    122.3
4   MTU1    05/11/15 19:59  4.097   0.62    122.4

The problem, I guess, is that the columns are still strings. I used the following to convert them to numbers:

energy=energy.convert_objects(convert_numeric=True)
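A quick way to test the guess that the columns are still strings is to look at dtypes. The sketch below embeds a few of the question's rows through io.StringIO (so no energy.txt is needed) and shows that read_csv already parses Power, Cost and Voltage as float64. Note that convert_objects was deprecated in pandas 0.17 and removed in 1.0; the last lines show pd.to_numeric as the usual replacement. This is a sketch for newer pandas versions, not something the original answer uses.

```python
import io

import pandas as pd

# A few rows copied from the question's energy.txt, embedded so the
# example runs on its own.
raw = """MTU, Time, Power, Cost, Voltage
MTU1,05/11/2015 19:59:06,4.102,0.62,122.4
MTU1,05/11/2015 19:59:05,4.089,0.62,122.3
MTU1,05/11/2015 19:59:04,4.089,0.62,122.3
"""

energy = pd.read_csv(io.StringIO(raw), sep=",")

# read_csv has already inferred float64 for the numeric columns;
# only MTU and " Time" hold strings.
print(energy.dtypes)

# Modern replacement for convert_objects(convert_numeric=True):
# pd.to_numeric converts one column at a time, and errors="coerce"
# turns unparseable entries into NaN instead of raising.
numeric_cols = [" Power", " Cost", " Voltage"]
energy[numeric_cols] = energy[numeric_cols].apply(pd.to_numeric, errors="coerce")
```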

But when I try to plot the Power variable against Time to see the trend, I get an error:

energy.plot(energy.time,energy.power)

         if isinstance(obj, tuple) and is_setter:
   1142                         return {'key': obj}
-> 1143                     raise KeyError('%s not in index' % objarr[mask])
   1144 
   1145                 return _values_from_object(indexer)

KeyError: '[ 4.102  4.089  4.089  4.089  4.097  4.097  4.111  4.111  4.104  4.09\n  4.093  4.112  4.107  4.092  4.092  4.109  4.107  4.107  4.092  4.092\n  4.092  4.107  4.109  4.094  4.09   4.103  4.103  4.103  4.11   4.096\n  4.122  4.156  4.154  4.154  4.144  4.15   4.16   4.16   4.163  4.163\n  4.154  4.15   4.157  4.167  4.16   4.149  4.153  4.165  4.166  4.155\n  4.151  4.164  4.172  4.161  4.152  4.16   

I think it is because the Power variable still has "\n" attached to some of its values. How can I fix this error?
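The KeyError can be reproduced without plotting. DataFrame.plot expects column labels for x and y, so passing a Series such as the Power column presumably makes pandas look up its values (4.102, 4.089, ...) as column names, which would explain why the error lists the Power readings. The "\n" sequences are not in the data: NumPy's str() wraps a long array over several lines, and KeyError prints its message with repr(), so those line breaks appear as a literal \n. A minimal sketch using a couple of the question's rows; the direct lookup stands in for what happens inside plot and is an approximation, not the exact pandas 0.16 code path.

```python
import io

import numpy as np
import pandas as pd

raw = """MTU, Time, Power, Cost, Voltage
MTU1,05/11/2015 19:59:06,4.102,0.62,122.4
MTU1,05/11/2015 19:59:05,4.089,0.62,122.3
"""
energy = pd.read_csv(io.StringIO(raw))

# Using the Power values themselves as column labels fails, because no
# column is named 4.102 or 4.089.
try:
    energy[energy[" Power"]]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# str() of a long NumPy array is wrapped across lines (default width 75).
long_array = np.tile(energy[" Power"].to_numpy(), 30)  # 60 float values
printed = str(long_array)

# KeyError shows its argument via repr(), so real newlines become "\n" text.
shown = str(KeyError(printed))
```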

1 answer:

Answer 0 (score: 1)

This seems to work fine for me on pandas 0.16. The column names do have a leading space, though:

In [48]: energy
Out[48]: 
     MTU                 Time   Power   Cost   Voltage
0   MTU1  05/11/2015 19:59:06   4.102   0.62     122.4
1   MTU1  05/11/2015 19:59:05   4.089   0.62     122.3
2   MTU1  05/11/2015 19:59:04   4.089   0.62     122.3
3   MTU1  05/11/2015 19:59:06   4.089   0.62     122.3
4   MTU1  05/11/2015 19:59:04   4.097   0.62     122.4
5   MTU1  05/11/2015 19:59:03   4.097   0.62     122.4
6   MTU1  05/11/2015 19:59:02   4.111   0.62     122.5
7   MTU1  05/11/2015 19:59:03   4.111   0.62     122.5
8   MTU1  05/11/2015 19:59:02   4.104   0.62     122.5
9   MTU1  05/11/2015 19:59:01   4.090   0.62     122.4
10  MTU1  05/11/2015 19:59:00   4.093   0.62     122.4
11  MTU1  05/11/2015 19:58:59   4.112   0.62     122.5
12  MTU1  05/11/2015 19:58:58   4.107   0.62     122.6
13  MTU1  05/11/2015 19:58:57   4.092   0.62     122.7

In [49]: energy.columns
Out[49]: Index([u'MTU', u' Time', u' Power', u' Cost', u' Voltage'], dtype='object')

In [50]: energy.plot(x=' Time', y=' Power') # or energy.plot(' Time', ' Voltage')
Out[50]: <matplotlib.axes.AxesSubplot at 0x10847ffd0>
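The leading spaces in the header come from the space after each comma, so they can be removed either while reading the file or afterwards. A sketch of both options, the skipinitialspace flag of read_csv and str.strip on the column index, using a few embedded rows instead of energy.txt:

```python
import io

import pandas as pd

raw = """MTU, Time, Power, Cost, Voltage
MTU1,05/11/2015 19:59:06,4.102,0.62,122.4
MTU1,05/11/2015 19:59:05,4.089,0.62,122.3
MTU1,05/11/2015 19:58:57,4.092,0.62,122.7
"""

# Option 1: skipinitialspace=True skips the blank after each delimiter,
# including in the header, so the labels become "Time", "Power", ...
energy = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Option 2: read as before, then strip whitespace from the column labels.
energy2 = pd.read_csv(io.StringIO(raw))
energy2.columns = energy2.columns.str.strip()
```

After either option the plot call no longer needs the leading spaces, e.g. energy.plot(x='Time', y='Power').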

Here is the plot with x as Time and y as Power:

[Plot: Power vs. Time]
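The plot above treats Time as text. If a true time axis is wanted, Time can be parsed into datetimes and the rows sorted, since the log is not in chronological order and repeats some seconds. This is a sketch that goes beyond the original answer, and it assumes the dates are month/day/year (05/11/2015 read as 11 May 2015):

```python
import io

import pandas as pd

raw = """MTU, Time, Power, Cost, Voltage
MTU1,05/11/2015 19:59:06,4.102,0.62,122.4
MTU1,05/11/2015 19:59:05,4.089,0.62,122.3
MTU1,05/11/2015 19:59:04,4.089,0.62,122.3
MTU1,05/11/2015 19:59:06,4.089,0.62,122.3
MTU1,05/11/2015 19:58:57,4.092,0.62,122.7
"""

energy = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Assumption: month/day/year. An explicit format avoids guessing whether
# 05/11 means May 11 or 5 November.
energy["Time"] = pd.to_datetime(energy["Time"], format="%m/%d/%Y %H:%M:%S")

# Sort chronologically; mergesort is stable, so rows with the same
# timestamp keep their order from the file.
energy = energy.sort_values("Time", kind="mergesort").set_index("Time")

# energy["Power"].plot()  # requires matplotlib; x axis is now real time
```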