My data looks like this:
TEST
2012-05-01 00:00:00.203 OFF 0
2012-05-01 00:00:11.203 OFF 0
2012-05-01 00:00:22.203 ON 1
2012-05-01 00:00:33.203 ON 1
2012-05-01 00:00:44.203 OFF 0
TEST
2012-05-02 00:00:00.203 OFF 0
2012-05-02 00:00:11.203 OFF 0
2012-05-02 00:00:22.203 OFF 0
2012-05-02 00:00:33.203 ON 1
2012-05-02 00:00:44.203 ON 1
2012-05-02 00:00:55.203 OFF 0
Ultimately, I want to downsample data like this to individual days, for example using mean, min, and max values. I can't get it to work for my data and get this error:
TypeError: unhashable type: 'list'
It may have something to do with the date format in the data frame, since an index row looks like this:
[datetime.datetime(2012, 5, 1, 0, 0, 0, 203000)] OFF 0
Can anyone help? My code so far is:
import time
import dateutil.parser
from pandas import *
from pandas.core.datetools import *
t0 = time.clock()
filename = "testdata.dat"
index = []
data = []
with open(filename) as f:
    for line in f:
        if not line.startswith('TEST'):
            line_content = line.split(' ')
            mydatetime = dateutil.parser.parse(line_content[0] + " " + line_content[1])
            del line_content[0] # delete the date
            del line_content[0] # delete the time so that only values remain
            index_row = [mydatetime]
            data_row = []
            for item in line_content:
                data_row.append(item)
            index.append(index_row)
            data.append(data_row)
df = DataFrame(data, index = index)
print df.head()
print df.tail()
print
date_from = index[0] # first datetime entry in data frame
print date_from
date_to = index[len(index)-1] #last datetime entry in date frame
print date_to
print date_to[0] - date_from[0]
dayly= DateRange(date_from[0], date_to[0], offset=datetools.DateOffset())
print dayly
grouped = df.groupby(dayly.asof)
#print grouped.mean()
#df2 = df.groupby(daily.asof).agg({'2':np_mean})
time2 = time.clock() - t0
print time2
Answer 0 (score: 0)
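I have no experience with pandas, but judging from this line in your code,
df = DataFrame(data, index = index)
and the error, it seems that index should not be a mutable object such as a Python list. Maybe this will work:
df = DataFrame(data, index = tuple(index))
Your index_row and data_row are themselves lists, and you are appending them to the index and data lists.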
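To illustrate the last point: index labels have to be hashable, and each label in the question is a one-element list ([mydatetime]), which is not. The following is only a sketch, assuming Python 3 and a current pandas rather than the Python 2 setup in the question. The names rows and stamp are introduced here for illustration, and the sample rows are copied from the question.

```python
import datetime

import pandas as pd

# A few sample rows copied from the question (the 'TEST' lines are left out).
rows = [
    "2012-05-01 00:00:00.203 OFF 0",
    "2012-05-01 00:00:11.203 OFF 0",
    "2012-05-01 00:00:22.203 ON 1",
]

index = []
data = []
for line in rows:
    date_part, time_part, switch, value = line.split()
    stamp = datetime.datetime.strptime(
        date_part + " " + time_part, "%Y-%m-%d %H:%M:%S.%f"
    )
    index.append(stamp)  # append the datetime itself, not [stamp]
    data.append([switch, int(value)])

# A list used as a label cannot be hashed, which matches the reported error.
try:
    hash([stamp])
except TypeError as exc:
    print(exc)

df = pd.DataFrame(data, index=index, columns=["switch", "value"])
print(df)
```

With plain datetime objects as labels, the frame is built without the TypeError. The column names switch and value are an assumption about what the two fields mean.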
Answer 1 (score: 0)
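You are better off leaving all of the datetime parsing to pandas and feeding it a clean input stream. You can then use read_fwf (for lines in fixed-width format) to separate the fields. For example: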
import pandas
import StringIO
buf = StringIO.StringIO()
buf.write(''.join(line
                  for line in open('f.txt')
                  if not line.startswith('TEST')))
buf.seek(0)
df = pandas.read_fwf(buf, [(0, 24), (24, 27), (27, 30)],
                     index_col=0, names=['switch', 'value'])
print df
Output:
switch value
2012-05-01 00:00:00.203 OFF 0
2012-05-01 00:00:11.203 OFF 0
2012-05-01 00:00:22.203 ON 1
2012-05-01 00:00:33.203 ON 1
2012-05-01 00:00:44.203 OFF 0
2012-05-02 00:00:00.203 OFF 0
2012-05-02 00:00:11.203 OFF 0
2012-05-02 00:00:22.203 OFF 0
2012-05-02 00:00:33.203 ON 1
2012-05-02 00:00:44.203 ON 1
2012-05-02 00:00:55.203 OFF 0
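The question also asks for daily mean, min, and max values, which the snippet above stops short of. One way this might look, as a sketch assuming Python 3 and a current pandas (so io.StringIO instead of the StringIO module), is shown below. The data is inlined from the question instead of being read from f.txt, and the names raw, clean, and daily as well as the timestamp column name are introduced here for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question, including the 'TEST' marker lines.
raw = """TEST
2012-05-01 00:00:00.203 OFF 0
2012-05-01 00:00:11.203 OFF 0
2012-05-01 00:00:22.203 ON 1
2012-05-01 00:00:33.203 ON 1
2012-05-01 00:00:44.203 OFF 0
TEST
2012-05-02 00:00:00.203 OFF 0
2012-05-02 00:00:11.203 OFF 0
2012-05-02 00:00:22.203 OFF 0
2012-05-02 00:00:33.203 ON 1
2012-05-02 00:00:44.203 ON 1
2012-05-02 00:00:55.203 OFF 0
"""

# Drop the marker lines so that only data rows remain.
clean = "".join(
    line for line in raw.splitlines(keepends=True)
    if not line.startswith("TEST")
)

# Same column boundaries as in the answer; all three columns are named here.
df = pd.read_fwf(
    io.StringIO(clean),
    colspecs=[(0, 24), (24, 27), (27, 30)],
    header=None,
    names=["timestamp", "switch", "value"],
)
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp")

# Downsample the numeric column to one row per calendar day.
daily = df["value"].resample("D").agg(["mean", "min", "max"])
print(daily)
```

resample("D") needs a datetime index, which is why the timestamp column is converted with to_datetime before set_index. Only the numeric value column is aggregated, since a mean of the ON/OFF strings is not defined.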