pandas ValueError: Can only compare identically-labeled Series objects

Time: 2020-01-18 15:39:12

Tags: pandas dataframe compare valueerror

import numpy as np
import pandas as pd

df = pd.DataFrame([{'Instrument':'BHP', 'Date':'2012-04-18', 'Time':'09:59:34.160', 'Milliseconds':35974160 , 'RecordType':'ENTER', 'Value':36597.95},
                    {'Instrument':'BHP', 'Date':'2012-04-18', 'Time':'09:59:34.566', 'Milliseconds':35974566 , 'RecordType':'DELETE', 'Value':175.70},
                    {'Instrument':'BHP', 'Date':'2012-04-18', 'Time':'09:59:37.832', 'Milliseconds':35977832 , 'RecordType':'DELETE', 'Value':1093470.00},
                    {'Instrument':'BHP', 'Date':'2012-04-18', 'Time':'09:59:37.841', 'Milliseconds':35977841 , 'RecordType':'DELETE', 'Value':25799.34},
                    {'Instrument':'BHP', 'Date':'2012-04-18', 'Time':'09:59:38.846', 'Milliseconds':35978846 , 'RecordType':'ENTER', 'Value':2460.15},
                    {'Instrument':'BHP', 'Date':'2012-04-18', 'Time':'09:59:45.015', 'Milliseconds':35985015 , 'RecordType':'DELETE', 'Value':6731.00},
                    {'Instrument':'BHP', 'Date':'2012-04-18', 'Time':'09:59:47.024', 'Milliseconds':35987024 , 'RecordType':'OPEN', 'Value':np.nan}])

I have the DataFrame above. My goal is to sum Value over the rows with RecordType DELETE that fall within the 10 seconds before the OPEN row. I tried the following code:

opening_time = df[df.RecordType=='OPEN']
ten_seconds_before_open = opening_time['Milliseconds'] - 10*1000
delete_type = df[df.RecordType=='DELETE']
sum_delete = delete_type[delete_type.Milliseconds >= ten_seconds_before_open].Value.sum()
print(sum_delete)

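However, it raises ValueError: Can only compare identically-labeled Series objects. What is the best way to solve this?

In practice, I have millions of rows covering many Instruments and Dates. I tried to write code that gets the sum of the DELETE values for each instrument on each date: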

def sum_delete_type(df):
     opening_time = df[df.RecordType=='OPEN']
     ten_seconds_before_open = opening_time['Milliseconds'] - 10*1000
     delete_type = df[df.RecordType=='DELETE']
     sum_delete = delete_type[delete_type.Milliseconds >= ten_seconds_before_open].Value.sum()
     return sum_delete

df.groupby(['Instrument', 'Date']).apply(sum_delete_type)

But it does not work. Please help. Thanks.

1 Answer:

Answer 0: (score: 0)

How about this:

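opening_time = df[df.RecordType=='OPEN']
ten_seconds_before_open = opening_time['Milliseconds'] - 10*1000
delete_type = df[df.RecordType=='DELETE']
# Compare against each OPEN threshold as a scalar and collect the matching row labels
y=[]
for x in ten_seconds_before_open:
    y.extend(delete_type[delete_type.Milliseconds >= x].index.tolist())
y=list(set(y))  # drop labels matched by more than one OPEN row
# Sum the Value column (not Milliseconds) of the matching DELETE rows
delete_type.loc[delete_type.index.isin(y), 'Value'].sum()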
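
For context, the error in the question arises because ten_seconds_before_open is a one-row Series carrying the index label of the OPEN row, while delete_type.Milliseconds carries the labels of the DELETE rows. pandas refuses to compare two Series whose indexes differ. Pulling the OPEN time out as a scalar avoids the problem and also extends to the per-group case. The following is only a sketch under some assumptions: the helper name sum_delete_before_open and its window_ms parameter are made up for illustration, each group is assumed to have at most one OPEN row of interest (the first one is used), and a group with no OPEN row is assumed to contribute 0.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, trimmed to the columns used here.
df = pd.DataFrame({
    'Instrument': ['BHP'] * 7,
    'Date': ['2012-04-18'] * 7,
    'Milliseconds': [35974160, 35974566, 35977832, 35977841,
                     35978846, 35985015, 35987024],
    'RecordType': ['ENTER', 'DELETE', 'DELETE', 'DELETE',
                   'ENTER', 'DELETE', 'OPEN'],
    'Value': [36597.95, 175.70, 1093470.00, 25799.34,
              2460.15, 6731.00, np.nan],
})


def sum_delete_before_open(group, window_ms=10 * 1000):
    """Sum Value over DELETE rows in the window_ms before the first OPEN row.

    Hypothetical helper, not part of pandas.
    """
    open_ms = group.loc[group['RecordType'] == 'OPEN', 'Milliseconds']
    if open_ms.empty:
        return 0.0  # assumption: no OPEN row means nothing to sum
    open_time = open_ms.iloc[0]  # a scalar, so the comparisons below broadcast
    in_window = (
        (group['RecordType'] == 'DELETE')
        & (group['Milliseconds'] >= open_time - window_ms)
        & (group['Milliseconds'] < open_time)
    )
    return group.loc[in_window, 'Value'].sum()


# Whole frame treated as a single instrument/date.
total = sum_delete_before_open(df)

# One result per (Instrument, Date); only the needed columns are passed on.
per_group = (
    df.groupby(['Instrument', 'Date'])[['Milliseconds', 'RecordType', 'Value']]
      .apply(sum_delete_before_open)
)
print(total)
print(per_group)
```

On the sample data, the three DELETE rows at or after 35977024 ms fall inside the window, while the one at 35974566 ms does not. With millions of rows, groupby(...).apply with a Python function may be slow; if that becomes a problem, merging the per-group OPEN time back onto the frame and filtering with a single vectorized mask is one possible alternative.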