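I need to iterate over a DataFrame and check whether the odd-numbered rows of a particular column equal a given value (and likewise for the even-numbered rows).
Here is my code: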
mydf = pd.read_excel(test.xlsx, header=0, index= False)
mydf = mydf.sort_values(by='Time')
if ((mydf['Door Name'].iloc[::2]=='RDC_OUT-1') & (mydf['Door Name'].iloc[1::2]=='RDC_IN-1')):
    for i in range (l):
        mydf['diff'] = mydf['Times'].iloc[1::2].to_numpy() - mydf['Times'].iloc[::2]
    Total = mydf['diff'].sum()
    print('Total: ',Total)
But when I run it, I get this error:
if ((mydf['Door Name'].iloc[::2]=='RDC_OUT-1') & (mydf['Door Name'].iloc[1::2]=='RDC_IN-1')):
File "C:\Users\khoul\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 1478, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Here is my DataFrame:
Door name    Time                   Last Name    First Name
RDC_IN-1     05/08/2019 15:23:00    aa           bb
RDC_OUT-1    05/08/2019 12:39:00    aa           bb
RDC_IN-1     05/08/2019 12:13:00    aa           bb
RDC_OUT-1    05/08/2019 09:10:00    aa           bb
I don't understand why it is not accepted!
Answer 0 (score: 1)
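The idea is to sort by Time first, then select every RDC_OUT-1 row whose next row is RDC_IN-1, together with that following RDC_IN-1 row, and filter out all other rows: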
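# Sort chronologically so each OUT row is directly followed by its IN row
mydf = df.sort_values(by='Time')
m1 = mydf['Door name'] == 'RDC_OUT-1'
m2 = mydf['Door name'] == 'RDC_IN-1'
# fill_value=False keeps the shifted masks boolean instead of introducing NaN
m11 = m1 & m2.shift(-1, fill_value=False)   # OUT rows followed by an IN row
m22 = m1.shift(fill_value=False) & m2       # IN rows preceded by an OUT row
df = mydf[m11 | m22]
print (df)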
   Door name                Time Last Name First Name
3  RDC_OUT-1 2019-05-08 09:10:00        aa         bb
2   RDC_IN-1 2019-05-08 12:13:00        aa         bb
1  RDC_OUT-1 2019-05-08 12:39:00        aa         bb
0   RDC_IN-1 2019-05-08 15:23:00        aa         bb
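If you want to try the filtering step without the Excel file, here is a minimal self-contained sketch. It assumes the four sample rows from the question and that the Time strings are month-first. The helper name keep_out_in_pairs is made up for illustration and is not part of pandas.

```python
import pandas as pd


def keep_out_in_pairs(frame, out_name='RDC_OUT-1', in_name='RDC_IN-1'):
    """Hypothetical helper: keep OUT rows directly followed by an IN row,
    plus those IN rows, after sorting by Time."""
    frame = frame.sort_values(by='Time')
    is_out = frame['Door name'] == out_name
    is_in = frame['Door name'] == in_name
    # fill_value=False keeps the shifted masks boolean at the edges
    out_then_in = is_out & is_in.shift(-1, fill_value=False)
    in_after_out = is_out.shift(fill_value=False) & is_in
    return frame[out_then_in | in_after_out]


# Sample rows copied from the question (date format is an assumption)
sample = pd.DataFrame({
    'Door name': ['RDC_IN-1', 'RDC_OUT-1', 'RDC_IN-1', 'RDC_OUT-1'],
    'Time': pd.to_datetime(
        ['05/08/2019 15:23:00', '05/08/2019 12:39:00',
         '05/08/2019 12:13:00', '05/08/2019 09:10:00'],
        format='%m/%d/%Y %H:%M:%S'),
    'Last Name': ['aa'] * 4,
    'First Name': ['bb'] * 4,
})

paired = keep_out_in_pairs(sample)
print(paired)
```

With this sample every row already belongs to an OUT/IN pair, so nothing is dropped. A stray IN row before the first OUT, for example, would be filtered out.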
So, since according to the comments there is the same number of IN and OUT rows, the following solution should work well:
# to_numpy() drops the index, so the subtraction pairs rows by position, not by label
Total = (df.loc[df['Door name']=='RDC_IN-1','Time'] -
         df.loc[df['Door name']=='RDC_OUT-1','Time'].to_numpy()).sum()
print('Total: ',Total)
Total: 0 days 05:47:00
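As for the error in the question itself: a comparison such as mydf['Door Name'].iloc[::2]=='RDC_OUT-1' returns a whole boolean Series, and if cannot reduce that to a single True or False, which is what the ValueError says. Combining the two slices with & would also align them on their index labels, and the even and odd positions never share a label. The sketch below uses made-up data (not the Excel file) and reduces each comparison with .all() before testing it. This is one possible way to write the check, not the only one.

```python
import pandas as pd

# Made-up door names, assumed to be already sorted by time
doors = pd.Series(['RDC_OUT-1', 'RDC_IN-1', 'RDC_OUT-1', 'RDC_IN-1'])

# Using a Series directly in an `if` raises the error from the question
raised = False
try:
    if doors.iloc[::2] == 'RDC_OUT-1':
        pass
except ValueError as err:
    raised = True
    print('ValueError:', err)

# Reduce each comparison to a single boolean first, then combine with `and`
evens_are_out = (doors.iloc[::2] == 'RDC_OUT-1').all()
odds_are_in = (doors.iloc[1::2] == 'RDC_IN-1').all()

if evens_are_out and odds_are_in:
    print('rows alternate OUT, IN')
```

Note also that the code in the question uses 'Door Name' and 'Times', while the data shown has the columns 'Door name' and 'Time'.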