Can someone explain why groupby-apply produces different results when applied to two nearly identical DataFrames?
In pred2, the 'p1' column is converted to float, and the information it held is lost.
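The float/NaN pattern is consistent with the object-typed 'p1' values having been coerced to numbers somewhere while groupby-apply assembles its result. This post doesn't confirm where that happens, so treat it as an assumption. As a sketch, `pd.to_numeric(..., errors='coerce')` reproduces the same symptom on the p1 strings: '23' parses as a number, while '36R' cannot be parsed and becomes NaN.

```python
import pandas as pd

# The p1 strings each group produced, kept as plain Python objects
p1 = pd.Series(['36R', '23'], index=pd.Index(['A', 'B'], name='Key'), dtype=object)

# Numeric coercion: '23' -> 23.0, '36R' is unparseable -> NaN,
# and the column is upcast to float64 so it can hold NaN
coerced = pd.to_numeric(p1, errors='coerce')
print(coerced)
```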
import pandas as pd
def predictions(tool):
    out = pd.Series(index=['p1', 'p2', 'useTime'], dtype=object)
    if 'step1' in list(tool.State):
        out['p1'] = str(tool[tool.State == 'step1'].Machine.values[0])
    if 'step2' in list(tool.State):
        out['p2'] = str(tool[tool.State == 'step2'].Machine.values[0])
        out['useTime'] = str(tool[tool.State == 'step2'].oTime.values[0])
    return out

df1 = pd.DataFrame({'Key': ['B', 'B', 'A', 'A'],
                    'State': ['step1', 'step2', 'step1', 'step2'],
                    'oTime': ['', '2016-09-19 05:24:33', '', '2016-09-19 23:59:04'],
                    'Machine': ['23', '36L', '36R', '36R']})
df2 = df1.copy()
df2.oTime = pd.to_datetime(df2.oTime)
pred1 = df1.groupby('Key').apply(predictions)
pred2 = df2.groupby('Key').apply(predictions)
print(pred1)
print(pred2)
The output is:
     p1   p2              useTime
Key
A   36R  36R  2016-09-19 23:59:04
B    23  36L  2016-09-19 05:24:33

      p1   p2                        useTime
Key
A    NaN  36R  2016-09-19T23:59:04.000000000
B   23.0  36L  2016-09-19T05:24:33.000000000
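Note the difference in the p1 column, even though df1 and df2 are almost identical: the only change is that the third column (oTime) has been converted to Timestamp.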
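One way to keep the strings intact regardless of how `apply` combines results is to call the function on each group yourself and build the frame from plain dicts. This is a minimal sketch, not a fix for `apply` itself. `predictions_dict` is a hypothetical variant of the function above that returns a dict instead of a Series, and `None` replaces the empty strings in oTime so that `pd.to_datetime` yields NaT.

```python
import pandas as pd

def predictions_dict(tool):
    # Hypothetical variant of predictions(): same logic, but returns a plain dict
    out = {'p1': None, 'p2': None, 'useTime': None}
    if 'step1' in list(tool.State):
        out['p1'] = str(tool[tool.State == 'step1'].Machine.values[0])
    if 'step2' in list(tool.State):
        step2 = tool[tool.State == 'step2']
        out['p2'] = str(step2.Machine.values[0])
        out['useTime'] = str(step2.oTime.values[0])
    return out

df2 = pd.DataFrame({'Key': ['B', 'B', 'A', 'A'],
                    'State': ['step1', 'step2', 'step1', 'step2'],
                    # None instead of '' so that to_datetime produces NaT
                    'oTime': [None, '2016-09-19 05:24:33', None, '2016-09-19 23:59:04'],
                    'Machine': ['23', '36L', '36R', '36R']})
df2['oTime'] = pd.to_datetime(df2['oTime'])

# Run the function group by group and assemble the result ourselves,
# so the strings never pass through groupby-apply's result assembly
rows = {key: predictions_dict(group) for key, group in df2.groupby('Key')}
pred2 = pd.DataFrame.from_dict(rows, orient='index')
pred2.index.name = 'Key'
print(pred2)
```

The trade-off is that you give up `apply`'s convenience, but every column keeps exactly the values the function returned.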