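I have a pandas df that contains various functions and timestamps. I'm trying to efficiently return the difference between different functions.
Below is a very small example of the df. Col C represents the function, B shows the timestamp, D shows the different locations, and E shows the occurrence number. Essentially, I want to return the difference between functions for each location. These functions occur multiple times.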
import pandas as pd

df = pd.DataFrame({
    'B' : [10,20,35,50],
    'C' : ['Stop','Close','Open','Finish'],
    'D' : ['Home','Home Kitchen','Home','Home'],
    'E' : [1,1,1,1],
})
I am currently doing this in the following way:
def f(g):
    Stop = g.loc[df['C'] == 'Stop', 'B']
    Finish = g.loc[df['C'] == 'Finish', 'B']
    Open = g.loc[df['C'] == 'Open', 'B']
    g['YX_diff'] = Finish.values[0] - Stop.values[0]
    g['YZ_diff'] = Finish.values[0] - Open.values[0]
    return (g)
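I have a list of locations that I run this over. The df above mostly shows Home, but there can be many places. To handle this, I include the following: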
included = ['Home']
df = df[df.D.isin(included)].groupby(['D', 'E']).apply(f)
The problem I'm having is with the places I want to look at, specifically when the strings are similar. For example:
included = ['Home']
works fine. But if I include
included = ['Home','Home Kitchen']
it returns an error:
g['YX_diff'] = Finish.values[0] - Stop.values[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
I don't want to change the strings, because they represent specific information. I'm not sure what else I can do.
Answer 0 (score: 0)
For the string Home Kitchen, all 3 filtered Series are empty, so it is not possible to select the first value.
s = pd.Series(dtype='float64')
print(s)
Series([], dtype: float64)
print(s.values[0])
IndexError: index 0 is out of bounds for axis 0 with size 0
You can check it:
def f(g):
    Stop = g.loc[df['C'] == 'Stop', 'B']
    Finish = g.loc[df['C'] == 'Finish', 'B']
    Open = g.loc[df['C'] == 'Open', 'B']
    # Print the three selections for each group to see which ones are empty
    print(Stop)
    print(Finish)
    print(Open)
    # g['YX_diff'] = Finish.values[0] - Stop.values[0]
    # g['YZ_diff'] = Finish.values[0] - Open.values[0]
    return g
included = ['Home', 'Home Kitchen']
df = df[df.D.isin(included)].groupby(['D', 'E']).apply(f)
0 10
Name: B, dtype: int64
3 50
Name: B, dtype: int64
2 35
Name: B, dtype: int64
0 10
Name: B, dtype: int64
3 50
Name: B, dtype: int64
2 35
Name: B, dtype: int64
Series([], Name: B, dtype: int64)
Series([], Name: B, dtype: int64)
Series([], Name: B, dtype: int64)
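To see at a glance why the Home Kitchen group has nothing to subtract, it can help to list which functions each (D, E) group actually contains. The following is only a minimal sketch that rebuilds the example data from the question; the names `subset` and `functions_per_group` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'B': [10, 20, 35, 50],
    'C': ['Stop', 'Close', 'Open', 'Finish'],
    'D': ['Home', 'Home Kitchen', 'Home', 'Home'],
    'E': [1, 1, 1, 1],
})

included = ['Home', 'Home Kitchen']

# isin() compares whole strings, so 'Home' never matches 'Home Kitchen';
# the two locations end up in separate groups.
subset = df[df['D'].isin(included)]

# Collect the functions (column C) present in each (D, E) group.
functions_per_group = subset.groupby(['D', 'E'])['C'].apply(list)
print(functions_per_group)
```

In this example the Home Kitchen group holds only Close, so the Stop, Finish and Open selections are all empty for it. The similarity of the two strings is not what triggers the error.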
A possible solution for these strings is an if-else, for example setting the missing values to NaN:
import numpy as np

def f(g):
    Stop = g.loc[df['C'] == 'Stop', 'B']
    Finish = g.loc[df['C'] == 'Finish', 'B']
    Open = g.loc[df['C'] == 'Open', 'B']
    # Fall back to NaN when a selection is empty
    Stop = np.nan if len(Stop) == 0 else Stop.values[0]
    Finish = np.nan if len(Finish) == 0 else Finish.values[0]
    Open = np.nan if len(Open) == 0 else Open.values[0]
    g['YX_diff'] = Finish - Stop
    g['YZ_diff'] = Finish - Open
    return g
included = ['Home', 'Home Kitchen']
df = df[df.D.isin(included)].groupby(['D', 'E']).apply(f)
print(df)
    B       C             D  E  YX_diff  YZ_diff
0  10    Stop          Home  1     40.0     15.0
1  20   Close  Home Kitchen  1      NaN      NaN
2  35    Open          Home  1     40.0     15.0
3  50  Finish          Home  1     40.0     15.0
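The same NaN behaviour can also be obtained without a per-group function. This is a different technique from the one in the answer, shown only as a sketch: reshape the first timestamp of each function into one column per function, so that functions missing from a group become NaN automatically. The names `first_times`, `diffs` and `result` are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'B': [10, 20, 35, 50],
    'C': ['Stop', 'Close', 'Open', 'Finish'],
    'D': ['Home', 'Home Kitchen', 'Home', 'Home'],
    'E': [1, 1, 1, 1],
})

included = ['Home', 'Home Kitchen']
subset = df[df['D'].isin(included)]

# First timestamp per (D, E, C), then one column per function.
# A function that a group does not contain shows up as NaN.
first_times = subset.groupby(['D', 'E', 'C'])['B'].first().unstack('C')

# Guarantee the three needed columns exist even if a function never occurs.
first_times = first_times.reindex(columns=['Stop', 'Open', 'Finish'])

diffs = pd.DataFrame({
    'YX_diff': first_times['Finish'] - first_times['Stop'],
    'YZ_diff': first_times['Finish'] - first_times['Open'],
})

# Attach the per-group differences to every row of the filtered frame.
result = subset.join(diffs, on=['D', 'E'])
print(result)
```

Because the subtraction happens on whole columns, NaN propagates for any group that lacks one of the functions, and no explicit emptiness check is needed.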
Another solution, in pure Python, is next with its optional default argument, which returns NaN if there is no element to extract:
import numpy as np

def f(g):
    Stop = g.loc[df['C'] == 'Stop', 'B']
    Finish = g.loc[df['C'] == 'Finish', 'B']
    Open = g.loc[df['C'] == 'Open', 'B']
    # next() returns the default (NaN) when the iterator is empty
    Stop_first = next(iter(Stop), np.nan)
    Finish_first = next(iter(Finish), np.nan)
    Open_first = next(iter(Open), np.nan)
    g['YX_diff'] = Finish_first - Stop_first
    g['YZ_diff'] = Finish_first - Open_first
    return g
included = ['Home', 'Home Kitchen']
df = df[df.D.isin(included)].groupby(['D', 'E']).apply(f)
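The next-with-default idea can be wrapped in a small helper so the fallback is written only once. The sketch below is one possible arrangement, not the answer's own code: `first_or_nan` and `add_diffs` are hypothetical names, the masks are built from the group itself (`g['C']`) rather than from the outer `df`, and the groups are iterated explicitly instead of going through `groupby(...).apply`, whose output index can differ between pandas versions.

```python
import numpy as np
import pandas as pd


def first_or_nan(values):
    """Return the first element of an iterable, or NaN if it is empty."""
    return next(iter(values), np.nan)


def add_diffs(g):
    """Add YX_diff and YZ_diff columns to a copy of one (D, E) group."""
    g = g.copy()  # avoid modifying a slice of the original frame
    stop = first_or_nan(g.loc[g['C'] == 'Stop', 'B'])
    finish = first_or_nan(g.loc[g['C'] == 'Finish', 'B'])
    open_ = first_or_nan(g.loc[g['C'] == 'Open', 'B'])
    g['YX_diff'] = finish - stop
    g['YZ_diff'] = finish - open_
    return g


df = pd.DataFrame({
    'B': [10, 20, 35, 50],
    'C': ['Stop', 'Close', 'Open', 'Finish'],
    'D': ['Home', 'Home Kitchen', 'Home', 'Home'],
    'E': [1, 1, 1, 1],
})

included = ['Home', 'Home Kitchen']
subset = df[df['D'].isin(included)]

# Process each group separately, then restore the original row order.
pieces = [add_diffs(g) for _, g in subset.groupby(['D', 'E'])]
result = pd.concat(pieces).sort_index()
print(result)
```

Reading the function labels from `g` keeps `add_diffs` independent of any global variable, which matters once `df` is reassigned to the result, as it is in the snippets above.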