Similar strings cause an IndexError

Time: 2018-08-04 06:37:14

Tags: python pandas loops dataframe

I have a pandas df that contains various functions and timestamps. I'm trying to efficiently return the differences between different functions.

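Here is a very small example of the df. Column C holds the functions, B the timestamps, D the different locations, and E the number of occurrences. Essentially, I want to return the differences between functions at each location. These functions occur multiple times.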

import pandas as pd

df = pd.DataFrame({
    'B': [10, 20, 35, 50],
    'C': ['Stop', 'Close', 'Open', 'Finish'],
    'D': ['Home', 'Home Kitchen', 'Home', 'Home'],
    'E': [1, 1, 1, 1],
    })

I am currently doing it as follows:

def f(g):
    # Build the masks from the group g itself rather than the global df
    Stop = g.loc[g['C'] == 'Stop', 'B']
    Finish = g.loc[g['C'] == 'Finish', 'B']
    Open = g.loc[g['C'] == 'Open', 'B']
    g['YX_diff'] = Finish.values[0] - Stop.values[0]
    g['YZ_diff'] = Finish.values[0] - Open.values[0]

    return g

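I have a list of locations on which I run this operation. The df above only shows Home, but it can contain many places. To select them, I include the following: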

included = ['Home']

df = df[df.D.isin(included)].groupby(['D', 'E']).apply(f)

The problem I'm having is with the places I want to look at, specifically when their strings are similar. For example:

included = ['Home']

works fine. But if I include

included = ['Home','Home Kitchen']

it returns the error:

    g['YX_diff'] = Finish.values[0] - Stop.values[0]

IndexError: index 0 is out of bounds for axis 0 with size 0

I don't want to change the strings because they represent specific information. I'm not sure what else I can do.

1 Answer:

Answer 0 (score: 0)

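For the string Home Kitchen, all 3 filtered Series are empty, so there is no first value to select. This is not caused by the strings being similar: isin matches whole values exactly, so Home Kitchen forms its own group, and that group contains only the Close row, with no Stop, Finish or Open rows. Selecting element 0 of an empty Series raises exactly this error: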

s = pd.Series(dtype='float64')
print(s)
Series([], dtype: float64)

print(s.values[0])

IndexError: index 0 is out of bounds for axis 0 with size 0
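To see why the Home Kitchen group ends up empty, it can help to list which functions each group actually contains. The following is a minimal sketch using the sample df from the question; it only illustrates that isin compares whole strings, so Home and Home Kitchen are never mixed, and that the Home Kitchen group simply has no Stop, Finish or Open row.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'B': [10, 20, 35, 50],
    'C': ['Stop', 'Close', 'Open', 'Finish'],
    'D': ['Home', 'Home Kitchen', 'Home', 'Home'],
    'E': [1, 1, 1, 1],
})

included = ['Home', 'Home Kitchen']

# isin compares whole values, so 'Home' does not match 'Home Kitchen'
home_mask = df['D'].isin(['Home']).tolist()
print(home_mask)  # [True, False, True, True]

# Collect the functions (column C) present in each (D, E) group
events = {
    d: sorted(g['C'])
    for (d, e), g in df[df['D'].isin(included)].groupby(['D', 'E'])
}
for d, names in events.items():
    print(d, names)
# Home ['Finish', 'Open', 'Stop']
# Home Kitchen ['Close']
```

Because the Home Kitchen group has no Stop, Finish or Open row, every `.values[0]` lookup in f fails for that group.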

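You can check this by printing the three filtered Series for each group (older pandas versions call the function on the first group twice, which is why the Home output appears twice):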

def f(g):
    Stop = g.loc[g['C'] == 'Stop', 'B']
    Finish = g.loc[g['C'] == 'Finish', 'B']
    Open = g.loc[g['C'] == 'Open', 'B']
    print(Stop)
    print(Finish)
    print(Open)
#    g['YX_diff'] = Finish.values[0] - Stop.values[0]
#    g['YZ_diff'] = Finish.values[0] - Open.values[0]

    return g

included = ['Home', 'Home Kitchen']

df = df[df.D.isin(included)].groupby(['D', 'E']).apply(f)

0    10
Name: B, dtype: int64
3    50
Name: B, dtype: int64
2    35
Name: B, dtype: int64
0    10
Name: B, dtype: int64
3    50
Name: B, dtype: int64
2    35
Name: B, dtype: int64
Series([], Name: B, dtype: int64)
Series([], Name: B, dtype: int64)
Series([], Name: B, dtype: int64)
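If the rows of groups that lack one of the required functions are not needed at all, a different option (not part of the original answer) is to drop those groups before calling apply, using `groupby(...).filter`. The sketch below assumes that Stop, Open and Finish are the only functions f needs; the name `required` is hypothetical. With incomplete groups removed, the question's original `.values[0]` lookups never see an empty Series.

```python
import pandas as pd

df = pd.DataFrame({
    'B': [10, 20, 35, 50],
    'C': ['Stop', 'Close', 'Open', 'Finish'],
    'D': ['Home', 'Home Kitchen', 'Home', 'Home'],
    'E': [1, 1, 1, 1],
})

def f(g):
    # The question's function, with masks built from g instead of df
    stop = g.loc[g['C'] == 'Stop', 'B']
    finish = g.loc[g['C'] == 'Finish', 'B']
    open_ = g.loc[g['C'] == 'Open', 'B']
    g = g.copy()  # do not modify the group object pandas passes in
    g['YX_diff'] = finish.values[0] - stop.values[0]
    g['YZ_diff'] = finish.values[0] - open_.values[0]
    return g

included = ['Home', 'Home Kitchen']
required = {'Stop', 'Open', 'Finish'}  # hypothetical: functions f must find in every group

sub = df[df['D'].isin(included)]
# Keep only groups that contain every required function (drops Home Kitchen)
complete = sub.groupby(['D', 'E']).filter(lambda g: required.issubset(g['C']))
# group_keys=False keeps the original row index instead of prepending D and E
out = complete.groupby(['D', 'E'], group_keys=False).apply(f)
print(out)
```

The trade-off is that the Close row of Home Kitchen disappears from the result entirely, whereas the NaN approach keeps every row.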

A possible solution for these strings is an if-else, for example setting the missing values to NaN:

import numpy as np

def f(g):
    Stop = g.loc[g['C'] == 'Stop', 'B']
    Finish = g.loc[g['C'] == 'Finish', 'B']
    Open = g.loc[g['C'] == 'Open', 'B']
    # Fall back to NaN when the group has no row for a function
    Stop = np.nan if len(Stop) == 0 else Stop.values[0]
    Finish = np.nan if len(Finish) == 0 else Finish.values[0]
    Open = np.nan if len(Open) == 0 else Open.values[0]

    g['YX_diff'] = Finish - Stop
    g['YZ_diff'] = Finish - Open

    return g

included = ['Home', 'Home Kitchen']

df = df[df.D.isin(included)].groupby(['D', 'E']).apply(f)
print (df)
    B       C             D  E  YX_diff  YZ_diff
0  10    Stop          Home  1     40.0     15.0
1  20   Close  Home Kitchen  1      NaN      NaN
2  35    Open          Home  1     40.0     15.0
3  50  Finish          Home  1     40.0     15.0
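The three if-else lines repeat the same pattern, so they can be folded into one small helper. The sketch below uses a hypothetical helper `first_b` (not a pandas function) that returns the first B value of a given function in the group, or NaN when it is absent. It also passes `group_keys=False`, on the assumption that you want the flat 0..3 index shown above: recent pandas versions otherwise prepend the group keys to the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'B': [10, 20, 35, 50],
    'C': ['Stop', 'Close', 'Open', 'Finish'],
    'D': ['Home', 'Home Kitchen', 'Home', 'Home'],
    'E': [1, 1, 1, 1],
})

def first_b(g, event):
    """Hypothetical helper: first timestamp B of `event` in group g, or NaN if absent."""
    values = g.loc[g['C'] == event, 'B']
    return values.iloc[0] if len(values) else np.nan

def f(g):
    finish = first_b(g, 'Finish')
    g = g.copy()  # do not modify the group object pandas passes in
    g['YX_diff'] = finish - first_b(g, 'Stop')
    g['YZ_diff'] = finish - first_b(g, 'Open')
    return g

included = ['Home', 'Home Kitchen']
out = df[df['D'].isin(included)].groupby(['D', 'E'], group_keys=False).apply(f)
print(out)
```

Adding another difference then only needs one more `first_b` call instead of another filter plus if-else pair.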

Another solution in pure Python uses next with its optional default argument, which returns NaN when there is no element to extract:

def f(g):
    Stop = g.loc[g['C'] == 'Stop', 'B']
    Finish = g.loc[g['C'] == 'Finish', 'B']
    Open = g.loc[g['C'] == 'Open', 'B']

    # next() returns the default (NaN) when the Series is empty
    Stop_first = next(iter(Stop), np.nan)
    Finish_first = next(iter(Finish), np.nan)
    Open_first = next(iter(Open), np.nan)

    g['YX_diff'] = Finish_first - Stop_first
    g['YZ_diff'] = Finish_first - Open_first

    return g

included = ['Home', 'Home Kitchen']

df = df[df.D.isin(included)].groupby(['D', 'E']).apply(f)
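Since the goal is to compute the differences efficiently, it may also be worth avoiding a Python function per group altogether. The sketch below is a different technique from the answers above, not the original solution: `pivot_table` turns each (D, E) pair into one row with one column per function, both differences are computed column-wise, and `DataFrame.join` broadcasts them back onto every row of the group. Missing functions simply become NaN, so Home Kitchen needs no special case. It assumes each function occurs at most once per (D, E) group; `aggfunc='first'` keeps the first timestamp if that assumption does not hold.

```python
import pandas as pd

df = pd.DataFrame({
    'B': [10, 20, 35, 50],
    'C': ['Stop', 'Close', 'Open', 'Finish'],
    'D': ['Home', 'Home Kitchen', 'Home', 'Home'],
    'E': [1, 1, 1, 1],
})

included = ['Home', 'Home Kitchen']
sub = df[df['D'].isin(included)]

# One row per (D, E), one column per function in C; absent functions become NaN
wide = sub.pivot_table(index=['D', 'E'], columns='C', values='B', aggfunc='first')

# Make sure the three needed columns exist even if no group contains one of them
wide = wide.reindex(columns=['Stop', 'Open', 'Finish'])

diffs = pd.DataFrame({
    'YX_diff': wide['Finish'] - wide['Stop'],
    'YZ_diff': wide['Finish'] - wide['Open'],
})

# join on the D and E columns against the (D, E) MultiIndex of diffs;
# join keeps the original row index of sub
out = sub.join(diffs, on=['D', 'E'])
print(out)
```

This replaces one Python call per group with a few vectorized operations, which typically matters once there are many locations.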