Question

I have the following line of code:

# slice off the last 4 chars in name wherever its code contains the substring '-CUT'
df['name'] = np.where(df['code'].str.contains('-CUT'),
                      df['name'].str[:-4], df['name'])

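However, this does not seem to work correctly. It slices off the last four characters in the right rows, but it also does so in rows where code is None / empty (which is almost all of them).

Is there an obvious mistake in how I am using np.where?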

Answer 1

np.where

You can pass regex=False and na=False as arguments to pd.Series.str.contains so that only rows satisfying the condition are updated:

df['name'] = np.where(df['code'].str.contains('-CUT', regex=False, na=False),
                      df['name'].str[:-4], df['name'])

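regex=False is not strictly necessary for this pattern, but it should improve performance. na=False ensures that any value the str methods cannot process, such as None, returns False.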
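To see the effect on a small case, here is a minimal sketch. The sample frame and its values are hypothetical, not the asker's data. The likely cause of the original behaviour is that, on an object-dtype column, str.contains returns a missing value rather than False for a missing code, and np.where treats that missing value as truthy.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only the first row has a code containing '-CUT'.
df = pd.DataFrame({
    'name': ['widget-CUT', 'gadget', 'gizmo'],
    'code': ['A1-CUT', None, 'B2'],
})

# na=False maps missing codes to False, so only rows whose code
# really contains '-CUT' are selected.
mask = df['code'].str.contains('-CUT', regex=False, na=False)

# Slice the last 4 characters where the mask is True, keep the name otherwise.
df['name'] = np.where(mask, df['name'].str[:-4], df['name'])

print(df['name'].tolist())  # ['widget', 'gadget', 'gizmo']
```

The row with a None code keeps its full name because its mask entry is False.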

pd.DataFrame.loc

Alternatively, you can use pd.DataFrame.loc. This seems more natural than specifying an "unchanged" series as the final argument to np.where:

mask = df['code'].str.contains('-CUT', regex=False, na=False)
df.loc[mask, 'name'] = df['name'].str[:-4]
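A self-contained sketch of the same approach follows, again on a hypothetical sample frame. As a small variation, it also restricts the right-hand side to the masked rows, so the slice is computed only where it is needed. The version above gives the same result because the assignment aligns on the index.

```python
import pandas as pd

# Hypothetical sample data: only the first row has a code containing '-CUT'.
df = pd.DataFrame({
    'name': ['widget-CUT', 'gadget', 'gizmo'],
    'code': ['A1-CUT', None, 'B2'],
})

# Boolean mask; missing codes become False because of na=False.
mask = df['code'].str.contains('-CUT', regex=False, na=False)

# Update only the masked rows; all other names are left untouched.
df.loc[mask, 'name'] = df.loc[mask, 'name'].str[:-4]

print(df['name'].tolist())  # ['widget', 'gadget', 'gizmo']
```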
