Question

I have an example.csv, imported as df.csv, which looks like this:

    Ethnicity, Description
  0 French, Irish Dance Company
  1 Italian, Moroccan/Algerian
  2 Danish, Company in Netherlands
  3 Dutch, French
  4 English, EnglishFrench
  5 Irish, Irish-American

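I would like to check, in pandas, whether the strings in test1['Ethnicity'] appear in test1['Description']. This should return rows 0, 3, 4 and 5, because those Description strings contain a string from the Ethnicity column.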

So far I have tried:

df[df['Ethnicity'].str.contains('French')]['Description']

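This works for any one specific string, but I would like to iterate over the values without searching for each Ethnicity value by hand. I also tried converting the column to a list and iterating over it, but I cannot find a way to return the rows, because the result is no longer a DataFrame().

Thanks in advance!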

Answer 1

You can use str.contains with a pattern built by converting the Ethnicity column with tolist and joining the values with |, which means "or" in a regex:

print ('|'.join(df.Ethnicity.tolist()))
French|Italian|Danish|Dutch|English|Irish

mask = df.Description.str.contains('|'.join(df.Ethnicity.tolist()))
print (mask)
0     True
1    False
2    False
3     True
4     True
5     True
Name: Description, dtype: bool

#boolean-indexing
print (df[mask])
  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American

It seems you can also omit tolist(), since join accepts the Series directly: '|'.join(df.Ethnicity).
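
One caveat with the joined pattern is that every Ethnicity value is interpreted as a regex. The sketch below is a hedged variant, not part of the original answer: it rebuilds the sample data from the question and escapes each value with re.escape before joining. The names pattern and result are only illustrative.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})

# Escape each value so that regex metacharacters are matched literally,
# then join the alternatives with "|" (regex "or").
pattern = "|".join(re.escape(value) for value in df["Ethnicity"])

# True where the Description contains any of the Ethnicity values.
mask = df["Description"].str.contains(pattern)
result = df[mask]
print(result)
```

For the sample data the escaping changes nothing, because none of the values contain special characters. It starts to matter for a hypothetical value such as "C++", which is not a valid regex on its own.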

Answer 2

The ever-popular double apply:

df[df.Description.apply(lambda x: df.Ethnicity.apply(lambda y: y in x)).any(axis=1)]

  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American
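
To make the double apply easier to follow, here is a minimal sketch that exposes the intermediate boolean table and an equivalent mask written as a list comprehension. The data is rebuilt from the question; the names matrix and mask are illustrative assumptions, not taken from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})

# One row per Description, one column per Ethnicity value:
# a cell is True when that Ethnicity is a substring of that Description.
matrix = df["Description"].apply(
    lambda d: df["Ethnicity"].apply(lambda e: e in d)
)

# The same row-wise "any" reduction written in plain Python.
mask = [any(e in d for e in df["Ethnicity"]) for d in df["Description"]]

print(matrix)
print(df[mask])
```

Both forms use the Python in operator, so the values are compared as literal substrings and no regex escaping is needed.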

Timing

jezrael's answer is far superior.
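
The comparison can be reproduced with a sketch along these lines. It assumes the sample data from the question; the helper names regex_join and double_apply and the repeat count of 100 are arbitrary choices for illustration. Absolute numbers will depend on the machine, the pandas version and the size of the DataFrame.

```python
import timeit

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})


def regex_join(frame):
    # Approach from Answer 1: one vectorised regex search.
    pattern = "|".join(frame["Ethnicity"])
    return frame[frame["Description"].str.contains(pattern)]


def double_apply(frame):
    # Approach from Answer 2: nested apply with substring checks.
    mask = frame["Description"].apply(
        lambda d: frame["Ethnicity"].apply(lambda e: e in d)
    ).any(axis=1)
    return frame[mask]


# Both approaches should select the same rows before they are compared.
assert regex_join(df).equals(double_apply(df))

for func in (regex_join, double_apply):
    seconds = timeit.timeit(lambda: func(df), number=100)
    print(f"{func.__name__}: {seconds / 100 * 1e3:.3f} ms per call")
```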
