Search a pandas Column for a Substring from Another Column

Time: 2016-06-30 16:08:47

Tags: python string pandas dataframe substring

I have a sample .csv, imported as df.csv, that looks like this:

    Ethnicity, Description
  0 French, Irish Dance Company
  1 Italian, Moroccan/Algerian
  2 Danish, Company in Netherlands
  3 Dutch, French
  4 English, EnglishFrench
  5 Irish, Irish-American
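
For anyone who wants to reproduce the data, here is a minimal sketch that loads the table above from an in-memory string. The names `csv_text` and `df` are assumptions for illustration, and `skipinitialspace=True` is used so the space after each comma does not end up in the values or in the `Description` column name.

```python
import io

import pandas as pd

# The sample data from the question, without the index column.
csv_text = """Ethnicity, Description
French, Irish Dance Company
Italian, Moroccan/Algerian
Danish, Company in Netherlands
Dutch, French
English, EnglishFrench
Irish, Irish-American
"""

# skipinitialspace=True drops the blank that follows each comma.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(df)
```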

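I want to search test1['Description'] for the strings in test1['Ethnicity']. This should return rows 0, 3, 4 and 5, because those description strings contain a string from the Ethnicity column.

So far I have tried: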

df[df['Ethnicity'].str.contains('French')]['Description']

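This returns matches for one specific string, but I want to iterate without searching for each specific ethnicity value. I also tried converting the columns to lists and iterating, but I can't seem to find a way to return the rows, since it is no longer a DataFrame().

Thanks in advance!

2 Answers:

Answer 0: (score: 3)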

You can use str.contains: convert the Ethnicity column with tolist, then join the values with | (regex or):

print ('|'.join(df.Ethnicity.tolist()))
French|Italian|Danish|Dutch|English|Irish

mask = df.Description.str.contains('|'.join(df.Ethnicity.tolist()))
print (mask)
0     True
1    False
2    False
3     True
4     True
5     True
Name: Description, dtype: bool

#boolean-indexing
print (df[mask])
  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American

It seems you can omit tolist().
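
One caveat with the joined pattern: str.contains treats it as a regular expression, so a value containing characters such as `.` or `(` would not be matched literally. A minimal sketch of a safer variant, assuming the same sample data, escapes each value with `re.escape` before joining. The names `pattern` and `matching_rows` are only illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})

# Escape each value so regex metacharacters are matched literally,
# then join the alternatives with "|" (regex "or").
pattern = "|".join(re.escape(value) for value in df["Ethnicity"])

# True where the description contains at least one ethnicity value.
mask = df["Description"].str.contains(pattern, regex=True)

# Boolean indexing keeps only the matching rows.
matching_rows = df[mask]
print(matching_rows)
```

For the sample data the escaping changes nothing, since every value consists of plain letters.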

Answer 1: (score: 1)

The much-loved double apply:

df[df.Description.apply(lambda x: df.Ethnicity.apply(lambda y: y in x)).any(axis=1)]

  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American
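
The same substring test can also be written without nested apply calls. The following is a sketch, not the answerer's code: it uses a plain Python list comprehension with the `in` operator, so no regular expressions are involved. The names `ethnicities`, `mask` and `result` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})

ethnicities = df["Ethnicity"].tolist()

# For each description, check whether any ethnicity value is a substring of it.
mask = [
    any(ethnicity in description for ethnicity in ethnicities)
    for description in df["Description"]
]

# A boolean list of the same length as the frame works for row selection.
result = df[mask]
print(result)
```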

Timing

jezrael's answer is far superior.
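
To reproduce the comparison yourself, here is a minimal sketch using the standard-library `timeit` module on the sample data. The helper names `regex_mask` and `double_apply_mask` and the repeat count of 100 are assumptions chosen for illustration, and the measured times will vary with the machine, the pandas version and the size of the data.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})


def regex_mask(frame):
    # Answer 0: one vectorised regex search over the Description column.
    return frame["Description"].str.contains("|".join(frame["Ethnicity"]))


def double_apply_mask(frame):
    # Answer 1: nested apply, one substring test per (description, ethnicity) pair.
    return frame["Description"].apply(
        lambda x: frame["Ethnicity"].apply(lambda y: y in x)
    ).any(axis=1)


# Time 100 calls of each approach; lower is faster.
regex_seconds = timeit.timeit(lambda: regex_mask(df), number=100)
apply_seconds = timeit.timeit(lambda: double_apply_mask(df), number=100)

print(f"str.contains: {regex_seconds:.4f} s for 100 runs")
print(f"double apply: {apply_seconds:.4f} s for 100 runs")
```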
