Mapping keywords to a dataframe column using pandas in Python

Time: 2017-10-04 07:33:42

Tags: python regex pandas dataframe data-analysis

I have a dataframe,

DF,
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
        2       Thanks for reading
Ram     1       Ram is one of the good cricket player
ganesh  1       good driver

and a list,

my_list=["one"]

 I tried mask=df["Description"].str.contains('|'.join(my_list),na=False)

but it gives,

 output_DF.
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
Ram     1       Ram is one of the good cricket player

My desired output is,
desired_DF,
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
        2       Thanks for reading
Ram     1       Ram is one of the good cricket player

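The Stage column has to be taken into account: I want all the rows that belong with a matching Description, including later stages listed under the same Name.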

2 answers:

Answer 0: (score: 1)

I think you need:

print (df)
     Name  Stage                                Description
0     Sri      1  Sri is one of the good singer in this two
1              2                         Thanks for reading
2     Ram      1      Ram is one of the good cricket player
3  ganesh      1                                good driver

#replace empty or whitespaces by previous value
df['Name'] = df['Name'].mask(df['Name'].str.strip() == '').ffill()
print (df)
     Name  Stage                                Description
0     Sri      1  Sri is one of the good singer in this two
1     Sri      2                         Thanks for reading
2     Ram      1      Ram is one of the good cricket player
3  ganesh      1                                good driver

#get all names by condition
my_list = ["one"]
names=df.loc[df["Description"].str.contains("|".join(my_list),na=False), 'Name']
print (names)
0    Sri
2    Ram
Name: Name, dtype: object

#select all rows whose Name is in names
df = df[df['Name'].isin(names)]
print (df)
  Name  Stage                                Description
0  Sri      1  Sri is one of the good singer in this two
1  Sri      2                         Thanks for reading
2  Ram      1      Ram is one of the good cricket player
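
For reference, the steps above can be put together as one self-contained sketch. The sample frame is rebuilt from the question, and the blank Name on the second row is assumed to be an empty string (it could equally be whitespace or NaN in the real data). Unlike the snippet above, this version forward-fills into a helper Series called `filled` (a name chosen here for illustration), so the Name column of the result keeps its original blanks, as in the desired output.

```python
import pandas as pd

# Sample data rebuilt from the question; the blank Name is assumed to be "".
df = pd.DataFrame({
    "Name": ["Sri", "", "Ram", "ganesh"],
    "Stage": [1, 2, 1, 1],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is one of the good cricket player",
        "good driver",
    ],
})
my_list = ["one"]

# Turn blank/whitespace names into NaN, then fill them from the row above.
# This is kept in a helper Series so df["Name"] itself is not modified.
filled = df["Name"].mask(df["Name"].str.strip() == "").ffill()

# Rows whose Description contains any of the keywords.
hit = df["Description"].str.contains("|".join(my_list), na=False)

# Keep every row whose (filled) name has at least one matching row.
result = df[filled.isin(filled[hit])]
print(result)
```

Note that `"|".join(my_list)` is interpreted as a regular expression, so keywords containing characters such as `.` or `+` would need `re.escape` first.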

Answer 1: (score: 0)

It seems to be looking for "one" in the Description field of the dataframe and returning the descriptions that match.

If you want the third line as well, you have to add a list element for the second match,

e.g. "Thanks", so something like my_list = ["one", "Thanks"]
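
A minimal sketch of this suggestion, assuming the same sample data as in the question (with the blank Name taken to be an empty string). The keywords are passed through `re.escape`, which is an addition here and not part of the answer, so that they are matched literally rather than as regular expressions.

```python
import re
import pandas as pd

# Sample data rebuilt from the question; the blank Name is assumed to be "".
df = pd.DataFrame({
    "Name": ["Sri", "", "Ram", "ganesh"],
    "Stage": [1, 2, 1, 1],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is one of the good cricket player",
        "good driver",
    ],
})

# Second keyword added so that the "Thanks for reading" row matches too.
my_list = ["one", "Thanks"]

# Escape each keyword and join them into a single alternation pattern.
pattern = "|".join(re.escape(k) for k in my_list)
mask = df["Description"].str.contains(pattern, na=False)

matched = df[mask]
print(matched)
```

This selects rows purely by the text of Description, so it only returns the extra row when a keyword for it is known in advance; it does not use the Name or Stage columns to group rows.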