pandas: find common strings in a Series and return the keywords

Date: 2015-01-28 15:07:15

Tags: python regex string pandas

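I would like to build on this previous question about searching a pandas Series for strings from a list of keywords. My question now is how to return the keyword found in each DataFrame row as a new column. The keyword Series "w" is: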

Skilful
Wilful
Somewhere
Thing
Strange

and the DataFrame "df" is:

User_ID;Tweet
01;hi all
02;see you somewhere
03;So weird
04;hi all :-)
05;next big thing
06;how can i say no?
07;so strange
08;not at all
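To reproduce the example, `w` and `df` can be built as below. This is a minimal sketch assuming the data is pasted as the semicolon-separated text shown above; reading it through `io.StringIO` and `pd.read_csv(..., sep=";")` is an assumption, not necessarily how the asker loaded it.

```python
import io

import pandas as pd

# Keyword Series "w" from the question
w = pd.Series(["Skilful", "Wilful", "Somewhere", "Thing", "Strange"])

# Semicolon-separated text as shown in the question (assumed input format)
raw = """User_ID;Tweet
01;hi all
02;see you somewhere
03;So weird
04;hi all :-)
05;next big thing
06;how can i say no?
07;so strange
08;not at all
"""

# read_csv parses User_ID as an integer, so "01" becomes 1, matching the output below
df = pd.read_csv(io.StringIO(raw), sep=";")
```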

The following solution works for masking the DataFrame, i.e. flagging which tweets contain one of the keywords:

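import re
# Case-insensitive regex matching any tweet that contains one of the keywords
r = re.compile(r'.*({}).*'.format('|'.join(w.values)), re.IGNORECASE)
# bool(match) gives True/False per tweet (on Python 3, wrap this in list(), since map() is lazy)
masked = map(bool, map(r.match, df['Tweet']))
df['Tweet_masked'] = masked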

which returns:

   User_ID              Tweet Tweet_masked
0        1             hi all        False
1        2  see you somewhere         True
2        3           So weird        False
3        4         hi all :-)        False
4        5     next big thing         True
5        6  how can i say no?        False
6        7         so strange         True
7        8         not at all        False
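The same mask can also be computed with pandas' vectorized `Series.str.contains` instead of mapping `re.match` by hand. A minimal sketch that rebuilds the data so it runs on its own; escaping the keywords with `re.escape` is an added precaution, not part of the question's code.

```python
import re

import pandas as pd

w = pd.Series(["Skilful", "Wilful", "Somewhere", "Thing", "Strange"])
df = pd.DataFrame({
    "User_ID": range(1, 9),
    "Tweet": ["hi all", "see you somewhere", "So weird", "hi all :-)",
              "next big thing", "how can i say no?", "so strange", "not at all"],
})

# One alternation such as "Skilful|Wilful|Somewhere|Thing|Strange"
pattern = "|".join(map(re.escape, w))

# str.contains searches anywhere in each string, so the leading/trailing ".*"
# of the original regex are unnecessary; case=False makes it case-insensitive
df["Tweet_masked"] = df["Tweet"].str.contains(pattern, case=False)
print(df)
```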

Now I am looking for a result like this:

User_ID;Tweet;Keyword
01;hi all;None
02;see you somewhere;somewhere
03;So weird;None
04;hi all :-);None
05;next big thing;thing
06;how can i say no?;None
07;so strange;strange
08;not at all;None

Thanks in advance for your help.

1 answer:

Answer 0 (score: 1)

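How about replacing

masked = map(bool, map(r.match, df['Tweet']))

with

masked = [m.group(1) if m else None for m in map(r.match, df['Tweet'])]

This keeps the text captured by group(1), i.e. the keyword as it appears in the tweet, instead of a boolean; assign the result to df['Keyword'] to get the desired column.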
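Putting the answer together end to end, a sketch assuming the data from the question. The `str.extract` line is an alternative not taken from the answer, and `Keyword_extract` is a hypothetical column name used only for comparison.

```python
import re

import pandas as pd

w = pd.Series(["Skilful", "Wilful", "Somewhere", "Thing", "Strange"])
df = pd.DataFrame({
    "User_ID": range(1, 9),
    "Tweet": ["hi all", "see you somewhere", "So weird", "hi all :-)",
              "next big thing", "how can i say no?", "so strange", "not at all"],
})

alternation = "|".join(map(re.escape, w))

# The answer's approach: group(1) holds the matched keyword in the tweet's own casing
r = re.compile(r".*({}).*".format(alternation), re.IGNORECASE)
df["Keyword"] = [m.group(1) if m else None for m in map(r.match, df["Tweet"])]

# Alternative (not from the answer): vectorized extraction of the capture group;
# rows without a match get a missing value (NaN)
df["Keyword_extract"] = df["Tweet"].str.extract(
    "({})".format(alternation), flags=re.IGNORECASE, expand=False
)
print(df)
```

For this data both columns agree. If a tweet contained two keywords they could differ: the greedy leading `.*` in the answer's pattern makes `group(1)` return the last keyword in the tweet, while `str.extract` returns the first.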