pandas: getting the "single" keyword from a DataFrame string search

Asked: 2015-01-30 12:08:21

Tags: python regex string pandas

This is a follow-up to this previous topic. I have a Series named "British" that looks like this:

British
\bSkilful\b
\bWilful\b
\bfulfil\b
\b.*favour.*\b
\bappal\b
\bappall.*\b
\barbour.*\b
\barmor.*\b
\bstrange\b
\brumor.*\b
\b.*color.*\b
\b.*centre's\b

and a DataFrame df like this:

 User_ID     Tweet
 01          hi all
 02          see you something
 03          that's my favourite spot
 04          the strangest rumors
 05          my appal is nice
 06          check my rumor
 07          #brborboncheckruMoreThanever
 08          look @mycentre's

I want to get a new column containing the SINGLE keyword found in each string. So far I have:

 import re
 import pandas as pd
 List = pd.read_csv('w.txt')
 r = re.compile(r'.*({}).*'.format('|'.join(List['British'].values)), re.IGNORECASE)

Then I mask the DataFrame:

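  # list() gives a concrete boolean list (map is lazy on Python 3)
  masked = list(map(bool, map(r.search, df['Tweet'])))
  # .copy() so the later column assignment does not write to a view of df
  df2 = df[masked].copy()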

Then I run the search again to add the 'keyword' column:

 mask = [m.group(1) if m else None for m in map(r.search, df2['Tweet'])]
 df2['keyword'] = mask

This returns:

   User_ID                     Tweet         keyword
2        3  that's my favourite spot  favourite spot
4        5          my appal is nice           appal
5        6            check my rumor           rumor
7        8          look @mycentre's      mycentre's
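The extra text in the keyword column appears to come from the greedy `.*` wildcards inside the patterns themselves, since `.` also matches spaces. A minimal reproduction using only the `re` module and two patterns copied from the list above (the variable names are illustrative):

```python
import re

# Two patterns copied from the "British" list; the rest are left out for brevity.
patterns = [r"\b.*favour.*\b", r"\b.*centre's\b"]
r = re.compile(r".*({}).*".format("|".join(patterns)), re.IGNORECASE)

# The inner ".*" after "favour" is greedy and runs to the end of the tweet,
# so the captured group spans more than one word.
first = r.search("that's my favourite spot").group(1)

# The inner ".*" before "centre's" absorbs the "my" prefix.
second = r.search("look @mycentre's").group(1)

print(first)
print(second)
```

This prints `favourite spot` and `mycentre's`, the same values shown in the table.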

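So the boolean mask works fine and keeps only the tweets that contain at least one keyword. But what if I want to extract only the single keyword that was found? The final DataFrame should be: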

   User_ID                     Tweet         keyword
2        3  that's my favourite spot       favourite
4        5          my appal is nice           appal
5        6            check my rumor           rumor
7        8          look @mycentre's        centre's
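One possible direction, sketched under the assumption that every `.*` in the pattern list is meant to stay inside a single word: replace each `.*` with `\w*` and drop the outer `.*` around the alternation, so the captured group cannot cross whitespace. The names `word_patterns`, `single` and `first_keyword` are hypothetical, and only a subset of the patterns is shown.

```python
import re

# A subset of the patterns from the "British" list.
patterns = [
    r"\bSkilful\b",
    r"\b.*favour.*\b",
    r"\bappal\b",
    r"\bappall.*\b",
    r"\brumor.*\b",
    r"\b.*centre's\b",
]

# Assumption: ".*" was intended as "any word characters", so swap it for "\w*".
word_patterns = [p.replace(".*", r"\w*") for p in patterns]

# No surrounding ".*" here, so group(1) is exactly the matched word.
single = re.compile("({})".format("|".join(word_patterns)), re.IGNORECASE)


def first_keyword(text):
    """Return the first matching keyword in text, or None if nothing matches."""
    m = single.search(text)
    return m.group(1) if m else None


for tweet in ["that's my favourite spot", "my appal is nice",
              "check my rumor", "look @mycentre's", "hi all"]:
    print(tweet, "->", first_keyword(tweet))
```

With pandas, the function could be applied as `df2['keyword'] = df2['Tweet'].map(first_keyword)`. Note that the last pattern still yields `mycentre's` rather than `centre's`, because the leading wildcard matches the `my` prefix. Getting `centre's` would need that pattern written without its leading `\b.*`, which is a change to the keyword list rather than to the search code.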

Thank you very much for your help.

0 answers:

No answers