Extracting specific words from a string

Time: 2019-09-06 09:45:33

Tags: python regex string pandas

I have a dataframe like this:

Column_A
1. A lot of text inhere, but I want all words that have a comma in the middle. Like this: hello,world. A string can contain multiple relevant words, like hello,python and we have also many                         whit                spaces              in          the text   
2. What I want is to abstract,all words with that pattern. Not sure if it has an impact, but some parts of the strings containing "this signs". or "this,signs"                                     thanks  for helpingme                    greets! 

Desired result:

hello,world
hello,python
abstract,all
"this,signs"

I tried to do this with the following code:

df['B'] = df['Column_A'].str.findall(r',').str.join(' ').str.strip()

But that gave me a result I don't want.
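For illustration, here is a minimal sketch of why the attempt returns only commas, using the standard `re` module on a shortened, made-up sample string (`Series.str.findall` applies the same kind of pattern to each row). The variable names are assumptions for the example, not part of the original code.

```python
import re

# Shortened sample text, assumed for illustration only.
text = "Like this: hello,world. A string can contain hello,python too"

# A bare comma pattern matches only the comma characters themselves.
only_commas = re.findall(r",", text)

# Adding \w+ on both sides also captures the word characters around each comma.
words = re.findall(r"\w+,\w+", text)

print(only_commas)  # [',', ',']
print(words)        # ['hello,world', 'hello,python']
```

Joining the first result with spaces and stripping it leaves nothing but commas, which would explain the unwanted output.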

1 Answer:

Answer 0: (score: 3)

Given the specific format of the expected output, it looks like you can use:

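from itertools import chain

import pandas as pd

# Collect the "word,word" matches from every row, then flatten the per-row lists
l = chain.from_iterable(df.Column_A.str.findall(r'\w+,\w+').values.tolist())
pd.DataFrame(l, columns=['Column_A'])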

      Column_A
0   hello,world
1  hello,python
2  abstract,all
3    this,signs
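
As a self-contained sketch of the same idea, the snippet below builds a small stand-in dataframe (the sample rows are shortened and assumed, not the asker's real data) and flattens the matches with `Series.explode`, which should be available in pandas 0.25 and later. It also shows one possible way to keep surrounding double quotes, since the desired result lists `"this,signs"` with quotes while `\w+,\w+` drops them; the `"?` variant is an assumption about what is wanted and would also accept an unbalanced quote.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's dataframe.
df = pd.DataFrame({
    "Column_A": [
        "Like this: hello,world. Multiple words, like hello,python   and   spaces",
        'What I want is to abstract,all words. Some have "this signs". or "this,signs"',
    ]
})

# One list of matches per row; explode() turns each list item into its own row.
matches = df["Column_A"].str.findall(r"\w+,\w+")
result = (
    matches.explode()
    .dropna()                  # rows without any match become NaN after explode
    .reset_index(drop=True)
    .to_frame("Column_A")
)

# Variant that keeps an optional double quote on either side of the match.
quoted = df["Column_A"].str.findall(r'"?\w+,\w+"?').explode().tolist()

print(result)
print(quoted)
```

Note that "words, like" is not matched, because the pattern requires a word character immediately after the comma.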