Question

I'm not sure how to do this in a DataFrame context.

I have text data in the table below:

TEXT                                       | 
-------------------------------------------|
"Get some new #turbo #stacks today!"       |
"Is it one or three? #phone"               |
"Mayhaps it be three afterall..."          |
"So many new issues with phone... #iphone" |

I would like to edit it so that only the words starting with the "#" symbol are kept, as in the result below.

TEXT             | 
-----------------|
"#turbo #stacks" |
"#phone"         |
""               |
"#iphone"        |

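In some cases, I would also like to know whether the empty rows can be removed, either by checking for NaN or by applying some other kind of condition, to get this result: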

TEXT             | 
-----------------|
"#turbo #stacks" |
"#phone"         |
"#iphone"        |

I'm using Python 2.7 and pandas.

Answer 1

You can try a regular expression with extractall, then join the matches for each original row back into a single string:

df.TEXT.str.extractall(r'(#\w+)').groupby(level=0)[0].apply(' '.join)

Output (row 2 contains no hashtags, so it produces no matches and does not appear in the result):

0    #turbo #stacks
1            #phone
3           #iphone
Name: 0, dtype: object
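If you also need the first result from the question, where rows without hashtags are kept as empty strings, one possible alternative is str.findall followed by str.join. This is only a sketch: the DataFrame is rebuilt here from the sample data in the question, and the names tags, with_blanks, without_blanks and via_nan are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"TEXT": [
    "Get some new #turbo #stacks today!",
    "Is it one or three? #phone",
    "Mayhaps it be three afterall...",
    "So many new issues with phone... #iphone",
]})

# findall returns a list of matches per row (an empty list when there are none);
# str.join then glues each list into a single space-separated string.
tags = df["TEXT"].str.findall(r"#\w+").str.join(" ")

# First desired result: every row is kept, rows without hashtags become "".
with_blanks = df.assign(TEXT=tags)

# Second desired result: drop the empty rows with a plain boolean condition.
without_blanks = with_blanks[with_blanks["TEXT"] != ""]

# NaN-based variant: turn "" into NaN first, then drop the missing rows.
via_nan = with_blanks.replace("", np.nan).dropna(subset=["TEXT"])

print(with_blanks)
print(without_blanks)
```

Both filtering variants keep the original index labels (0, 1, 3), the same as the extractall output above; call reset_index(drop=True) if you want them renumbered.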
