The DataFrame looks like this:
col_a
Python PY is a general purpose PY language
Programming PY language in Python PY
Its easier to understand PY
The syntax of the language is clean PY
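I tried to implement this with the code below, but I can't get the expected output. Any help would be appreciated.
Here is the code I used, based on a regular expression: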
df['col_a'].str.extract(r"([a-zA-Z'-]+\s+PY)\b")
Desired output:
col_a                                       col_b_PY
Python PY is a general purpose PY language  Python PY purpose PY
Programming PY language in Python PY        Python PY Programming PY
Its easier to understand PY                 understand PY
The syntax of the language is clean PY      clean PY
Answer 0 (score: 3)
Answer 1 (score: 2)
Using @Michal's regular expression:
import re

def app(row):
    # Join every "<word> PY" match found in this row's col_a
    return ' '.join(re.findall(r'\w+\s+PY', row.col_a))

df['col_b_PY'] = df.apply(app, axis=1)
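You need to join all the matches for each row inside the applied function. str.extract only returns the first match per row, which is why the original attempt falls short. You could also do this with extractall, but I find this approach simpler and more direct.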
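
For comparison, here is a rough sketch of the extractall route mentioned above, plus a shorter variant using str.findall. The sample DataFrame is rebuilt from the question, and the column name col_b_alt is only there for illustration. Treat this as one possible way to do it, not the only one.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"col_a": [
    "Python PY is a general purpose PY language",
    "Programming PY language in Python PY",
    "Its easier to understand PY",
    "The syntax of the language is clean PY",
]})

pattern = r"(\w+\s+PY)\b"

# extractall returns one row per match, indexed by (original row, match number);
# with a single capture group the matches land in column 0
matches = df["col_a"].str.extractall(pattern)

# Group by the original row index (level 0) and join that row's matches
df["col_b_PY"] = matches[0].groupby(level=0).agg(" ".join)

# Alternative: str.findall gives a list per row, str.join concatenates it
# (col_b_alt is a hypothetical column name used only for comparison)
df["col_b_alt"] = df["col_a"].str.findall(r"\w+\s+PY\b").str.join(" ")

print(df)
```

Both variants list the matches in the order they occur in the text. With extractall, a row that has no match at all ends up as NaN after the assignment, whereas the findall variant gives an empty string.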