Question

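I want to extract rows based on a list of strings, such as words or phrases. My questions are:

Do I need to write this exact code every time?
What code can I write after the for loop to generate new variables?

Here is what I have tried.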

import pandas as pd

# df is my existing DataFrame with 'title', 'text', 'label' and 'author' columns
fruit=['apple','banana','orange']
b1=[]
b2=[]
b3=[]
b4=[]
for i in range(len(df)):
    SelectedWord='apple'
    # keep the row only if the word appears in its text
    if SelectedWord in df.loc[i,'text']:
        a1=df.loc[i,'title']
        a2=df.loc[i,'text']
        a3=df.loc[i,'label']
        a4=df.loc[i,'author']
        b1.append(a1)
        b2.append(a2)
        b3.append(a3)
        b4.append(a4)

new_df=pd.DataFrame(columns=['title','text','label','author'])

new_df['title']=b1
new_df['text']=b2
new_df['label']=b3
new_df['author']=b4

This is basically like an Excel filter, but I want to automate the process.

Answer 1

You don't need a for loop to do this. Building on @Mike67's suggestion:

fruit=['apple','banana','orange']
new_df = df.loc[df['text'].str.contains('|'.join(fruit), regex=True)]
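
To avoid rewriting the filter for every word list, the same idea can be wrapped in a small function. The following is only a sketch under some assumptions: the sample `df` is made up for illustration, and `filter_rows` and `per_word` are hypothetical names that do not come from the question. It also adds `re.escape`, so that characters like `.` or `+` in a search term are matched literally, and `na=False`, so that rows with missing text count as non-matches.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the asker's df.
df = pd.DataFrame({
    'title': ['t1', 't2', 't3', 't4'],
    'text': ['I like apple pie', 'nothing here', 'banana bread', None],
    'label': [0, 1, 0, 1],
    'author': ['a', 'b', 'c', 'd'],
})


def filter_rows(frame, words, column='text'):
    """Return the rows of frame whose column contains any of words."""
    # Escape each term so regex metacharacters are matched literally.
    pattern = '|'.join(re.escape(w) for w in words)
    # na=False treats missing text as "no match".
    mask = frame[column].str.contains(pattern, regex=True, na=False)
    return frame.loc[mask]


fruit = ['apple', 'banana', 'orange']

# All rows that mention any of the words.
new_df = filter_rows(df, fruit)

# One filtered DataFrame per word, keyed by the word.
per_word = {w: filter_rows(df, [w]) for w in fruit}
```

Keeping the per-word results in a dictionary is usually easier to manage than creating a separate variable for each word; `per_word['apple']` then holds the rows that mention apple.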
