Question

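I am trying to capture elements of a pandas DataFrame column whose values are lists. The code below keeps the entire list whenever the string is present. How can I keep, row by row, only the elements that contain a specific string and ignore the rest?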

Here is what I tried...

import numpy as np
import pandas as pd

l1 = [1, 2, 3, 4, 5, 6]
l2 = ['hello world \n my world', 'world is a great place \n we live in it', 'planet earth', np.nan, '\n save the water', '']

df = pd.DataFrame(list(zip(l1, l2)), columns=['id', 'sentence'])
# Split each sentence on newlines; NaN rows stay NaN
df['sentence_split'] = df['sentence'].str.split('\n')
print(df)

The result of this code:

df[df.sentence_split.str.join(' ').str.contains('world', na=False)]  # does the trick but still not exactly what I am looking for. 


id  sentence                                  sentence_split
1   hello world \n my world                   [hello world , my world]
2   world is a great place \n we live in it   [world is a great place , we live in it]

But I am looking for:

id  sentence                                  sentence_split
1   hello world \n my world                   hello world; my world
2   world is a great place \n we live in it   world is a great place

Answer 1

You want to search for a string inside a Series of lists. One way to do it:

# Drop NaN rows
df = df.dropna(subset=["sentence_split"])

Then apply a function that keeps only the list elements containing the string you are looking for:

# Apply this lambda function
df["sentence_split"] = df["sentence_split"].apply(lambda x: [i for i in x if "world" in i])

   id                                 sentence             sentence_split
0   1                  hello world \n my world  [hello world ,  my world]
1   2  world is a great place \n we live in it  [world is a great place ]
2   3                             planet earth                         []
4   5                        \n save the water                         []
5   6                                                                  []
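
If you want exactly the output shown in the question (matching parts joined with "; " and only the rows that have a match), something along these lines might work. This is only a sketch: `keep_matches` is a hypothetical helper name, and the stripping of whitespace and the "; " separator are assumptions taken from the desired output. It works on the original `sentence` column, so NaN rows do not need to be dropped first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "sentence": [
        "hello world \n my world",
        "world is a great place \n we live in it",
        "planet earth",
        np.nan,
        "\n save the water",
        "",
    ],
})


def keep_matches(text, needle="world", sep="; "):
    """Hypothetical helper: join the newline-separated parts of `text` that contain `needle`."""
    if not isinstance(text, str):  # NaN (or any non-string) has nothing to match
        return ""
    parts = [part.strip() for part in text.split("\n")]
    return sep.join(part for part in parts if needle in part)


# Build the filtered, joined string for every row
df["sentence_split"] = df["sentence"].apply(keep_matches)

# Keep only the rows where at least one part matched
result = df[df["sentence_split"] != ""]
print(result)
```

Returning an empty string for rows without a match keeps the column a plain string column, which makes the final filter a simple comparison.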

Python - Search for elements in a list within a DataFrame row
