Question

I have a column in a Pandas DataFrame that contains a list of strings. The strings are separated by commas.

The list in one row looks like this:

list = ['banana bread is yummy', 'i hate to have some more bread, can't we eat apples?', 'apples are not good for you, they make you hungry']

I have been trying to split the list in each row of the column with a regular expression to get the following output:

banana bread is yummy
i hate to have some more bread, can't we eat apples?
apples are not good for you, they make you hungry

But when I use

s = df.assign(conversation=df['conversation'].str.split(',')).explode('conversation')

the whole list is split at every comma, regardless of whether the comma is inside one of the strings. That gives me this output:

banana bread is yummy
i hate to have some more bread
can't we eat apples?
apples are not good for you 
they make you hungry

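Any suggestions on how to use a regular expression for this? I have tried a few things, but the results were seemingly random.

Edit:

Another approach I tried is: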

df['conversation'] = df['conversation'].str.strip('[]')

I first removed the square brackets from each row and then split everything. Although this approach works, it leaves me with random empty rows.

Answer 1

I was able to answer my own question, based on this reply here :-)

s = df.assign(conversation=df['conversation'].str.split(r",(?=(?:[^']*'[^']*')*[^']*$)")).explode('conversation')
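
To see what the lookahead does, here is a minimal sketch that applies the same pattern to a single cell with the standard `re` module. The `cell` value is a made-up example, and it assumes each cell holds the list as plain text with items wrapped in single quotes. The pattern only splits on a comma when an even number of single quotes follows it, meaning the comma sits outside a quoted item. Note that an apostrophe inside an item (as in "can't") changes the quote count, so the sample items here deliberately contain none.

```python
import re

# Split on a comma only if an even number of single quotes follows it,
# i.e. the comma is outside any single-quoted item.
PATTERN = re.compile(r",(?=(?:[^']*'[^']*')*[^']*$)")

# Hypothetical cell value (assumption): the list stored as text,
# with no apostrophes inside the individual items.
cell = "['banana bread is yummy', 'i hate bread, give me apples', 'apples are bad, they make you hungry']"

# Drop the surrounding brackets, split, then trim spaces and the quotes.
parts = PATTERN.split(cell.strip("[]"))
items = [p.strip().strip("'") for p in parts]

for item in items:
    print(item)
```

The commas inside the second and third items are kept, because an odd number of single quotes follows each of them.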