Question

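I have a DataFrame in pandas with many columns.

One column contains lists of strings, displayed in the form [u'str', ...] as shown below. The number of strings differs from row to row.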

column x
[u'str1', u'str2', u'str3']
[u'str4', u'str1']
[u'str5', u'str7', u'str8', u'str9']

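I want to create new columns in the DataFrame named x-1, x-2, and so on up to x-n.

How do I:

Figure out how many new columns I need (i.e. how many members the longest list has)?
Create that many columns using the naming scheme above.
Most importantly: split the strings into the new columns, keeping only what is between the single quotes (i.e. dropping the u, the ' and the commas).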

Answer 1

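If "column x" is a column of lists, you can create a new DataFrame by converting that column (a Series) to a list of lists and passing it to the DataFrame constructor. Rows with shorter lists are padded with None.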

df['column x']
0    [a, b, c]
1          [d]
2       [e, f]
dtype: object

df2 = pd.DataFrame(
    df['column x'].tolist()).rename(lambda x: 'x-{}'.format(x + 1), axis=1)
df2

  x-1   x-2   x-3
0   a     b     c
1   d  None  None
2   e     f  None

To add these columns back to df, use pd.concat:

df = pd.concat([df, df2], axis=1)
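
For reference, the steps above can be put together in one self-contained sketch. The sample data and the extra "id" column are made up for illustration, and passing index=df.index is an addition that keeps the new columns aligned with df if its index is not the default 0..n-1. The number of new columns is the length of the longest list, which .str.len().max() reports, although the DataFrame constructor works it out on its own.

```python
import pandas as pd

# Hypothetical sample data: "column x" holds lists of unequal length.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "column x": [["a", "b", "c"], ["d"], ["e", "f"]],
})

# Length of the longest list, i.e. how many new columns are needed.
n_cols = df["column x"].str.len().max()

# Expand each list into its own row of columns; shorter lists are padded
# with missing values. Reusing df.index keeps the rows aligned with df.
df2 = pd.DataFrame(df["column x"].tolist(), index=df.index)
df2.columns = ["x-{}".format(i + 1) for i in range(df2.shape[1])]

# Attach the new columns next to the original ones.
result = pd.concat([df, df2], axis=1)
print(n_cols)
print(result)
```

Here result keeps the original columns and gains x-1 through x-3; the cells that had no matching list member hold missing values.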

Answer 2

The exact code I used for this problem was:

df_test['actors_list'] = df_m.actors_list.str.split('u\'')  # split on the delimiter u' (the \ is the escape character)
df_test2 = pd.DataFrame(
    df_test['actors_list'].tolist()).rename(lambda x: 'actors_list-{}'.format(x + 1), axis=1)
df_test2
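
Note that splitting on u' leaves the quotes, commas and brackets inside the resulting pieces. If the column actually holds strings that look like Python list literals (an assumption; the question does not say whether the cells are real lists or text), one alternative is to parse each cell with ast.literal_eval from the standard library and then expand as in Answer 1. This is a sketch of a different technique, not the code used above, and the sample values are hypothetical:

```python
import ast

import pandas as pd

# Hypothetical data: each cell is a *string* that looks like a Python list.
df_m = pd.DataFrame({
    "actors_list": [
        "[u'str1', u'str2', u'str3']",
        "[u'str4', u'str1']",
        "[u'str5', u'str7', u'str8', u'str9']",
    ]
})

# literal_eval safely parses the text into a real list of strings,
# which drops the u prefix, the quotes and the commas in one step.
parsed = df_m["actors_list"].apply(ast.literal_eval)

# Expand the parsed lists into one column per member.
expanded = pd.DataFrame(parsed.tolist(), index=df_m.index)
expanded.columns = [
    "actors_list-{}".format(i + 1) for i in range(expanded.shape[1])
]
print(expanded)
```

ast.literal_eval only evaluates literals, so it is safer than eval on untrusted text, but it raises an error on cells that are not valid Python literals.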

How can I split a column of lists into new columns when each list may have a different number of members?
