Question

I have a pandas Series with many rows, where each row holds a list of words, for example:

25     [estimated, million, people, lived, vulnerable...
176                                   [cent, vulnerable]
7      [create, sound, policy, frameworks, poor, vuln...
299    [create, sound, policy, frameworks, cent, vuln...
283    [missing, international, levels, based, estima...
                             ...                        
63     [create, sound, policy, frameworks, world, pop...
259             [build, world, population, still, lived]
193    [create, sound, policy, frameworks, every, sta...
284    [cent, situation, remains, particularly, alarm...
43     [based, less, cent, share, property, inheritan...
Name: clean_text, Length: 300, dtype: object

How can I concatenate the words from all rows into a single list? I tried:

nameofmyfile.str.cat(sep=', ')

But I get an error:

TypeError: Cannot use .str.cat with values of inferred dtype 'mixed'.

Answer 1

Here is a hacky way to do it.

# step 1: Convert to a list
our_list = df["series"].tolist()

# step 2: Make a new empty list and build it up
new_list = []
for words in our_list:
    new_list += words
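
As a possible alternative to the explicit loop above, the standard library's itertools.chain.from_iterable flattens one level of nesting. This is only a sketch: the small clean_text Series below is a hypothetical stand-in for the data in the question, not the asker's real data.

```python
from itertools import chain

import pandas as pd

# Hypothetical stand-in for the question's Series of word lists
clean_text = pd.Series(
    [["cent", "vulnerable"], ["build", "world", "population", "still", "lived"]],
    index=[176, 259],
    name="clean_text",
)

# Iterating over a Series yields its values (here, the lists),
# and chain.from_iterable walks through each list in row order
all_words = list(chain.from_iterable(clean_text))
print(all_words)
```

Like the loop, this keeps the words in row order and returns a plain Python list.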

Answer 2

The solution from @Alexis works, but I always prefer vectorized operations over explicit loops. I created a Series very similar to the one in the question:

>>> a
foo    [hi, hello, hey]
bar     [I, me, myself]
dtype: object

Now, using NumPy's concatenate function, the lists for foo and bar are joined into a single array of elements:

>>> import numpy as np
>>> np.concatenate(a.values)
array(['hi', 'hello', 'hey', 'I', 'me', 'myself'], dtype='<U6')

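Getting a NumPy array back should not be a problem, but if you want the output as a list, you can pass the array to the built-in list() function or call the numpy.ndarray .tolist() method.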
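
The conversion described above might look like the following sketch, which rebuilds the small example Series a from this answer. The last line uses Series.explode, which is not part of the original answer and is shown only as another option, assuming pandas 0.25 or later.

```python
import numpy as np
import pandas as pd

# Rebuild the example Series from this answer
a = pd.Series([["hi", "hello", "hey"], ["I", "me", "myself"]], index=["foo", "bar"])

# Join the per-row lists into one flat NumPy array
flat_array = np.concatenate(a.values)

# .tolist() converts the array into a list of plain Python strings
as_list = flat_array.tolist()

# list() also works, but its elements remain NumPy string scalars
also_list = list(flat_array)

# Assumption (not in the original answer): explode() gives each word its own row
exploded = a.explode().tolist()

print(as_list)
```

Note that np.concatenate raises a ValueError when given an empty sequence, so an empty Series would need to be handled separately.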

How to concatenate the rows of a pandas Series in Python
