Question

I have a df with a column like this:

            
                       words
1                     ['me']
2                   ['they']
4         ['it', 'we', 'it']
5                         []
6         ['we', 'we', 'it']

I want it to look like this:

                       words
1                       'me'
2                     'they'
4                 'it we it'
5                         ''
6                 'we we it'

I tried the following two options, but both give the same result as the original series.

def join_words(df):
    
    words_string= ''.join(df.words)

    return words_string

master_df['words_string'] = master_df.apply(join_words, axis=1)

And...

master_df['words_String'] = master_df.words.str.join(' ')

Both of these give me back the original df. What am I doing wrong?

Edit

Using master_df['words_string'] = master_df['words'].apply(' '.join), I got:


1                                     [ ' m e ' ]
2                                 [ ' t h e y ' ]
4             [ ' i t ' ,   ' w e ' ,   ' i t ' ]
5                                             [ ]
6             [ ' w e ' ,   ' w e ' ,   ' i t ' ]

Answer 1

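Edit:

As your edit shows, the rows are apparently not lists at all but strings that look like lists. We can use eval to turn each one into an actual list, so that the join can be applied afterwards. Your sample data seems to look like this: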

df = pd.DataFrame({'index':[0,1,2,3,4],
                   'words':["['me']","['they']","['it','we','it']","[]","['we','we','it']"]})
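To see why the attempts in the question behave the way they do, here is a minimal sketch using the sample data above. Each cell is a str, so joining it iterates over its characters: ' '.join spaces the characters out (the output in the question's edit), while ''.join simply rebuilds the same string (the "unchanged" result).

```python
import pandas as pd

# Sample data as above: each cell is a *string* that only looks like a list
df = pd.DataFrame({'index': [0, 1, 2, 3, 4],
                   'words': ["['me']", "['they']", "['it','we','it']", "[]", "['we','we','it']"]})

first = df['words'].iloc[0]
print(type(first))      # the cell is a str, not a list

# Joining a string iterates over its characters
print(' '.join(first))  # characters separated by spaces
print(''.join(first))   # characters glued back together: the original string
```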

How about this? Chain two apply calls: eval turns each string into a list, then ' '.join() joins each row's list into a single string:

df['words'] = df['words'].apply(eval).apply(' '.join)
print(df)

Output:

   index     words
0      0        me
1      1      they
2      2  it we it
3      3          
4      4  we we it
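eval will execute any expression stored in the column. A commonly used, safer swap (not what this answer uses) is ast.literal_eval from the standard library, which only parses Python literals such as lists and strings. A minimal sketch, assuming the same sample data:

```python
import ast

import pandas as pd

df = pd.DataFrame({'index': [0, 1, 2, 3, 4],
                   'words': ["['me']", "['they']", "['it','we','it']", "[]", "['we','we','it']"]})

# literal_eval turns "['it','we','it']" into ['it', 'we', 'it'] but refuses
# anything that is not a plain literal; then join each list with spaces
df['words'] = df['words'].apply(ast.literal_eval).apply(' '.join)
print(df)
```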

Answer 2

In general I'd advise against using eval. Here is another approach for when the elements are strings rather than lists:

master_df['words'].str.extractall(r"'(\w*)'").groupby(level=0)[0].agg(' '.join)

Output:

1          me
2        they
4    it we it
6    we we it
Name: 0, dtype: object
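The output above has no row 5: extractall finds no match in "[]", so that index disappears after the groupby. A minimal sketch of restoring it with reindex, assuming `words` is the question's column (rebuilt here by hand as a hypothetical sample with the same index):

```python
import pandas as pd

# Hypothetical reconstruction of the question's string column
words = pd.Series(["['me']", "['they']", "['it', 'we', 'it']", "[]", "['we', 'we', 'it']"],
                  index=[1, 2, 4, 5, 6], name='words')

joined = (words.str.extractall(r"'(\w*)'")   # one row per quoted word
               .groupby(level=0)[0]          # group by the original row index
               .agg(' '.join)                # join the words of each row
               .reindex(words.index, fill_value=''))  # bring back rows with no match
print(joined)
```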

Answer 3

Another idea is to use DataFrame.explode (available since pandas 0.25.0) together with a groupby/aggregate step.

import pandas as pd

# create a list of list of strings
values = [
    ['me'],
    ['they'],
    ['it', 'we', 'it'],
    [],
    ['we', 'we', 'it']
]

# convert to a data frame
df = pd.DataFrame({'words': values})

# explode the cells (with lists) into separate rows having the same index
df2 = df.explode('words')
df2

This creates a table in long format and gives the following output:

  words
0    me
1  they
2    it
2    we
2    it
3   NaN
4    we
4    we
4    it

Now the long format needs to be joined/aggregated back:

# make sure the dtype is string
df2['words'] = df2['words'].astype(str)

# group by the index aggregating all values to a single string
df2.groupby(level=0).agg(' '.join)

This gives the output:

      words
0        me
1      they
2  it we it
3       nan
4  we we it
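Row 3 comes out as the string 'nan' because explode turns the empty list into NaN and astype(str) converts that to 'nan'. A minimal sketch of two ways to get '' instead, assuming the cells are real lists as in this answer's values:

```python
import pandas as pd

values = [['me'], ['they'], ['it', 'we', 'it'], [], ['we', 'we', 'it']]
df = pd.DataFrame({'words': values})

# Option 1: the cells are real lists, so they can be joined directly;
# ' '.join([]) is simply ''
direct = df['words'].apply(' '.join)

# Option 2: keep the explode/groupby route, but replace the NaN produced
# by the empty list with '' instead of turning it into the string 'nan'
df2 = df.explode('words')
df2['words'] = df2['words'].fillna('')
via_explode = df2.groupby(level=0)['words'].agg(' '.join)

print(direct)
print(via_explode)
```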

Convert a list of strings in a pandas column to a string