Converting a list of strings in a pandas column to a single string

Time: 2020-02-20 19:25:01

Tags: python pandas

I have a df with a column like this:

            
                       words
1                     ['me']
2                   ['they']
4         ['it', 'we', 'it']
5                         []
6         ['we', 'we', 'it']

I want it to look like this:

          words
1          'me'
2        'they'
4    'it we it'
5            ''
6    'we we it'

I tried the following two options, but both return a result identical to the original series.

def join_words(df):
    
    words_string= ''.join(df.words)

    return words_string

master_df['words_string'] = master_df.apply(join_words, axis=1)

and...

master_df['words_String'] = master_df.words.str.join(' ')

Both of these give back the original df. What am I doing wrong?

Edit

Using master_df['words_string'] = master_df['words'].apply(' '.join), I got:


1                                     [ ' m e ' ]
2                                 [ ' t h e y ' ]
4             [ ' i t ' ,   ' w e ' ,   ' i t ' ]
5                                             [ ]
6             [ ' w e ' ,   ' w e ' ,   ' i t ' ]

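3 answers:

Answer 0 (score: 1)

Edit:

As your edit shows, the rows are apparently not lists but strings that look like lists. We can use eval to turn each one into an actual list, so that join can be applied afterwards. Your sample data appears to look like this: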

import pandas as pd

# each cell is a string that merely looks like a list
df = pd.DataFrame({'index':[0,1,2,3,4],
                   'words':["['me']","['they']","['it','we','it']","[]","['we','we','it']"]})
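The spaced-out result in the question's edit can be reproduced with a quick check. This is only a sketch, and the variable names are illustrative: joining a string iterates over its characters, whereas joining a real list iterates over its elements.

```python
# A cell that looks like a list but is really a string (as in the question's edit)
cell = "['me']"

# str.join iterates over the characters of a string, so every character,
# including the brackets and quotes, gets separated by a space
joined_chars = ' '.join(cell)
print(joined_chars)

# With a real list, join iterates over the elements instead
joined_items = ' '.join(['it', 'we', 'it'])
print(joined_items)
```

If the first print shows brackets and quotes separated by spaces, the column holds strings and has to be parsed before joining.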

How about this? Use apply with eval to parse each row into a list, then apply ' '.join to each resulting list:

df['words'] = df['words'].apply(eval).apply(' '.join)
print(df)

Output:

   index     words
0      0        me
1      1      they
2      2  it we it
3      3          
4      4  we we it
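Since eval executes arbitrary code, a more cautious variant of the same idea is to parse each cell with ast.literal_eval from the standard library, which only accepts Python literals. The sketch below assumes the sample data shown above; the words_string column name is borrowed from the question.

```python
import ast

import pandas as pd

# Sample data as assumed in this answer: each cell is a string, not a list
df = pd.DataFrame({'words': ["['me']", "['they']", "['it','we','it']",
                             "[]", "['we','we','it']"]})

# literal_eval parses the string into a real list without running code;
# ' '.join then concatenates the list elements
df['words_string'] = df['words'].apply(ast.literal_eval).apply(' '.join)
print(df)
```

The empty list becomes an empty string, which matches the output requested in the question.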

Answer 1 (score: 1)

In general I would advise against using eval. When the elements are strings rather than lists, here is another approach:

master_df['words'].str.extractall(r"'(\w*)'").groupby(level=0)[0].agg(' '.join)

Output (row 5 is missing because the empty list produces no matches):

1          me
2        they
4    it we it
6    we we it
Name: 0, dtype: object
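If the row for the empty list should be kept as an empty string, one option is to reindex the result against the original index. This is a sketch under the assumption that the series looks like the question's data; the names words and joined are illustrative.

```python
import pandas as pd

# String cells with the same index as in the question
words = pd.Series(
    ["['me']", "['they']", "['it', 'we', 'it']", "[]", "['we', 'we', 'it']"],
    index=[1, 2, 4, 5, 6],
    name='words',
)

joined = (
    words.str.extractall(r"'(\w*)'")       # one row per quoted word
         .groupby(level=0)[0]              # group matches by original row
         .agg(' '.join)                    # join the words of each row
         .reindex(words.index, fill_value='')  # restore rows without matches
)
print(joined)
```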

Answer 2 (score: 0)

Another idea is to use DataFrame.explode (available since version 0.25.0) together with groupby/aggregate.

import pandas as pd

# create a list of list of strings
values = [
    ['me'],
    ['they'],
    ['it', 'we', 'it'],
    [],
    ['we', 'we', 'it']
]

# convert to a data frame
df = pd.DataFrame({'words': values})

# explode the cells (with lists) into separate rows having the same index
df2 = df.explode('words')
df2

This creates a table in long format and gives the following output:

  words
0    me
1  they
2    it
2    we
2    it
3   NaN
4    we
4    we
4    it

Now the long format needs to be joined/aggregated:

# make sure the dtype is string (this also turns the NaN from the empty list into the text 'nan')
df2['words'] = df2['words'].astype(str)

# group by the index aggregating all values to a single string
df2.groupby(level=0).agg(' '.join)

This gives the output:

      words
0        me
1      they
2  it we it
3       nan
4  we we it
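Row 3 ends up as the text 'nan' rather than the empty string the question asks for. A possible adjustment, sketched here with illustrative variable names, is to replace the missing value with '' before joining instead of casting to str. For a column of real lists, applying ' '.join directly gives the same result without the detour through explode.

```python
import pandas as pd

df = pd.DataFrame({'words': [['me'], ['they'], ['it', 'we', 'it'],
                             [], ['we', 'we', 'it']]})

# explode turns the empty list into NaN; fill it with '' so that the
# join yields an empty string rather than the text 'nan'
exploded = df['words'].explode()
result = exploded.fillna('').groupby(level=0).agg(' '.join)
print(result)

# For real lists, joining each cell directly is equivalent and shorter
direct = df['words'].apply(' '.join)
print(direct)
```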