I have a df with a column like this:
words
1 ['me']
2 ['they']
4 ['it', 'we', 'it']
5 []
6 ['we', 'we', 'it']
I would like it to look like this:
words
1 'me'
2 'they'
4 'it we it'
5 ''
6 'we we it'
I tried these two options, but both of them give a result identical to the original series.
def join_words(df):
    words_string = ''.join(df.words)
    return words_string
master_df['words_string'] = master_df.apply(join_words, axis=1)
and...
master_df['words_String'] = master_df.words.str.join(' ')
Both of these result in the original df. What am I doing wrong?
Edit: using master_df['words_string'] = master_df['words'].apply(' '.join), I get:
1 [ ' m e ' ]
2 [ ' t h e y ' ]
4 [ ' i t ' , ' w e ' , ' i t ' ]
5 [ ]
6 [ ' w e ' , ' w e ' , ' i t ' ]
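Answer 0 (score: 1)
As your edit shows, it seems the rows are not actually lists but strings that merely look like lists. We can use eval to convert each one to an actual list, so that join can be applied afterwards. Your sample data appears to look like this: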
df = pd.DataFrame({'index':[0,1,2,3,4],
'words':["['me']","['they']","['it','we','it']","[]","['we','we','it']"]})
How about this? Use apply with eval to turn each row into a list, then apply ' '.join to each row (list):
df['words'] = df['words'].apply(eval).apply(' '.join)
print(df)
Output:
index words
0 0 me
1 1 they
2 2 it we it
3 3
4 4 we we it
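As a side note, the spaced-out result in the question's edit is what ' '.join produces when it is handed a string rather than a list, because it iterates over the string character by character. Below is a minimal sketch, assuming the column really holds string representations of lists as in the sample data above. It swaps eval for ast.literal_eval from the standard library, which only accepts Python literals and so does not execute arbitrary code. The index values and the words_string column name are taken from the question.

```python
import ast

import pandas as pd

# Sample data assumed from the answer above: each cell is a string, not a list
df = pd.DataFrame(
    {'words': ["['me']", "['they']", "['it','we','it']", "[]", "['we','we','it']"]},
    index=[1, 2, 4, 5, 6],
)

# Joining a string iterates over its characters, which explains the spaced-out output
spaced = ' '.join("['me']")
print(spaced)

# Parse each string into a real list, then join the list elements with a space
df['words_string'] = df['words'].apply(ast.literal_eval).apply(' '.join)
print(df['words_string'].tolist())
```

If the cells were real lists to begin with, the parsing step would not be needed and df['words'].str.join(' ') alone should be enough.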
Answer 1 (score: 1)
In general I would advise against using eval. Here is another approach for when the elements are strings rather than lists:
words.str.extractall(r"'(\w*)'").groupby(level=0)[0].agg(' '.join)
Output (row 5, the empty list, has no matches and is therefore dropped):
1 me
2 they
4 it we it
6 we we it
Name: 0, dtype: object
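If the row for the empty list should be kept as an empty string, one option is to reindex the result against the original index. This is a sketch under the assumption that words is a Series of list-like strings indexed as in the question. The variable name joined is only illustrative.

```python
import pandas as pd

# Assumed input: strings that look like lists, indexed as in the question
words = pd.Series(
    ["['me']", "['they']", "['it','we','it']", "[]", "['we','we','it']"],
    index=[1, 2, 4, 5, 6],
)

joined = (
    words.str.extractall(r"'(\w*)'")          # one row per quoted word
         .groupby(level=0)[0]                 # group matches by original row label
         .agg(' '.join)                       # join the words of each row
         .reindex(words.index, fill_value='') # restore rows that had no match
)
print(joined.tolist())
```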
Answer 2 (score: 0)
Another idea is to use DataFrame.explode (available since version 0.25.0) together with groupby / aggregate.
import pandas as pd
# create a list of list of strings
values = [
    ['me'],
    ['they'],
    ['it', 'we', 'it'],
    [],
    ['we', 'we', 'it']
]
# convert to a data frame
df = pd.DataFrame({'words': values})
# explode the cells (with lists) into separate rows having the same index
df2 = df.explode('words')
df2
This creates a table in long format, with the following output:
words
0 me
1 they
2 it
2 we
2 it
3 nan
4 we
4 we
4 it
Now the long format needs to be joined/aggregated:
# make sure the dtype is string
df2['words'] = df2['words'].astype(str)
# group by the index aggregating all values to a single string
df2.groupby(level=0).agg(' '.join)
This gives the output:
words
0 me
1 they
2 it we it
3 nan
4 we we it
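Note that row 3 ends up as the string 'nan', because explode turns the empty list into NaN and astype(str) then converts it to text. A possible variation, sketched here with the same sample data, fills the missing value with an empty string before joining, so the empty list becomes '' as the question asks. The names long_words and words_string are only illustrative.

```python
import pandas as pd

# Same sample data as above: each cell is a real list of strings
df = pd.DataFrame({'words': [['me'], ['they'], ['it', 'we', 'it'], [], ['we', 'we', 'it']]})

# explode turns the empty list into NaN; replace it with '' before joining
long_words = df['words'].explode().fillna('')

# group by the original index and join each group's values into one string
df['words_string'] = long_words.groupby(level=0).agg(' '.join)
print(df['words_string'].tolist())
```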