I am working on my pandaFile DataFrame and still have not figured out how to solve this problem.
I have the following pandas object:
import numpy as np
import pandas as pd

pandaFile = pd.DataFrame([
    {'var1': 'Restaurant A', 'var2': '4.5', 'var3': ['AA', 'BB', 'CC'],
     'var4': ['User1', 'User2', 'User3'],
     'var5': ['Review 1', 'Review 2', 'Review 3']},
    {'var1': 'Restaurant B', 'var2': '5.0', 'var3': ['AA', 'BB', 'CC'],
     'var4': ['User1', 'User2', 'User3'],
     'var5': ['Review 1', 'Review 2', 'Review 3']},
])
print(pandaFile)
It looks like this:
var1 var2 var3 var4 var5
0 Restaurant A 4.5 [AA, BB, CC] [User1, User2, User3] [Review 1, Review 2, Review 3]
1 Restaurant B 5.0 [AA, BB, CC] [User1, User2, User3] [Review 1, Review 2, Review 3]
I would like to get the following output:
var1 var2 var3 var4 var5
0 Restaurant A 4.5 [AA, BB, CC] User1 Review 1
1 Restaurant A 4.5 [AA, BB, CC] User2 Review 2
2 Restaurant A 4.5 [AA, BB, CC] User3 Review 3
3 Restaurant B 5.0 [AA, BB, CC] User1 Review 1
4 Restaurant B 5.0 [AA, BB, CC] User2 Review 2
5 Restaurant B 5.0 [AA, BB, CC] User3 Review 3
But I get the following output instead:
var1 var2 var3 var4 var5
0 Restaurant A 4.5 [AA, BB, CC] User1 Review 1
1 Restaurant A 4.5 [AA, BB, CC] User1 Review 2
2 Restaurant A 4.5 [AA, BB, CC] User1 Review 3
3 Restaurant A 4.5 [AA, BB, CC] User2 Review 1
4 Restaurant A 4.5 [AA, BB, CC] User2 Review 2
5 Restaurant A 4.5 [AA, BB, CC] User2 Review 3
6 Restaurant A 4.5 [AA, BB, CC] User3 Review 1
7 Restaurant A 4.5 [AA, BB, CC] User3 Review 2
8 Restaurant A 4.5 [AA, BB, CC] User3 Review 3
9 Restaurant B 5.0 [AA, BB, CC] User1 Review 1
10 Restaurant B 5.0 [AA, BB, CC] User1 Review 2
11 Restaurant B 5.0 [AA, BB, CC] User1 Review 3
12 Restaurant B 5.0 [AA, BB, CC] User2 Review 1
13 Restaurant B 5.0 [AA, BB, CC] User2 Review 2
14 Restaurant B 5.0 [AA, BB, CC] User2 Review 3
15 Restaurant B 5.0 [AA, BB, CC] User3 Review 1
16 Restaurant B 5.0 [AA, BB, CC] User3 Review 2
17 Restaurant B 5.0 [AA, BB, CC] User3 Review 3
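Pairing every user with every review is wrong; each user should appear only with their own review.
I tried to solve this with the following code: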
mva_cols = ['var4', 'var5']
counter = 0
for x in zip(mva_cols):
    pandaFile = pd.DataFrame({
        col: np.repeat(pandaFile[col].values,
                       pandaFile[mva_cols[counter]].str.len())
        for col in pandaFile.columns.difference([mva_cols[counter]])
    }).assign(**{
        mva_cols[counter]: np.concatenate(pandaFile[mva_cols[counter]].values)
    })[pandaFile.columns.tolist()]
    counter = counter + 1
    print(counter)
print(str(pandaFile).encode('utf-8'))
Answer 0 (score: 1)
Alternatively, you can try the following (df is the DataFrame from the question):
new_df=df.reindex(df.index.repeat(df.var5.str.len()))
new_df.assign(var4=df.var4.sum(),var5=df.var5.sum())
Out[1022]:
var1 var2 var3 var4 var5
0 Restaurant A 4.5 [AA, BB, CC] User1 Review 1
0 Restaurant A 4.5 [AA, BB, CC] User2 Review 2
0 Restaurant A 4.5 [AA, BB, CC] User3 Review 3
1 Restaurant B 5.0 [AA, BB, CC] User1 Review 1
1 Restaurant B 5.0 [AA, BB, CC] User2 Review 2
1 Restaurant B 5.0 [AA, BB, CC] User3 Review 3
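Note that the result above keeps the repeated original index (0, 0, 0, 1, 1, 1), while the desired output in the question is numbered 0 to 5. Below is a minimal sketch of the same idea with the index reset. It assumes df holds the question's data, and it swaps in itertools.chain to flatten the lists instead of .sum(); that substitution is my own choice, not part of the answer above.

```python
from itertools import chain

import pandas as pd

# Assumed to match the DataFrame from the question.
df = pd.DataFrame({
    'var1': ['Restaurant A', 'Restaurant B'],
    'var2': ['4.5', '5.0'],
    'var3': [['AA', 'BB', 'CC'], ['AA', 'BB', 'CC']],
    'var4': [['User1', 'User2', 'User3'], ['User1', 'User2', 'User3']],
    'var5': [['Review 1', 'Review 2', 'Review 3'],
             ['Review 1', 'Review 2', 'Review 3']],
})

# Repeat each row once per element of its var5 list, then renumber the rows.
new_df = df.reindex(df.index.repeat(df['var5'].str.len())).reset_index(drop=True)

# Flatten the per-row lists in row order and overwrite the list columns.
new_df = new_df.assign(
    var4=list(chain.from_iterable(df['var4'])),
    var5=list(chain.from_iterable(df['var5'])),
)
print(new_df)
```

This relies on var4 and var5 having the same number of elements in every row; if they differ, the assigned lists will not line up with the repeated rows.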
Answer 1 (score: 0)
Here is one solution:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['Restaurant A', 4.5, ['AA', 'BB', 'CC'], ['User1', 'User2', 'User3'], ['Review 1', 'Review 2', 'Review 3']],
     ['Restaurant B', 5.0, ['AA', 'BB', 'CC'], ['User1', 'User2', 'User3'], ['Review 1', 'Review 2', 'Review 3']]],
    columns=['var1', 'var2', 'var3', 'var4', 'var5'])

# Pair each user with the matching review; used here only to count rows per restaurant
df['var6'] = list(tuple(zip(i, j)) for i, j in zip(df['var4'], df['var5']))
lens = [len(item) for item in df['var6']]

# Repeat the per-restaurant columns once per pair and flatten the list columns
df_out = pd.DataFrame({'var1': np.repeat(df['var1'].values, lens),
                       'var2': np.repeat(df['var2'].values, lens),
                       'var3': np.repeat(df['var3'].values, lens),
                       'var4': np.hstack(df['var4']),
                       'var5': np.hstack(df['var5'])
                       })
# var1 var2 var3 var4 var5
# 0 Restaurant A 4.5 [AA, BB, CC] User1 Review 1
# 1 Restaurant A 4.5 [AA, BB, CC] User2 Review 2
# 2 Restaurant A 4.5 [AA, BB, CC] User3 Review 3
# 3 Restaurant B 5.0 [AA, BB, CC] User1 Review 1
# 4 Restaurant B 5.0 [AA, BB, CC] User2 Review 2
# 5 Restaurant B 5.0 [AA, BB, CC] User3 Review 3
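As a possible alternative on pandas 1.3 or later, DataFrame.explode accepts a list of columns and expands them in lockstep, which avoids the cross product seen in the question. A minimal sketch, assuming the same sample data as above and that var4 and var5 have equal lengths in every row (explode raises a ValueError otherwise):

```python
import pandas as pd

# Assumed to match the sample data used in this thread.
df = pd.DataFrame(
    [['Restaurant A', 4.5, ['AA', 'BB', 'CC'],
      ['User1', 'User2', 'User3'], ['Review 1', 'Review 2', 'Review 3']],
     ['Restaurant B', 5.0, ['AA', 'BB', 'CC'],
      ['User1', 'User2', 'User3'], ['Review 1', 'Review 2', 'Review 3']]],
    columns=['var1', 'var2', 'var3', 'var4', 'var5'])

# Explode var4 and var5 together so the i-th user stays with the i-th review.
# var3 is not listed, so it remains a list in every output row.
# ignore_index=True renumbers the rows from 0.
df_out = df.explode(['var4', 'var5'], ignore_index=True)
print(df_out)
```

Exploding the two columns one after the other would reproduce the 18-row result from the question; passing both in a single call is what keeps each user paired with their own review.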