I have 2 DataFrames. DF1:
SKU USER 1 USER 2 USER 3 USER 4 USER 5 USER 6 USER 7
1001 5 2 0 0 2 2 1
1002 4 2 2 1 0 1.5 2
1003 1 1 0 0 0 3 3
1004 0 3 0 2 1 0 7
1005 1 1 0 4 4 3.5 0
1006 1 3 4 5 1 3 3
1007 0 1 1 3 0 0 5
1008 2 3 1 0 0 2.333333 0
1009 0 0 0 3 3 0 0
1010 5 6 3 0 2 4 6
DF2:
SKU USER 1 USER 2 USER 3 USER 4 USER 5 USER 6 USER 7
1001 7.398414 4.398414 2.398414 2.398414 4.398414 4.398414 3.398414
1002 6.321304 4.321304 4.321304 3.321304 2.321304 3.821304 4.321304
1003 3.535435 3.535435 2.535435 2.535435 2.535435 5.535435 5.535435
1004 2.865097 5.865097 2.865097 4.865097 3.865097 2.865097 9.865097
1005 3.152332 3.152332 2.152332 6.152332 6.152332 5.652332 2.152332
1006 2.816583 4.816583 5.816583 6.816583 2.816583 4.816583 4.816583
1007 2.378649 3.378649 3.378649 5.378649 2.378649 2.378649 7.378649
1008 4.431189 5.431189 3.431189 2.431189 2.431189 4.764522 2.431189
1009 2.196257 2.196257 2.196257 5.196257 5.196257 2.196257 2.196257
1010 7.148196 8.148196 5.148196 2.148196 4.148196 6.148196 8.148196
I want to print the actual (df1) and predicted (df2) values for every USER-SKU combination, like this:
USER1 SKU 1001: ACTUAL = 5, PREDICTED = 7.398414
How can I extract these values?
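Answer 0 (score: 1)
Your data appears to be pivoted. In that case it is easier to work with if you first unpivot (melt) each frame back into a table of (sku, user, value) rows, and then merge the 2 tables into a table of (sku, user, actual, predicted) rows.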
import pandas as pd
# Reset indexes for unpivoting (this assumes SKU is the index of both frames).
# If you need the original DataFrames as is later on, don't pass inplace=True
# and store the return value as the new index-free frame.
df1.reset_index(level=0, inplace=True)
df2.reset_index(level=0, inplace=True)
# unpivot dataframes
df1_melt = pd.melt(df1, id_vars=['SKU'], var_name='USER', value_name='ACTUAL')
df2_melt = pd.melt(df2, id_vars=['SKU'], var_name='USER', value_name='PREDICTED')
# merge dataframes on SKU, USER
df_merged = df1_melt.merge(df2_melt, on=['SKU', 'USER'])
for row in df_merged.itertuples(index=False):
    sku, user, actual, predicted = row
    print('{user} SKU {sku}: ACTUAL = {actual}, PREDICTED = {predicted}'.format(
        user=user, sku=sku, actual=actual, predicted=predicted
    ))
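For anyone who wants to try the melt-then-merge idea end to end, here is a minimal self-contained sketch. It uses only a two-SKU, two-user subset of the question's data and assumes SKU is an ordinary column (so no reset_index is needed); the names `actual`, `predicted`, `merged` and `lines` are just illustrative.

```python
import pandas as pd

# Small subset of the question's data, with SKU as a regular column (assumption).
df1 = pd.DataFrame({
    "SKU": [1001, 1002],
    "USER 1": [5, 4],
    "USER 2": [2, 2],
})
df2 = pd.DataFrame({
    "SKU": [1001, 1002],
    "USER 1": [7.398414, 6.321304],
    "USER 2": [4.398414, 4.321304],
})

# Unpivot each frame into (SKU, USER, value) rows.
actual = df1.melt(id_vars="SKU", var_name="USER", value_name="ACTUAL")
predicted = df2.melt(id_vars="SKU", var_name="USER", value_name="PREDICTED")

# Join the two long tables on the (SKU, USER) key.
merged = actual.merge(predicted, on=["SKU", "USER"])

lines = [
    f"{row.USER} SKU {row.SKU}: ACTUAL = {row.ACTUAL}, PREDICTED = {row.PREDICTED}"
    for row in merged.itertuples(index=False)
]
print("\n".join(lines))
```

Since the merge is keyed on (SKU, USER), this should not depend on the two frames listing their rows or columns in the same order.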
Answer 1 (score: 1)
If you don't want to rename columns, I believe you can use loops and simple positional indexing, like this:
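# Assumes SKU is the index of both frames, so positional column 0 is USER 1.
cols = range(7)
for c in cols:
    column = "USER " + str(c + 1)
    rows = range(10)
    for r in rows:
        actual = df1.iloc[r, c]
        predict = df2.iloc[r, c]
        # r + 1001 relies on the SKUs being consecutive and starting at 1001
        print(column + " SKU " + str(r + 1001) + ": ACTUAL = " + str(actual) + ", PREDICTED = " + str(predict))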
Hope this helps :)
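A possible variation on the same loop, sketched here with label-based lookups instead of positions, so it does not rely on the SKUs being consecutive or on both frames sharing the same row order. It assumes SKU is a regular column and uses a small subset of the question's data; `a`, `p` and `lines` are made-up names.

```python
import pandas as pd

# Two-SKU, two-user subset of the question's data (SKU as a column is an assumption).
df1 = pd.DataFrame({"SKU": [1001, 1002], "USER 1": [5, 4], "USER 2": [2, 2]})
df2 = pd.DataFrame({
    "SKU": [1001, 1002],
    "USER 1": [7.398414, 6.321304],
    "USER 2": [4.398414, 4.321304],
})

# Index both frames by SKU so values can be looked up by (sku, user) label.
a = df1.set_index("SKU")
p = df2.set_index("SKU")

lines = []
for user in a.columns:
    for sku in a.index:
        lines.append(
            f"{user} SKU {sku}: ACTUAL = {a.at[sku, user]}, PREDICTED = {p.at[sku, user]}"
        )
print("\n".join(lines))
```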
Answer 2 (score: 0)
I think it would be easier to rename the columns in df2, then merge, and then define a lookup function:
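In [175]:
# d_cols is not shown in the original answer; this mapping is inferred from the output below
d_cols = {c: 'PRED ' + c for c in df2.columns if c != 'SKU'}
df2.rename(columns=d_cols, inplace=True)
df2.columns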
Out[175]:
Index(['SKU', 'PRED USER 1', 'PRED USER 2', 'PRED USER 3', 'PRED USER 4',
'PRED USER 5', 'PRED USER 6', 'PRED USER 7'],
dtype='object')
In [184]:
df3 = df1.merge(df2)
def lookup(sku):
    row = df3.loc[df3['SKU'] == sku]
    return 'USER1 SKU {:d}: ACTUAL = {:f}, PREDICTED = {:f}'.format(
        sku, row['USER 1'].values[0], row['PRED USER 1'].values[0])
df3['SKU'].apply(lookup).iloc[0]
Out[184]:
'USER1 SKU 1001: ACTUAL = 5.000000, PREDICTED = 7.398414'
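The lookup above is hard-wired to USER 1. As a rough sketch of how it might be generalized to any user, the version below lets merge add suffixes to the overlapping columns instead of renaming df2 by hand. The `_ACTUAL` / `_PRED` suffixes and the `lookup(sku, user)` signature are my own choices, not part of the answer, and the data is a small subset of the question's.

```python
import pandas as pd

# Two-SKU, two-user subset of the question's data (SKU as a column is an assumption).
df1 = pd.DataFrame({"SKU": [1001, 1002], "USER 1": [5, 4], "USER 2": [2, 2]})
df2 = pd.DataFrame({
    "SKU": [1001, 1002],
    "USER 1": [7.398414, 6.321304],
    "USER 2": [4.398414, 4.321304],
})

# Overlapping non-key columns get the given suffixes, e.g. "USER 1_ACTUAL" / "USER 1_PRED".
df3 = df1.merge(df2, on="SKU", suffixes=("_ACTUAL", "_PRED")).set_index("SKU")

def lookup(sku, user):
    """Format the actual and predicted value for one (sku, user) pair."""
    actual = float(df3.at[sku, user + "_ACTUAL"])
    predicted = float(df3.at[sku, user + "_PRED"])
    return f"{user} SKU {sku}: ACTUAL = {actual:f}, PREDICTED = {predicted:f}"

print(lookup(1001, "USER 1"))
```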