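I have two DataFrames. df1 has ID and ITEMS columns, where each ITEMS value is a list of tuples of the form:
[(itemid, weight, 3)]
Using the DataFrames below, I need to add the weight of each tuple's itemid, as given in the item DataFrame (df2), to that tuple's own weight.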
df1:
ID ITEMS
11 [(123, 2.12,3),(234, 1.2,3)]
22 [(567, 2.3, 3),(245, 1.9,3)]
33 [(999,4.5, 3),(222, 2.0,3)]
44 [(223, 2.34,3),(234,3.5,3)]
df2:
ITEMS WEIGHT
123 2.5
234 1.8
567 19
245 3
999 2
222 2.9
223 4.2
Expected output:
ID ITEMS
11 [(123, 2.12+2.5,3),(234, 1.2+1.8,3)]
22 [(567,21.3,3),(245,4.9,3)]
33 [(999, 6.5, 3), (222, 4.9,3)]
44 [(223, 6.54,3),(234,5.3,3)]
The first row is written out to show the calculation: [(123, 2.12 + 2.5, 3), (234, 1.2 + 1.8, 3)].
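The rule itself can be stated in a few lines of plain Python. This is a minimal sketch using only the first two rows and a dict built from df2's values; the helper name `add_weights` is hypothetical, not from the question. It also makes the arithmetic for itemid 567 explicit: 2.3 + 19 = 21.3.

```python
# Weights from df2, keyed by itemid.
item_weight = {123: 2.5, 234: 1.8, 567: 19, 245: 3,
               999: 2, 222: 2.9, 223: 4.2}

# ITEMS lists for IDs 11 and 22 from df1; each tuple is (itemid, weight, 3).
rows = {11: [(123, 2.12, 3), (234, 1.2, 3)],
        22: [(567, 2.3, 3), (245, 1.9, 3)]}


def add_weights(items, weights):
    """Hypothetical helper: return new tuples with weights[itemid] added to each weight."""
    return [(item_id, w + weights[item_id], n) for item_id, w, n in items]


result = {id_: add_weights(items, item_weight) for id_, items in rows.items()}
for id_, items in result.items():
    print(id_, items)
```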
Answer 0 (score: 1)
Here is one way to do it with pd.Series.apply:
import pandas as pd
df1 = pd.DataFrame({'ID': [11, 22, 33, 44],
                    'ITEMS': [[(123, 2.12, 3), (234, 1.2, 3)],
                              [(567, 2.3, 3), (245, 1.9, 3)],
                              [(999, 4.5, 3), (222, 2.0, 3)],
                              [(223, 2.34, 3), (234, 3.5, 3)]]})
df2 = pd.DataFrame({'ITEMS': [123, 234, 567, 245, 999, 222, 223],
                    'WEIGHT': [2.5, 1.8, 19, 3, 2, 2.9, 4.2]})
s = df2.set_index('ITEMS')['WEIGHT']
df1['ITEMS'] = df1['ITEMS'].apply(lambda x: [(i[0], i[1]+s.get(i[0]), i[2]) for i in x])
print(df1)
# ID ITEMS
# 0 11 [(123, 4.62, 3), (234, 3.0, 3)]
# 1 22 [(567, 21.3, 3), (245, 4.9, 3)]
# 2 33 [(999, 6.5, 3), (222, 4.9, 3)]
# 3 44 [(223, 6.54, 3), (234, 5.3, 3)]
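One caveat with `s.get(i[0])` in the code above: if an itemid in df1 has no row in df2, `get` returns `None` and `i[1] + None` raises a `TypeError`. A hedged variant, assuming a missing item should simply keep its original weight, passes a default of 0 to `Series.get`. The ID 55 and itemid 888 below are made-up values used only to trigger the missing case.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [11, 55],
                    'ITEMS': [[(123, 2.12, 3), (234, 1.2, 3)],
                              [(888, 1.0, 3)]]})  # 888: hypothetical itemid absent from df2
df2 = pd.DataFrame({'ITEMS': [123, 234], 'WEIGHT': [2.5, 1.8]})

s = df2.set_index('ITEMS')['WEIGHT']
# s.get(key, 0) falls back to 0 when the itemid is not in df2, so that tuple
# keeps its original weight; float() keeps plain Python floats in the tuples.
df1['ITEMS'] = df1['ITEMS'].apply(
    lambda x: [(i[0], i[1] + float(s.get(i[0], 0)), i[2]) for i in x])
print(df1)
```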
In my opinion, where possible it is better to split the numeric data into separate columns and use vectorized operations.
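A minimal sketch of that vectorized idea, assuming pandas 0.25 or later (for `DataFrame.explode`): explode the lists to one tuple per row, split each tuple into numeric columns, add the df2 weights through a left merge, and rebuild the lists only if that layout is still required. The column names `itemid`, `weight`, `n` and `extra` are illustrative, and treating a missing itemid as weight 0 is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [11, 22, 33, 44],
                    'ITEMS': [[(123, 2.12, 3), (234, 1.2, 3)],
                              [(567, 2.3, 3), (245, 1.9, 3)],
                              [(999, 4.5, 3), (222, 2.0, 3)],
                              [(223, 2.34, 3), (234, 3.5, 3)]]})
df2 = pd.DataFrame({'ITEMS': [123, 234, 567, 245, 999, 222, 223],
                    'WEIGHT': [2.5, 1.8, 19, 3, 2, 2.9, 4.2]})

# One row per tuple, then split each tuple into its own numeric column.
exploded = df1.explode('ITEMS').reset_index(drop=True)
parts = pd.DataFrame(exploded['ITEMS'].tolist(), columns=['itemid', 'weight', 'n'])
flat = pd.concat([exploded[['ID']], parts], axis=1)

# Vectorized lookup: left-merge df2's weights on itemid, then add column-wise.
lookup = df2.rename(columns={'ITEMS': 'itemid', 'WEIGHT': 'extra'})
flat = flat.merge(lookup, on='itemid', how='left')
# fillna(0) keeps the original weight for itemids missing from df2 (assumption).
flat['weight'] = flat['weight'] + flat['extra'].fillna(0)
print(flat[['ID', 'itemid', 'weight', 'n']])

# Only if the original list-of-tuples layout is still needed:
flat['ITEMS'] = list(zip(flat['itemid'], flat['weight'], flat['n']))
result = flat.groupby('ID', sort=False)['ITEMS'].agg(list).reset_index()
print(result)
```

Keeping the data in the flat (one row per item) form avoids Python-level loops entirely; the last two statements are only a convenience for callers that expect the nested lists.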
Answer 1 (score: 1)
Here is another approach, which updates each row with a helper function:
import pandas as pd
import numpy as np
items = {'id': [11, 22, 33, 44],
         'items': [[(123, 2.12, 3), (234, 1.2, 3)],
                   [(567, 2.3, 3), (245, 1.9, 3)],
                   [(999, 4.5, 3), (222, 2.0, 3)],
                   [(223, 2.34, 3), (234, 3.5, 3)]]}
df1 = pd.DataFrame(data=items)
item_weight_data = {'items': [123, 234, 567, 245, 999, 222, 223], 'weight':[2.5, 1.8, 19, 3, 2, 2.9, 4.2]}
df2 = pd.DataFrame(data=item_weight_data)
df2 = df2.set_index('items')
# Takes one row's list of tuples and the weight DataFrame; returns a new list.
def update_weight(row, item_df):
    try:
        new_row = []
        for item in row:
            weight = item_df.loc[item[0], 'weight']
            # Tuples are immutable, so build a new tuple with the updated
            # weight and append it to the new list.
            updated_item = (item[0], item[1] + weight, item[2])
            new_row.append(updated_item)
        return new_row
    except Exception as e:
        raise ValueError("UNEXPECTED_DATA") from e
df1['items'] = df1['items'].apply(lambda x: update_weight(x, df2))
print(df1)
I hope this helps.
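The helper above calls `item_df.loc[...]` once per tuple and turns any exception into `UNEXPECTED_DATA`. A hedged variant builds a plain dict once with `Series.to_dict()` and converts only a missing itemid (`KeyError`) into a `ValueError`, so unrelated errors are not masked. The name `update_weight_fast` and the error message are hypothetical, and the sketch uses only the first two rows.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [11, 22],
                    'items': [[(123, 2.12, 3), (234, 1.2, 3)],
                              [(567, 2.3, 3), (245, 1.9, 3)]]})
df2 = pd.DataFrame({'items': [123, 234, 567, 245],
                    'weight': [2.5, 1.8, 19, 3]}).set_index('items')

# Build the itemid -> weight mapping once instead of calling .loc for every tuple.
weight_by_item = df2['weight'].to_dict()


def update_weight_fast(row, weights):
    """Hypothetical helper: return a new list of tuples with each item's df2 weight added."""
    new_row = []
    for item_id, w, n in row:
        try:
            extra = weights[item_id]
        except KeyError as e:
            # Only a missing itemid is converted; other errors propagate unchanged.
            raise ValueError(f"itemid {item_id} not found in df2") from e
        new_row.append((item_id, w + extra, n))
    return new_row


df1['items'] = df1['items'].apply(lambda x: update_weight_fast(x, weight_by_item))
print(df1)
```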