The data has the following structure:
s.No | Item Name  | Source1  | Price1 | Source2  | Price2 | ....
1    | coffee     | website1 | 3.5    | website2 | 3.5    |
2    | Tea        | website3 | 4.5    | website1 | 4.5    |
3    | Soft Drink | website1 | 1.5    | website2 | 2.5    |
The desired output, produced with either Excel or python-pandas:
Item Name  | website1 | website2 | website3
coffee     | 3.5      | 3.5      | na
Tea        | 4.5      | na       | 4.5
Soft Drink | 1.5      | 2.5      | na
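Building this table by hand takes a lot of manual work and is very error-prone. Could someone help me write an Excel VB script or Python code for it, preferably with pandas?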
Answer 0 (score: 1)
Here is one solution:
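# Pivot each Source/Price pair so that the websites become columns
pvt1 = df.pivot(index='Item_Name', columns='Source1', values='Price1').reset_index()
pvt2 = df.pivot(index='Item_Name', columns='Source2', values='Price2').reset_index()
# website1 appears in both pivots, so the merge suffixes it with _x and _y
pvt = pd.merge(pvt1, pvt2, on='Item_Name')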
This gives us:
Item_Name website1_x website3 website1_y website2
0 Soft_Drink 1.5 NaN NaN 2.5
1 Tea NaN 4.5 4.5 NaN
2 coffee 3.5 NaN NaN 3.5
Then, the code below handles website1 only; it still needs to be generalized so that it works for every column that ends up duplicated:
pvt['website1'] = pvt['website1_x'].combine_first(pvt['website1_y'])
pvt.drop(['website1_x', 'website1_y'], axis=1, inplace=True)
Output:
Item_Name website3 website2 website1
0 Soft_Drink NaN 2.5 1.5
1 Tea 4.5 NaN 4.5
2 coffee NaN 3.5 3.5
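One possible way to cover every website column without naming each one is to skip the merge and call DataFrame.combine_first on the two pivots directly, since it aligns on both the row index and the column labels. The snippet below is only a sketch: the sample frame is rebuilt from the question's data, and the underscore column names (Item_Name, Source1, Price1, Source2, Price2) are assumed to match the ones used in this answer.

```python
import pandas as pd

# Sample data rebuilt from the question; column names are an assumption
df = pd.DataFrame({
    "Item_Name": ["coffee", "Tea", "Soft_Drink"],
    "Source1": ["website1", "website3", "website1"],
    "Price1": [3.5, 4.5, 1.5],
    "Source2": ["website2", "website1", "website2"],
    "Price2": [3.5, 4.5, 2.5],
})

# Keep Item_Name as the index so both pivots align row by row
pvt1 = df.pivot(index="Item_Name", columns="Source1", values="Price1")
pvt2 = df.pivot(index="Item_Name", columns="Source2", values="Price2")

# combine_first fills the gaps in pvt1 with values from pvt2; the result
# carries the union of both column sets, so no _x/_y suffixes appear
pvt = pvt1.combine_first(pvt2).reset_index()
print(pvt)
```

If the data has more than two Source/Price pairs, the same call can be chained once per extra pivot.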
Answer 1 (score: 0)
Using pandas, zip, and tuple unpacking:
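# Start with an empty frame indexed by item; website columns are added on the fly
prices = pd.DataFrame(index=df['Item Name'])
for idx, s_no, item, *row in df.itertuples():
    # print(item, row)
    # Two references to the same iterator, so zip yields (source, price) pairs
    iters = [iter(row)] * 2
    for source, price in zip(*iters):
        # print(source, price)
        prices.loc[item, source] = price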
Item Name  website1 website2 website3
coffee     3.5      3.5      na
Tea        4.5      na       4.5
Soft Drink 1.5      2.5      na
If s.No is the index, remove idx from the for loop.
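For comparison, a loop-free variant could stack every Source/Price pair into one long table and pivot once. This is a rough sketch rather than a drop-in answer: it assumes the columns are named Source1, Price1, Source2, Price2 and so on with no spaces, and the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample data rebuilt from the question; the Source<i>/Price<i> names are assumed
df = pd.DataFrame({
    "s.No": [1, 2, 3],
    "Item Name": ["coffee", "Tea", "Soft Drink"],
    "Source1": ["website1", "website3", "website1"],
    "Price1": [3.5, 4.5, 1.5],
    "Source2": ["website2", "website1", "website2"],
    "Price2": [3.5, 4.5, 2.5],
})

# Count the pairs by looking at the Source<i> columns
n_pairs = sum(col.startswith("Source") for col in df.columns)

# Stack each (Source<i>, Price<i>) pair under common column names
long = pd.concat(
    [
        df.rename(columns={f"Source{i}": "Source", f"Price{i}": "Price"})[
            ["Item Name", "Source", "Price"]
        ]
        for i in range(1, n_pairs + 1)
    ],
    ignore_index=True,
)

# One row per item, one column per website; missing combinations become NaN
prices = long.pivot(index="Item Name", columns="Source", values="Price")
print(prices)
```

Note that pivot raises a ValueError when an item lists the same website twice; in that case pivot_table with an explicit aggregation function may be a better fit.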