I have the following data frame:
Course   Orders  Ingredient 1  Ingredient 2  Ingredient 3
starter  3       Fish          Bread         Mayonnaise
starter  1       Olives        Bread
starter  5       Hummus        Pita
main     1       Pizza
main     6       Beef          Potato        Peas
main     9       Fish          Peas
main     11      Bread         Mayonnaise    Beef
main     4       Pasta         Bolognese     Peas
desert   10      Cheese        Olives        Crackers
desert   7       Cookies       Cream
desert   8       Cheesecake    Cream
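I want to total the number of orders for each ingredient within each course. Which column an ingredient appears in does not matter.
The following data frame is what I would like as output: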
Course   Ord  Ing1        IngOrd1  Ing2       IngOrd2  Ing3      IngOrd3
starter  3    Fish        3        Bread      4        Mayo      3
starter  1    Olives      1        Bread      4
starter  5    Hummus      5        Pita       5
main     1    Pizza       1
main     6    Beef        17       Potato     6        Peas      21
main     9    Fish        9        Peas       21
main     11   Bread       11       Mayo       11       Beef      17
main     4    Pasta       4        Bolognese  4        Peas      21
desert   10   Cheese      10       Olives     10       Crackers  10
desert   7    Cookies     7        Cream      15
desert   8    Cheesecake  8        Cream      15
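I tried using groupby().sum(), but that does not work because the ingredients are spread across 3 columns.
I also can't use a lookup, because in the full data frame there are cases where I don't know which ingredient I am looking for.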
Answer 0 (score: 0)
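I don't believe this can be done with groupby alone or other similar pandas methods, though I'd be happy to be proven wrong. In any case, the following is not especially pretty, but it will give you what you are after.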
import pandas as pd
from collections import defaultdict

# The data you provided
df = pd.read_csv('orders.csv')

# Group these labels for convenience
ingredients = ['Ingredient 1', 'Ingredient 2', 'Ingredient 3']
orders = ['IngOrd1', 'IngOrd2', 'IngOrd3']

# Interleave the two lists for the final data frame
combined = [y for x in zip(ingredients, orders) for y in x]

# Restructure the data frame so we can group on ingredients
melted = pd.melt(df, id_vars=['Course', 'Orders'], value_vars=ingredients, value_name='Ingredient')

# This is a nested map that we can apply to each ingredient column to
# look up the correct order count
maps = defaultdict(lambda: defaultdict(int))

# Build the map. Each course maps each of its ingredients to the total
# count for that pair, e.g. {'main': {'Beef': 17, ...}, ...}
# (groupby skips rows whose ingredient is NaN)
for index, group in melted.groupby(['Course', 'Ingredient']):
    course, ingredient = index
    maps[course][ingredient] += group.Orders.sum()

# Now apply the map to each ingredient column of the data frame
# to create the new count columns (missing ingredients fall back to 0)
for i, o in zip(ingredients, orders):
    df[o] = df.apply(lambda x: maps[x.Course][x[i]], axis=1)

# Reorder the columns so each count follows its ingredient
df = df[['Course', 'Orders'] + combined]
print(df)
    Course   Orders  Ingredient 1  IngOrd1  Ingredient 2  IngOrd2  Ingredient 3  IngOrd3
0   starter  3       Fish          3        Bread         4        Mayonnaise    3
1   starter  1       Olives        1        Bread         4        NaN           0
2   starter  5       Hummus        5        Pita          5        NaN           0
3   main     1       Pizza         1        NaN           0        NaN           0
4   main     6       Beef          17       Potato        6        Peas          19
5   main     9       Fish          9        Peas          19       NaN           0
6   main     11      Bread         11       Mayonnaise    11       Beef          17
7   main     4       Pasta         4        Bolognese     4        Peas          19
8   desert   10      Cheese        10       Olives        10       Crackers      10
9   desert   7       Cookies       7        Cream         15       NaN           0
10  desert   8       Cheesecake    8        Cream         15       NaN           0
If the NaN ingredients and their 0 counts are a problem, you will need to handle them, but that is a trivial task.
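For what it's worth, here is one possible way to avoid the 0 counts for missing ingredients. It is only a sketch, not a tested drop-in replacement. It melts the frame, drops the empty ingredient cells, totals each course/ingredient pair with groupby().transform('sum'), and pivots the totals back into one column per ingredient slot. The inline CSV, the 'Slot' and 'Total' column names, and the choice of the nullable Int64 dtype are my own assumptions for illustration.

```python
import io
import pandas as pd

# Sample data copied from the question; blank cells are read as NaN.
csv_text = """Course,Orders,Ingredient 1,Ingredient 2,Ingredient 3
starter,3,Fish,Bread,Mayonnaise
starter,1,Olives,Bread,
starter,5,Hummus,Pita,
main,1,Pizza,,
main,6,Beef,Potato,Peas
main,9,Fish,Peas,
main,11,Bread,Mayonnaise,Beef
main,4,Pasta,Bolognese,Peas
desert,10,Cheese,Olives,Crackers
desert,7,Cookies,Cream,
desert,8,Cheesecake,Cream,
"""
df = pd.read_csv(io.StringIO(csv_text))

ingredients = ['Ingredient 1', 'Ingredient 2', 'Ingredient 3']
orders = ['IngOrd1', 'IngOrd2', 'IngOrd3']

# Long format: one row per (original row, ingredient slot).
# 'index' remembers which original row each value came from.
melted = df.reset_index().melt(
    id_vars=['index', 'Course', 'Orders'],
    value_vars=ingredients,
    var_name='Slot',          # hypothetical column name
    value_name='Ingredient',
)

# Drop empty ingredient cells so they never receive a count.
melted = melted.dropna(subset=['Ingredient'])

# Total orders per course/ingredient pair, repeated on every matching row.
totals_long = melted.assign(
    Total=melted.groupby(['Course', 'Ingredient'])['Orders'].transform('sum')
)

# Back to wide format: one totals column per ingredient slot.
totals = totals_long.pivot(index='index', columns='Slot', values='Total')
totals = totals.reindex(index=df.index, columns=ingredients)
totals.columns = orders

# Nullable integers keep the counts whole while leaving gaps as <NA>.
totals = totals.astype('Int64')

# Interleave ingredient and count columns, as in the code above.
combined = [c for pair in zip(ingredients, orders) for c in pair]
result = df.join(totals)[['Course', 'Orders'] + combined]
print(result)
```

With this layout, a row that has no second or third ingredient shows a missing count rather than 0, so nothing needs cleaning up afterwards. If you would rather keep the 0 values, calling fillna(0) on the count columns should restore them.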