I have the following pandas DataFrame:
Person Item1 Item2 Item3 Item4
Adam Apple Eggs Cookie
Alex Chocolate Orange Eggs Potato
Gina Eggs Apple Orange Milk
I want to convert it to this:
Item Count Person1 Person2 Person3
Apple 2 Adam Gina
Eggs 3 Adam Alex Gina
Cookie 1 Adam
Chocolate 1 Alex
Orange 2 Alex Gina
Potato 1 Alex
Milk 1 Gina
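I searched thoroughly before posting but found nothing that matched (perhaps there is a better way to phrase my question). Apologies if this is a duplicate; if it is, please point me to where it was answered before.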
Answer 0 (score: 1)
First, reshape with melt:
df = df.melt('Person', value_name='Item')
print (df)
Person variable Item
0 Adam Item1 Apple
1 Alex Item1 Chocolate
2 Gina Item1 Eggs
3 Adam Item2 Eggs
4 Alex Item2 Orange
5 Gina Item2 Apple
6 Adam Item3 Cookie
7 Alex Item3 Eggs
8 Gina Item3 Orange
9 Adam Item4 NaN
10 Alex Item4 Potato
11 Gina Item4 Milk
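Then group by Item and aggregate with a custom function that collects the Person values into lists, together with GroupBy.size for the count column, and build a new DataFrame from the lists with the DataFrame constructor: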
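import pandas as pd

# collect each item's people into a list; the function name becomes the column name
f = lambda x: x.tolist()
f.__name__ = 'Person'
df1 = df.groupby('Item', sort=False)['Person'].agg([f, 'size'])
# expand the lists into Person0, Person1, ... columns
df2 = pd.DataFrame(df1.pop('Person').values.tolist(), index=df1.index).add_prefix('Person')
df3 = df1.join(df2).reset_index()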
print (df3)
Item size Person0 Person1 Person2
0 Apple 2 Adam Gina None
1 Chocolate 1 Alex None None
2 Eggs 3 Gina Adam Alex
3 Orange 2 Alex Gina None
4 Cookie 1 Adam None None
5 Potato 1 Alex None None
6 Milk 1 Gina None None
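For reference, the steps above can be put together into one self-contained script. This is only a sketch under some assumptions: the sample frame is rebuilt by hand from the question, Adam's empty Item4 cell is assumed to be NaN, and the names long_df, people, wide and result are illustrative. It also renames the columns to Count and Person1..Person3 so they match the headers the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; Adam's missing Item4 is assumed to be NaN.
df = pd.DataFrame({
    'Person': ['Adam', 'Alex', 'Gina'],
    'Item1': ['Apple', 'Chocolate', 'Eggs'],
    'Item2': ['Eggs', 'Orange', 'Apple'],
    'Item3': ['Cookie', 'Eggs', 'Orange'],
    'Item4': [np.nan, 'Potato', 'Milk'],
})

# Long format: one row per (Person, Item) pair, with the empty cell dropped.
long_df = df.melt('Person', value_name='Item').dropna(subset=['Item'])

# Collect the people per item, keeping items in order of first appearance.
people = long_df.groupby('Item', sort=False)['Person'].agg(list)

# Spread each list across columns; shorter lists are padded with missing values.
wide = pd.DataFrame(people.tolist(), index=people.index)
wide.columns = [f'Person{i + 1}' for i in range(wide.shape[1])]

# Put the number of people per item in front, then turn Item back into a column.
wide.insert(0, 'Count', people.map(len))
result = wide.reset_index()
print(result)
```

The people within each item appear in the order melt produces them (column by column), which is why Eggs lists Gina before Adam and Alex.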
Answer 1 (score: 0)
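This is not exactly what you asked for, but I'm not sure this kind of "transposition" exists as a simple function. (By the way, transpose, following linear algebra, usually means swapping the rows and columns of the DataFrame.)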
import pandas as pd

# collect the distinct items, skipping missing values (None or NaN)
items = []
for c in df.columns[1:]:
    items.extend(df[c].values)
items = [i for i in set(items) if pd.notna(i)]
people = df.Person.values
# build a 0/1 indicator per person for every item
counts = {}
for p in people:
    counts[p] = [1 if item in df[df['Person'] == p].values else 0 for item in items]
new = pd.DataFrame(counts, index=items)
new['Count'] = new.sum(axis=1)
Output:
| | Adam | Alex | Gina | Count |
|-----------|------|------|------|-------|
| Cookie | 1 | 0 | 0 | 1 |
| Chocolate | 0 | 1 | 0 | 1 |
| Potato | 0 | 1 | 0 | 1 |
| Eggs | 1 | 1 | 1 | 3 |
| Milk | 0 | 0 | 1 | 1 |
| Orange | 0 | 1 | 1 | 2 |
| Apple | 1 | 0 | 1 | 2 |
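A similar item-by-person table can probably be had without explicit loops by melting first and then calling pd.crosstab. This swaps in a different technique from the loop above and is only a sketch: the sample frame is rebuilt from the question with the empty cell assumed to be NaN, and long_df and table are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; Adam's missing Item4 is assumed to be NaN.
df = pd.DataFrame({
    'Person': ['Adam', 'Alex', 'Gina'],
    'Item1': ['Apple', 'Chocolate', 'Eggs'],
    'Item2': ['Eggs', 'Orange', 'Apple'],
    'Item3': ['Cookie', 'Eggs', 'Orange'],
    'Item4': [np.nan, 'Potato', 'Milk'],
})

# Long format: one row per (Person, Item) pair, with the empty cell dropped.
long_df = df.melt('Person', value_name='Item').dropna(subset=['Item'])

# Rows are items, columns are people, cells count how often the pair occurs.
table = pd.crosstab(long_df['Item'], long_df['Person'])

# Row totals give the number of people who have each item.
table['Count'] = table.sum(axis=1)
print(table)
```

Unlike the 0/1 indicators above, crosstab counts occurrences, so a person who listed the same item twice would show 2 in that cell. The rows also come out sorted alphabetically by item.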
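Edit: As usual, jezrael has the right answer, but I tweaked my approach to produce the output you wanted. It may be a bit easier for beginners to follow.
Using 'df' as the example: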
item_counts = {}
for item in items:
    counts = {}
    count = 0
    for p in people:
        if item in df[df['Person'] == p].values:
            count += 1
            counts['Person' + str(count)] = p
    counts['count'] = count
    item_counts[item] = counts
new = pd.DataFrame.from_dict(item_counts, orient='index')
new = new[['count', 'Person1', 'Person2', 'Person3']] # rearrange columns, optional
Output:
| | count | Person1 | Person2 | Person3 |
|-----------|-------|---------|---------|---------|
| Apple | 2 | Adam | Gina | NaN |
| Chocolate | 1 | Alex | NaN | NaN |
| Cookie | 1 | Adam | NaN | NaN |
| Eggs | 3 | Adam | Alex | Gina |
| Milk | 1 | Gina | NaN | NaN |
| Orange | 2 | Alex | Gina | NaN |
| Potato | 1 | Alex | NaN | NaN |
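For larger frames, the same layout can likely be produced without Python loops by numbering the people within each item using GroupBy.cumcount and then pivoting. This is a different technique from the loop above, offered as a sketch only: the sample frame is rebuilt from the question with the empty cell assumed to be NaN, and long_df, slot and wide are illustrative names. The ignore_index argument of melt needs pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; Adam's missing Item4 is assumed to be NaN.
df = pd.DataFrame({
    'Person': ['Adam', 'Alex', 'Gina'],
    'Item1': ['Apple', 'Chocolate', 'Eggs'],
    'Item2': ['Eggs', 'Orange', 'Apple'],
    'Item3': ['Cookie', 'Eggs', 'Orange'],
    'Item4': [np.nan, 'Potato', 'Milk'],
})

# Keep the original row labels so the long frame can be put back in
# person order (Adam, Alex, Gina), matching the loop-based answer.
long_df = (df.melt('Person', value_name='Item', ignore_index=False)
             .dropna(subset=['Item'])
             .sort_index(kind='mergesort')   # stable sort by original row
             .reset_index(drop=True))

# Number the people within each item: Person1, Person2, ...
long_df['slot'] = 'Person' + (long_df.groupby('Item').cumcount() + 1).astype(str)

# One row per item, one column per slot; unused slots stay NaN.
wide = long_df.pivot(index='Item', columns='slot', values='Person')
wide.insert(0, 'count', wide.notna().sum(axis=1))
print(wide)
```

Because the slot labels are strings, the pivoted columns sort lexicographically, so with ten or more people per item Person10 would land before Person2 and the columns would need reordering.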