What is the simplest way to transpose/invert a pandas DataFrame?

Time: 2018-08-28 08:34:13

Tags: python python-3.x pandas dataframe

I have the following pandas DataFrame:

Person     Item1      Item2     Item3     Item4
Adam       Apple      Eggs      Cookie
Alex       Chocolate  Orange    Eggs      Potato
Gina       Eggs       Apple     Orange    Milk

I want to convert it to this:

Item      Count     Person1     Person2     Person3
Apple     2         Adam        Gina
Eggs      3         Adam        Alex        Gina
Cookie    1         Adam
Chocolate 1         Alex
Orange    2         Alex        Gina
Potato    1         Alex
Milk      1         Gina

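I searched thoroughly before posting but found no match (perhaps there is a better way to phrase my question). Apologies if this is a duplicate; if it is, please point me to where it was answered before.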

2 answers:

Answer 0 (score: 1)

Use melt to reshape first:

df = df.melt('Person', value_name='Item')
print (df)
   Person variable       Item
0    Adam    Item1      Apple
1    Alex    Item1  Chocolate
2    Gina    Item1       Eggs
3    Adam    Item2       Eggs
4    Alex    Item2     Orange
5    Gina    Item2      Apple
6    Adam    Item3     Cookie
7    Alex    Item3       Eggs
8    Gina    Item3     Orange
9    Adam    Item4        NaN
10   Alex    Item4     Potato
11   Gina    Item4       Milk

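Then aggregate with a custom function that collects the people for each item into lists, together with GroupBy.size for the count column, and build a new DataFrame from those lists with the constructor: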

f = lambda x: x.tolist()
f.__name__ = 'Person'
df1 = df.groupby('Item', sort=False)['Person'].agg([f, 'size'])

df2 = pd.DataFrame(df1.pop('Person').values.tolist(), index=df1.index).add_prefix('Person')
df3 = df1.join(df2).reset_index()
print (df3)
        Item  size Person0 Person1 Person2
0      Apple     2    Adam    Gina    None
1  Chocolate     1    Alex    None    None
2       Eggs     3    Gina    Adam    Alex
3     Orange     2    Alex    Gina    None
4     Cookie     1    Adam    None    None
5     Potato     1    Alex    None    None
6       Milk     1    Gina    None    None
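
If the column names and person order from the question matter (Count, Person1 to Person3, people listed in their original row order), the same melt idea can be pushed a little further. What follows is only a sketch, not the approach above: it swaps the list aggregation for cumcount plus pivot, it assumes the missing entry is stored as None, and the names tidy, slot and wide are made up for illustration.

```python
import pandas as pd

# Sample data from the question; None for Adam's missing Item4 is an assumption.
df = pd.DataFrame({
    'Person': ['Adam', 'Alex', 'Gina'],
    'Item1': ['Apple', 'Chocolate', 'Eggs'],
    'Item2': ['Eggs', 'Orange', 'Apple'],
    'Item3': ['Cookie', 'Eggs', 'Orange'],
    'Item4': [None, 'Potato', 'Milk'],
})

# Long format, keeping the original row labels so people can be put back in row order.
tidy = df.melt(id_vars='Person', value_name='Item', ignore_index=False)
tidy = tidy.dropna(subset=['Item']).sort_index(kind='stable')

# Number the people within each item: 1, 2, 3, ...
tidy = tidy.assign(slot=tidy.groupby('Item').cumcount() + 1)

# One row per item, one column per slot.
wide = tidy.pivot(index='Item', columns='slot', values='Person').add_prefix('Person')

# Count the non-missing people per item and put that column first.
wide.insert(0, 'Count', wide.notna().sum(axis=1))

result = wide.reset_index()
result.columns.name = None
print(result)
```

Note that pivot sorts the items alphabetically, so the row order differs from the question's; slots with no person come out as missing values.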

Answer 1 (score: 0)

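This isn't exactly what you asked for, but I'm not sure this kind of "transposition" exists as a single simple function. (By the way, transpose, following linear algebra, usually means swapping a data frame's rows and columns, i.e. flipping it across its diagonal.)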

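# collect the distinct items across all Item columns
items = []
for c in df.columns[1:]:
    items.extend(df[c].values)
# drop missing entries (handles both None and NaN)
items = [item for item in set(items) if pd.notna(item)]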

people = df.Person.values
counts = {}
for p in people:
    counts[p] = [1 if item in df[df['Person'] == p].values else 0 for item in items]

new = pd.DataFrame(counts, index=items)
new['Count'] = new.sum(axis=1)

Output:

|           | Adam | Alex | Gina | Count |
|-----------|------|------|------|-------|
| Cookie    | 1    | 0    | 0    | 1     |
| Chocolate | 0    | 1    | 0    | 1     |
| Potato    | 0    | 1    | 0    | 1     |
| Eggs      | 1    | 1    | 1    | 3     |
| Milk      | 0    | 0    | 1    | 1     |
| Orange    | 0    | 1    | 1    | 2     |
| Apple     | 1    | 0    | 1    | 2     |
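
For what it's worth, a similar indicator table can probably be had without the explicit loops by melting first and then calling pd.crosstab. This is a sketch of a different technique, not the code above; it assumes the missing entry is stored as None, and tidy and table are illustrative names.

```python
import pandas as pd

# Sample data from the question; None for Adam's missing Item4 is an assumption.
df = pd.DataFrame({
    'Person': ['Adam', 'Alex', 'Gina'],
    'Item1': ['Apple', 'Chocolate', 'Eggs'],
    'Item2': ['Eggs', 'Orange', 'Apple'],
    'Item3': ['Cookie', 'Eggs', 'Orange'],
    'Item4': [None, 'Potato', 'Milk'],
})

# One row per (Person, Item) pair, without the missing entry.
tidy = df.melt(id_vars='Person', value_name='Item').dropna(subset=['Item'])

# Rows are items, columns are people, cells count how often the pair occurs.
table = pd.crosstab(tidy['Item'], tidy['Person'])

# Total per item across all people.
table['Count'] = table.sum(axis=1)
print(table)
```

The cells are counts rather than strict 0/1 flags, so a person who lists the same item twice would show 2; with the sample data every cell is 0 or 1.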

Edit: As usual, jezrael has the right answer, but I adjusted mine to produce the desired output. It may be easier for beginners to follow.

Using "df" as the example:

item_counts = {}
for item in items:
    counts = {}
    count = 0
    for p in people:
        if item in df[df['Person'] == p].values:
            count += 1
            counts['Person' + str(count)] = p
    counts['count'] = count
    item_counts[item] = counts

new = pd.DataFrame.from_dict(item_counts, orient='index')
new = new[['count', 'Person1', 'Person2', 'Person3']] # rearrange columns, optional

Output:

|           | count | Person1 | Person2 | Person3 |
|-----------|-------|---------|---------|---------|
| Apple     | 2     | Adam    | Gina    | NaN     |
| Chocolate | 1     | Alex    | NaN     | NaN     |
| Cookie    | 1     | Adam    | NaN     | NaN     |
| Eggs      | 3     | Adam    | Alex    | Gina    |
| Milk      | 1     | Gina    | NaN     | NaN     |
| Orange    | 2     | Alex    | Gina    | NaN     |
| Potato    | 1     | Alex    | NaN     | NaN     |