Question

My DataFrame looks like this:

+----+------+------+-----+-----+
| id | year | sell | buy | own |
+----+------+------+-----+-----+
| 1  | 2016 | 9    | 2   | 10  |
| 1  | 2017 | 9    | 0   | 10  |
| 1  | 2018 | 0    | 2   | 10  |
| 2  | 2016 | 7    | 2   | 11  |
| 2  | 2017 | 2    | 0   |  0  |
| 2  | 2018 | 0    | 0   | 18  |
+----+------+------+-----+-----+

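I am trying to transpose rows into columns, but instead of aggregated values I want to keep a letter for each non-zero column (S for sell, B for buy, O for own). If all three columns have a value for a given year, I need S_B_O for that year; if only sell and buy have values, I need S_B, and so on. The expected output is: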

+----+-------+------+------+
| ID | 2016  | 2017 | 2018 |
+----+-------+------+------+
| 1  | S_B_O | S_O  | B_O  |
+----+-------+------+------+
| 2  | S_B_O | S    | O    |
+----+-------+------+------+

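I'm new to Python and don't know how to do this. I only know the basics of aggregation, as shown below. Is it possible? Any suggestions would be appreciated.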

import pandas as pd
import numpy as np

df=pd.read_excel('Pivot.xlsx')

pivot = pd.pivot_table(df,index=["ID"],columns='year',values ='sell' ,aggfunc = np.sum,fill_value=0)

DataFrame:

id,year,sell,buy,own
1,2016,9,2,10
1,2017,9,0,10
1,2018,0,2,10
2,2016,7,2,11
2,2017,2,0,0
2,2018,0,0,18
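
For anyone who wants to reproduce this without Pivot.xlsx, here is a minimal sketch that loads the CSV text above straight into a DataFrame. The name csv_text is just something picked for illustration; it is not part of the original question.

```python
from io import StringIO

import pandas as pd

# The CSV rows from the question, embedded so that no Excel file is needed.
# csv_text is a hypothetical name used only in this sketch.
csv_text = """id,year,sell,buy,own
1,2016,9,2,10
1,2017,9,0,10
1,2018,0,2,10
2,2016,7,2,11
2,2017,2,0,0
2,2018,0,0,18
"""

# StringIO lets read_csv treat the string like a file.
df = pd.read_csv(StringIO(csv_text))
print(df)
```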

Answer 1

You can use df.dot together with df.pivot here:

u = df[['sell','buy','own']]
(df.assign(v=u.ne(0).dot(u.columns.str[0].str.upper()+'_').str[:-1])
.pivot(index="id", columns="year", values="v"))

year   2016 2017 2018
id                   
1     S_B_O  S_O  B_O
2     S_B_O    S    O

With full formatting:

u = df[['sell','buy','own']]
out = (df.assign(v=u.ne(0).dot(u.columns.str[0].str.upper()+'_').str[:-1])
      .pivot(index="id", columns="year", values="v").rename_axis(columns=None).reset_index())
print(out)

   id   2016 2017 2018
0   1  S_B_O  S_O  B_O
1   2  S_B_O    S    O
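
In case the dot step looks like magic, here is a rough sketch that splits it into pieces. It assumes the sample data from the question; mask, labels and codes are names introduced only for this illustration. The idea is that a bool times a string either keeps the string or gives an empty string, and the dot product then concatenates the pieces across each row.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2],
    'year': [2016, 2017, 2018, 2016, 2017, 2018],
    'sell': [9, 9, 0, 7, 2, 0],
    'buy':  [2, 0, 2, 2, 0, 0],
    'own':  [10, 10, 10, 11, 0, 18],
})

u = df[['sell', 'buy', 'own']]
mask = u.ne(0)                               # True where the value is non-zero
labels = u.columns.str[0].str.upper() + '_'  # 'S_', 'B_', 'O_'

# In plain Python, multiplying a string by a bool keeps or drops it
assert True * 'S_' == 'S_' and False * 'S_' == ''

# dot multiplies each row of mask by labels and adds (concatenates) the results
codes = mask.dot(labels)   # each code still ends with a trailing '_'
codes = codes.str[:-1]     # strip that trailing '_'
print(codes.tolist())

# Pivot with keyword arguments, one row per id and one column per year
out = df.assign(v=codes).pivot(index='id', columns='year', values='v')
print(out)
```

Rows where all three columns are zero would come out as an empty string, which may or may not be what you want.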

Answer 2

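Cast 'sell', 'buy' and 'own' to boolean (df.iloc[:, -3:].astype(bool)), multiply the booleans by the characters ['S', 'B', 'O'], and join the results into a combined string.
- 0 is False; any other number is True.
- df.iloc[:, -3:] is equivalent to df[['sell', 'buy', 'own']].
The list comprehension removes the empty strings; otherwise the result would contain extra underscores (e.g. S__O or __O).
Since this solution uses .apply, it is slower than anky's solution.
I think .astype(bool) * ['S', 'B', 'O'] is a neat trick that is worth sharing.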

import pandas as pd

# sample dataframe
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2], 'year': [2016, 2017, 2018, 2016, 2017, 2018], 'sell': [9, 9, 0, 7, 2, 0], 'buy': [2, 0, 2, 2, 0, 0], 'own': [10, 10, 10, 11, 0, 18]})

# Cast sell, buy, own as bool, multiply by the letters, and combine as a string
df['str'] = (df.iloc[:, -3:].astype(bool) * ['S', 'B', 'O']).apply(lambda x: '_'.join([v for v in x if v]), axis=1)

# display(df)
   id  year  sell  buy  own    str
0   1  2016     9    2   10  S_B_O
1   1  2017     9    0   10    S_O
2   1  2018     0    2   10    B_O
3   2  2016     7    2   11  S_B_O
4   2  2017     2    0    0      S
5   2  2018     0    0   18      O

# pivot
dfp = df.pivot(index='id', columns='year', values='str')

# display(dfp)
year   2016 2017 2018
id                   
1     S_B_O  S_O  B_O
2     S_B_O    S    O
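
To show why the empty strings have to be filtered out before joining, here is a small sketch. It builds the letter grid with np.where instead of the bool-times-list trick, which is a swapped-in technique and not the code from this answer; letters, naive and clean are names made up for the illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2],
    'year': [2016, 2017, 2018, 2016, 2017, 2018],
    'sell': [9, 9, 0, 7, 2, 0],
    'buy':  [2, 0, 2, 2, 0, 0],
    'own':  [10, 10, 10, 11, 0, 18],
})

flags = df[['sell', 'buy', 'own']].astype(bool)

# Letter where the flag is True, empty string where it is False
letters = pd.DataFrame(
    np.where(flags, ['S', 'B', 'O'], ''),
    index=df.index,
    columns=flags.columns,
)

# Joining every cell keeps the empty strings, so stray underscores appear
naive = letters.apply(lambda row: '_'.join(row), axis=1)

# Filtering out empty strings first gives the intended labels
clean = letters.apply(lambda row: '_'.join(v for v in row if v), axis=1)

print(pd.DataFrame({'naive': naive, 'clean': clean}))
```

The naive column is where values like S__O and __O show up; the clean column matches the str column above.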


Answer 3

That's my favorite pandas operation (unstack) :-) You can do it as follows. Most of the work is building the string the way you want it:

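# Start each row's list with 'S' when sell is non-zero
df['operation']= df['sell'].map(lambda v: ['S'] if v != 0 else [])
# The lists are mutated in place, so the return value of map is discarded
indexer= df['buy'] != 0
df.loc[indexer, 'operation'].map(lambda v: v.append('B'))
indexer= df['own'] != 0
df.loc[indexer, 'operation'].map(lambda v: v.append('O'))
# Join each list into a string such as 'S_B_O'
df['operation']= df['operation'].map(lambda l: '_'.join(l))
# Move year from the row index into the columns
df.set_index(['id', 'year'], inplace=True)
df_res= df['operation'].unstack()
df_res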

The test data is as follows:

from io import StringIO
import pandas as pd

infile= StringIO(
""" id | year | sell | buy | own 
 1  | 2016 | 9    | 2   | 10  
 1  | 2017 | 9    | 0   | 10  
 1  | 2018 | 0    | 2   | 10  
 2  | 2016 | 7    | 2   | 11  
 2  | 2017 | 2    | 0   |  0  
 2  | 2018 | 0    | 0   | 18""")
df= pd.read_csv(infile, sep='|', dtype='int16')
df.head()
df.columns= [col.strip() for col in df.columns]

You get the result:

year   2016 2017 2018
id                   
1     S_B_O  S_O  B_O
2     S_B_O    S    O
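
If the in-place list appends feel a bit too clever, the same set_index plus unstack idea can be sketched with a small helper. This assumes the sample data from the question; label is a hypothetical helper written for this illustration, and the pivot call is only there to compare against the unstack result.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2],
    'year': [2016, 2017, 2018, 2016, 2017, 2018],
    'sell': [9, 9, 0, 7, 2, 0],
    'buy':  [2, 0, 2, 2, 0, 0],
    'own':  [10, 10, 10, 11, 0, 18],
})

def label(row):
    """Hypothetical helper: join the letters of the non-zero columns."""
    pairs = (('sell', 'S'), ('buy', 'B'), ('own', 'O'))
    return '_'.join(letter for col, letter in pairs if row[col] != 0)

df['operation'] = df.apply(label, axis=1)

# unstack moves the innermost index level (year) into the columns
wide = df.set_index(['id', 'year'])['operation'].unstack()

# pivot reshapes the same column directly, for comparison
same = df.pivot(index='id', columns='year', values='operation')

print(wide)
print(same)
```

Using apply row by row is slow on large frames, so treat this as a readability sketch rather than the fast path.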
