My dataframe looks like this:
+----+------+------+-----+-----+
| id | year | sell | buy | own |
+----+------+------+-----+-----+
| 1 | 2016 | 9 | 2 | 10 |
| 1 | 2017 | 9 | 0 | 10 |
| 1 | 2018 | 0 | 2 | 10 |
| 2 | 2016 | 7 | 2 | 11 |
| 2 | 2017 | 2 | 0 | 0 |
| 2 | 2018 | 0 | 0 | 18 |
+----+------+------+-----+-----+
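I am trying to transpose rows into columns, but instead of aggregated values I want to keep a letter for each column whose value is not 0 (S for sell, B for buy, O for own). If all three columns have values for a given year, I need S_B_O for that year; if only sell and buy have values, I need S_B, and so on. So the expected output is: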
+----+-------+------+------+
| ID | 2016 | 2017 | 2018 |
+----+-------+------+------+
| 1 | S_B_O | S_O | B_O |
+----+-------+------+------+
| 2 | S_B_O | S | O |
+----+-------+------+------+
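I'm new to Python and don't know how to do this. I only know the basics of aggregation, as shown below. Is this possible? Any suggestions would be appreciated.
import pandas as pd
df = pd.read_excel('Pivot.xlsx')
# This only sums 'sell' per id and year; it does not build the letter codes
pivot = pd.pivot_table(df, index=['id'], columns='year', values='sell', aggfunc='sum', fill_value=0)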
The dataframe as CSV:
id,year,sell,buy,own
1,2016,9,2,10
1,2017,9,0,10
1,2018,0,2,10
2,2016,7,2,11
2,2017,2,0,0
2,2018,0,0,18
Answer 0 (score: 7)
You can combine DataFrame.dot with DataFrame.pivot here:
u = df[['sell','buy','own']]
(df.assign(v=u.ne(0).dot(u.columns.str[0].str.upper()+'_').str[:-1])
.pivot(index="id", columns="year", values="v"))
year 2016 2017 2018
id
1 S_B_O S_O B_O
2 S_B_O S O
With the final formatting (no columns name, id as a regular column):
u = df[['sell','buy','own']]
out = (df.assign(v=u.ne(0).dot(u.columns.str[0].str.upper()+'_').str[:-1])
.pivot(index="id", columns="year", values="v").rename_axis(columns=None).reset_index())
print(out)
id 2016 2017 2018
0 1 S_B_O S_O B_O
1 2 S_B_O S O
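To see why the dot trick works, here is a step-by-step sketch on the same sample data; the names labels, mask and raw are introduced only for illustration. For each row, dot multiplies every boolean by its label and adds the products, and with Python strings True * 'S_' is 'S_', False * 'S_' is '', and + concatenates. That leaves one trailing underscore, which .str[:-1] removes.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'year': [2016, 2017, 2018, 2016, 2017, 2018],
                   'sell': [9, 9, 0, 7, 2, 0],
                   'buy': [2, 0, 2, 2, 0, 0],
                   'own': [10, 10, 10, 11, 0, 18]})
u = df[['sell', 'buy', 'own']]

# One label per column: 'S_', 'B_', 'O_'
labels = u.columns.str[0].str.upper() + '_'

# Boolean mask: True where the value is non-zero
mask = u.ne(0)

# Row-wise dot product: each boolean times its label, then summed,
# which for strings means concatenation (e.g. 'S_' + '' + 'O_' == 'S_O_')
raw = mask.dot(labels)

# Drop the trailing '_' left by the last non-empty label
letters = raw.str[:-1]
print(pd.DataFrame({'raw': raw, 'letters': letters}))
```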
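Answer 1 (score: 4)
Cast 'sell', 'buy' and 'own' to Boolean (df.iloc[:, -3:].astype(bool)), multiply the Booleans by the characters ['S', 'B', 'O'], and join them into a combined string.
A value of 0 becomes False and every other number becomes True.
df.iloc[:, -3:] is equivalent to df[['sell', 'buy', 'own']].
The list comprehension inside join skips the empty strings; otherwise they leave extra _ separators in the result (e.g. S__O instead of S_O).
This uses .apply, so it is slower than anky's solution, but .astype(bool) * ['S', 'B', 'O'] is a neat trick worth sharing.
import pandas as pd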
# sample dataframe
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2], 'year': [2016, 2017, 2018, 2016, 2017, 2018], 'sell': [9, 9, 0, 7, 2, 0], 'buy': [2, 0, 2, 2, 0, 0], 'own': [10, 10, 10, 11, 0, 18]})
# Cast sell, buy, own to bool, multiply by the letters and join the non-empty ones with '_'
df['str'] = (df.iloc[:, -3:].astype(bool) * ['S', 'B', 'O']).apply(lambda x: '_'.join([v for v in x if v]), axis=1)
# display(df)
id year sell buy own str
0 1 2016 9 2 10 S_B_O
1 1 2017 9 0 10 S_O
2 1 2018 0 2 10 B_O
3 2 2016 7 2 11 S_B_O
4 2 2017 2 0 0 S
5 2 2018 0 0 18 O
# pivot
dfp = df.pivot(index='id', columns='year', values='str')
# display(dfp)
year 2016 2017 2018
id
1 S_B_O S_O B_O
2 S_B_O S O
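The same multiply-by-letters idea can be shown at the NumPy level, which makes the broadcasting explicit. This is a sketch, not the answer's exact code: it swaps the pandas frame-times-list operation for a plain NumPy broadcast, and the object dtype on the letters array is an assumption made so that Python's str multiplication (True * 'S' == 'S', False * 'S' == '') is used element by element.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'year': [2016, 2017, 2018, 2016, 2017, 2018],
                   'sell': [9, 9, 0, 7, 2, 0],
                   'buy': [2, 0, 2, 2, 0, 0],
                   'own': [10, 10, 10, 11, 0, 18]})

# Non-zero -> True, zero -> False, as a (6, 3) boolean array
flags = df[['sell', 'buy', 'own']].astype(bool).to_numpy()

# Object dtype keeps Python str semantics for the multiplication
letters = np.array(['S', 'B', 'O'], dtype=object)

# Broadcasting multiplies every row of flags by the same three letters
grid = flags * letters

# Join only the non-empty strings so no stray '_' separators appear
df['str'] = ['_'.join(v for v in row if v) for row in grid]
print(df[['id', 'year', 'str']])
```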
Answer 2 (score: 1)
That's my favorite pandas operation (unstack) :-) You can do it as follows. Most of the work is building the string you asked for:
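# Start each row's list with 'S' if sell is non-zero, otherwise with an empty list
df['operation'] = df['sell'].map(lambda v: ['S'] if v != 0 else [])
# The selected cells are references to the same list objects,
# so appending to them updates df['operation'] in place
indexer = df['buy'] != 0
for ops in df.loc[indexer, 'operation']:
    ops.append('B')
indexer = df['own'] != 0
for ops in df.loc[indexer, 'operation']:
    ops.append('O')
# Join each list into one string, e.g. ['S', 'O'] -> 'S_O'
df['operation'] = df['operation'].map('_'.join)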
df.set_index(['id', 'year'], inplace=True)
df_res= df['operation'].unstack()
df_res
Test data:
import pandas as pd
from io import StringIO
infile= StringIO(
""" id | year | sell | buy | own
1 | 2016 | 9 | 2 | 10
1 | 2017 | 9 | 0 | 10
1 | 2018 | 0 | 2 | 10
2 | 2016 | 7 | 2 | 11
2 | 2017 | 2 | 0 | 0
2 | 2018 | 0 | 0 | 18""")
df = pd.read_csv(infile, sep='|', dtype='int16')
# The header cells keep their surrounding spaces (' id ', ' year ', ...), so strip them
df.columns = [col.strip() for col in df.columns]
df.head()
You get the result:
year 2016 2017 2018
id
1 S_B_O S_O B_O
2 S_B_O S O
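Answer 2's set_index(...).unstack() and the pivot used in the other answers build the same table, because pivot moves the columns argument from the row index into the column axis, which is exactly what unstack does. A quick check, assuming a small frame whose hypothetical 'operation' column holds the letter codes from the results above:

```python
import pandas as pd

# Letter codes copied from the results above (illustrative input)
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'year': [2016, 2017, 2018, 2016, 2017, 2018],
                   'operation': ['S_B_O', 'S_O', 'B_O', 'S_B_O', 'S', 'O']})

# Answer 2's route: make (id, year) the index, then move 'year' into the columns
via_unstack = df.set_index(['id', 'year'])['operation'].unstack()

# Route used by answers 0 and 1
via_pivot = df.pivot(index='id', columns='year', values='operation')

print(via_unstack.equals(via_pivot))
```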