My DataFrame has the following form:
import numpy as np
import pandas as pd
df2 = pd.DataFrame({'Date': np.array([2018,2017,2016,2015]),
'Rev': np.array([4000,5000,6000,7000]),
'Other': np.array([0,0,0,0]),
'High':np.array([75.11,70.93,48.63,43.59]),
'Low':np.array([60.42,45.74,34.15,33.12]),
'Mean':np.array([67.765,58.335,41.390,39.355]) #mean of high/low columns
})
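This gives one row per year, with High, Low and Mean as separate columns.
I want to convert this DataFrame into a long format instead.
Basically, each row is repeated three times (the original plus two copies). The High, Low and Mean values then go, one per row, into a single "price" column. Finally, a new "category" column tracks whether each price came from High, Low or Mean (0 for High, 1 for Low, 2 for Mean).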
Answer 0 (score: 1)
This is a simple melt (wide-to-long) problem:
# convert df2 from wide to long, melting the High, Low and Mean cols
df3 = df2.melt(df2.columns.difference(['High', 'Low', 'Mean']).tolist(),
var_name='category',
value_name='price')
# remap "category" to integer
df3['category'] = pd.factorize(df3['category'])[0]
# sort and display
df3.sort_values('Date', ascending=False)
    Date  Other   Rev  category   price
0   2018      0  4000         0  75.110
4   2018      0  4000         1  60.420
8   2018      0  4000         2  67.765
1   2017      0  5000         0  70.930
5   2017      0  5000         1  45.740
9   2017      0  5000         2  58.335
2   2016      0  6000         0  48.630
6   2016      0  6000         1  34.150
10  2016      0  6000         2  41.390
3   2015      0  7000         0  43.590
7   2015      0  7000         1  33.120
11  2015      0  7000         2  39.355
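For a copy-paste run, here is a minimal self-contained sketch of the same melt approach. It differs from the snippet above in two ways that are my own choices, not part of the answer: it passes `id_vars` and `value_vars` explicitly so the original column order is kept, and it uses an explicit mapping dict instead of `pd.factorize`, so the codes do not depend on the order in which the labels first appear.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'Date': np.array([2018, 2017, 2016, 2015]),
                    'Rev': np.array([4000, 5000, 6000, 7000]),
                    'Other': np.array([0, 0, 0, 0]),
                    'High': np.array([75.11, 70.93, 48.63, 43.59]),
                    'Low': np.array([60.42, 45.74, 34.15, 33.12]),
                    'Mean': np.array([67.765, 58.335, 41.390, 39.355])})

value_cols = ['High', 'Low', 'Mean']
# every other column is carried along unchanged, in its original order
id_cols = [c for c in df2.columns if c not in value_cols]

# wide to long: one row per (original row, value column) pair
df3 = df2.melt(id_vars=id_cols, value_vars=value_cols,
               var_name='category', value_name='price')

# explicit label -> code mapping, as requested in the question
df3['category'] = df3['category'].map({'High': 0, 'Low': 1, 'Mean': 2})

# newest year first, then High/Low/Mean within each year
df3 = (df3.sort_values(['Date', 'category'], ascending=[False, True])
          .reset_index(drop=True))
print(df3)
```

Sorting on both `Date` and `category` makes the row order within each year deterministic rather than relying on the sort algorithm preserving the melted order.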
Answer 1 (score: 0)
You can use stack instead of melt, which saves you the sort_values call:
new_df = (df2.set_index(['Date','Rev', 'Other'])
.stack()
.to_frame(name='price')
.reset_index()
)
Output:
    Date   Rev  Other  level_3   price
0   2018  4000      0     High  75.110
1   2018  4000      0      Low  60.420
2   2018  4000      0     Mean  67.765
3   2017  5000      0     High  70.930
4   2017  5000      0      Low  45.740
5   2017  5000      0     Mean  58.335
6   2016  6000      0     High  48.630
7   2016  6000      0      Low  34.150
8   2016  6000      0     Mean  41.390
9   2015  7000      0     High  43.590
10  2015  7000      0      Low  33.120
11  2015  7000      0     Mean  39.355
And if you need the category column:
new_df['category'] = new_df['level_3'].map({'High': 0, 'Low': 1, 'Mean': 2})
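As a possible refinement of the stack approach, the auto-generated `level_3` name can be avoided by naming the column axis before stacking. The sketch below assumes a hypothetical level name `source`, which is my own label and not part of the answer; the rest follows the same chain.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'Date': np.array([2018, 2017, 2016, 2015]),
                    'Rev': np.array([4000, 5000, 6000, 7000]),
                    'Other': np.array([0, 0, 0, 0]),
                    'High': np.array([75.11, 70.93, 48.63, 43.59]),
                    'Low': np.array([60.42, 45.74, 34.15, 33.12]),
                    'Mean': np.array([67.765, 58.335, 41.390, 39.355])})

new_df = (df2.set_index(['Date', 'Rev', 'Other'])
             .rename_axis(columns='source')  # 'source' is a hypothetical name for the stacked level
             .stack()                        # High/Low/Mean move from columns into the index
             .rename('price')                # name the resulting Series
             .reset_index())

# map the text labels to the integer codes from the question
new_df['category'] = new_df['source'].map({'High': 0, 'Low': 1, 'Mean': 2})
print(new_df)
```

Because `stack` walks the rows in their existing order, the result keeps the years in the order they appear in `df2`, with High, Low and Mean grouped under each year.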
Answer 2 (score: 0)
Here is another version:
import pandas as pd
import numpy as np
df2 = pd.DataFrame({'Date': np.array([2018,2017,2016,2015]),
'Rev': np.array([4000,5000,6000,7000]),
'Other': np.array([0,0,0,0]),
'High':np.array([75.11,70.93,48.63,43.59]),
'Low':np.array([60.42,45.74,34.15,33.12]),
'Mean':np.array([67.765,58.335,41.390,39.355]) #mean of high/low columns
})
#create one dataframe per category
df_high = df2[['Date', 'Other', 'Rev', 'High']]
df_mean = df2[['Date', 'Other', 'Rev', 'Mean']]
df_low = df2[['Date', 'Other', 'Rev', 'Low']]
#rename the category column to price
df_high = df_high.rename(index = str, columns = {'High': 'price'})
df_mean = df_mean.rename(index = str, columns = {'Mean': 'price'})
df_low = df_low.rename(index = str, columns = {'Low': 'price'})
#create new category column
df_high['category'] = 0
df_mean['category'] = 2
df_low['category'] = 1
#concatenate the dataframes together
frames = [df_high, df_mean, df_low]
df_concat = pd.concat(frames)
#sort values per example
df_concat = df_concat.sort_values(by = ['Date', 'category'], ascending = [False, True])
#print result
print(df_concat)
Result:
   Date  Other   Rev   price  category
0  2018      0  4000  75.110         0
0  2018      0  4000  60.420         1
0  2018      0  4000  67.765         2
1  2017      0  5000  70.930         0
1  2017      0  5000  45.740         1
1  2017      0  5000  58.335         2
2  2016      0  6000  48.630         0
2  2016      0  6000  34.150         1
2  2016      0  6000  41.390         2
3  2015      0  7000  43.590         0
3  2015      0  7000  33.120         1
3  2015      0  7000  39.355         2