Question

My dataframe has the following format:

df2 = pd.DataFrame({'Date': np.array([2018,2017,2016,2015]),
                'Rev': np.array([4000,5000,6000,7000]),
                'Other': np.array([0,0,0,0]),
                'High':np.array([75.11,70.93,48.63,43.59]),
                'Low':np.array([60.42,45.74,34.15,33.12]),
                'Mean':np.array([67.765,58.335,41.390,39.355]) #mean of high/low columns
                })

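It looks like this:

I want to convert this dataframe into something like the following:

Basically, each row is duplicated twice, giving three copies of it. The High, Low and Mean values of that row are then placed, one per copy, in a single `price` column. Finally, a new `category` column records which column each price came from (0 for High, 1 for Low, 2 for Mean).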
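Read literally, the steps described above could be sketched as follows. This only illustrates the desired result; the names `value_cols` and `long_df` are made up for the example.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'Date': [2018, 2017, 2016, 2015],
                    'Rev': [4000, 5000, 6000, 7000],
                    'Other': [0, 0, 0, 0],
                    'High': [75.11, 70.93, 48.63, 43.59],
                    'Low': [60.42, 45.74, 34.15, 33.12],
                    'Mean': [67.765, 58.335, 41.390, 39.355]})

value_cols = ['High', 'Low', 'Mean']  # hypothetical helper name

# repeat every row once per value column (three copies of each row)
long_df = df2.loc[df2.index.repeat(len(value_cols)), ['Date', 'Rev', 'Other']]
long_df = long_df.reset_index(drop=True)

# flatten High/Low/Mean row by row into a single price column
long_df['price'] = df2[value_cols].to_numpy().ravel()

# 0 = High, 1 = Low, 2 = Mean, cycling once per original row
long_df['category'] = np.tile(np.arange(len(value_cols)), len(df2))

print(long_df)
```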

Answer 1

This is a straightforward melt (wide-to-long) problem:

# convert df2 from wide to long, melting the High, Low and Mean cols
df3 = df2.melt(df2.columns.difference(['High', 'Low', 'Mean']).tolist(), 
               var_name='category', 
               value_name='price')
# remap "category" to integer
df3['category'] = pd.factorize(df3['category'])[0]
# sort and display
df3.sort_values('Date', ascending=False)

    Date  Other   Rev  category   price
0   2018      0  4000         0  75.110
4   2018      0  4000         1  60.420
8   2018      0  4000         2  67.765
1   2017      0  5000         0  70.930
5   2017      0  5000         1  45.740
9   2017      0  5000         2  58.335
2   2016      0  6000         0  48.630
6   2016      0  6000         1  34.150
10  2016      0  6000         2  41.390
3   2015      0  7000         0  43.590
7   2015      0  7000         1  33.120
11  2015      0  7000         2  39.355
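A slightly more explicit variant of the same idea, as a sketch: `factorize` numbers the categories in order of appearance, and sorting on Date alone leaves the order within a year to a sort that is not guaranteed to be stable. Assuming the 0/1/2 coding from the question, the version below spells out the mapping (the dict name `category_codes` is made up here) and sorts on both columns.

```python
import pandas as pd

df2 = pd.DataFrame({'Date': [2018, 2017, 2016, 2015],
                    'Rev': [4000, 5000, 6000, 7000],
                    'Other': [0, 0, 0, 0],
                    'High': [75.11, 70.93, 48.63, 43.59],
                    'Low': [60.42, 45.74, 34.15, 33.12],
                    'Mean': [67.765, 58.335, 41.390, 39.355]})

# coding taken from the question: 0 = High, 1 = Low, 2 = Mean
category_codes = {'High': 0, 'Low': 1, 'Mean': 2}

# name the identifier and value columns explicitly
df3 = df2.melt(id_vars=['Date', 'Other', 'Rev'],
               value_vars=list(category_codes),
               var_name='category',
               value_name='price')

# replace the column names with their integer codes
df3['category'] = df3['category'].map(category_codes)

# sort on both keys so the order within each year is well defined
df3 = df3.sort_values(['Date', 'category'], ascending=[False, True])
print(df3)
```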

Answer 2

You can use stack instead of melt, which saves you the sort_values:

new_df = (df2.set_index(['Date','Rev', 'Other'])
             .stack()
             .to_frame(name='price')
             .reset_index()
         )

Output:

    Date   Rev  Other level_3   price
0   2018  4000      0    High  75.110
1   2018  4000      0     Low  60.420
2   2018  4000      0    Mean  67.765
3   2017  5000      0    High  70.930
4   2017  5000      0     Low  45.740
5   2017  5000      0    Mean  58.335
6   2016  6000      0    High  48.630
7   2016  6000      0     Low  34.150
8   2016  6000      0    Mean  41.390
9   2015  7000      0    High  43.590
10  2015  7000      0     Low  33.120
11  2015  7000      0    Mean  39.355

If you also need the category column:

new_df['category'] = new_df['level_3'].map({'High':0, 'Low':1, 'Mean':2})
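If the auto-generated `level_3` name is a nuisance, one possible tweak is to name the column axis before stacking, since stack uses that name for the new index level. A minimal sketch; the label `kind` is an arbitrary choice for this example.

```python
import pandas as pd

df2 = pd.DataFrame({'Date': [2018, 2017, 2016, 2015],
                    'Rev': [4000, 5000, 6000, 7000],
                    'Other': [0, 0, 0, 0],
                    'High': [75.11, 70.93, 48.63, 43.59],
                    'Low': [60.42, 45.74, 34.15, 33.12],
                    'Mean': [67.765, 58.335, 41.390, 39.355]})

new_df = (df2.set_index(['Date', 'Rev', 'Other'])
             .rename_axis(columns='kind')  # 'kind' is a made-up label
             .stack()                      # High/Low/Mean move into the index
             .rename('price')              # name the resulting Series
             .reset_index())

# integer coding from the question: 0 = High, 1 = Low, 2 = Mean
new_df['category'] = new_df['kind'].map({'High': 0, 'Low': 1, 'Mean': 2})
print(new_df)
```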

Answer 3

Here is another version:

import pandas as pd
import numpy as np

df2 = pd.DataFrame({'Date': np.array([2018,2017,2016,2015]),
                'Rev': np.array([4000,5000,6000,7000]),
                'Other': np.array([0,0,0,0]),
                'High':np.array([75.11,70.93,48.63,43.59]),
                'Low':np.array([60.42,45.74,34.15,33.12]),
                'Mean':np.array([67.765,58.335,41.390,39.355]) #mean of high/low columns
                })

#create one dataframe per category
df_high = df2[['Date', 'Other', 'Rev', 'High']]
df_mean = df2[['Date', 'Other', 'Rev', 'Mean']]
df_low = df2[['Date', 'Other', 'Rev', 'Low']]

#rename the category column to price
df_high = df_high.rename(index = str, columns = {'High': 'price'})
df_mean = df_mean.rename(index = str, columns = {'Mean': 'price'})
df_low = df_low.rename(index = str, columns = {'Low': 'price'})

#create new category column
df_high['category'] = 0
df_mean['category'] = 2
df_low['category'] = 1

#concatenate the dataframes together
frames = [df_high, df_mean, df_low]
df_concat = pd.concat(frames)

#sort values per example
df_concat = df_concat.sort_values(by = ['Date', 'category'], ascending = [False, True])

#print result
print(df_concat)

Result:

   Date  Other   Rev   price  category
0  2018      0  4000  75.110         0
0  2018      0  4000  60.420         1
0  2018      0  4000  67.765         2
1  2017      0  5000  70.930         0
1  2017      0  5000  45.740         1
1  2017      0  5000  58.335         2
2  2016      0  6000  48.630         0
2  2016      0  6000  34.150         1
2  2016      0  6000  41.390         2
3  2015      0  7000  43.590         0
3  2015      0  7000  33.120         1
3  2015      0  7000  39.355         2
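The same concat approach can be written more compactly by building the three frames in a loop. This is just a sketch of that idea; `category_codes` and `id_cols` are names introduced for the example, and the index is reset at the end rather than kept as repeated labels.

```python
import pandas as pd

df2 = pd.DataFrame({'Date': [2018, 2017, 2016, 2015],
                    'Rev': [4000, 5000, 6000, 7000],
                    'Other': [0, 0, 0, 0],
                    'High': [75.11, 70.93, 48.63, 43.59],
                    'Low': [60.42, 45.74, 34.15, 33.12],
                    'Mean': [67.765, 58.335, 41.390, 39.355]})

category_codes = {'High': 0, 'Low': 1, 'Mean': 2}  # hypothetical helper
id_cols = ['Date', 'Other', 'Rev']                 # hypothetical helper

# one frame per category: the id columns plus that category's price and code
frames = [df2[id_cols].assign(price=df2[col], category=code)
          for col, code in category_codes.items()]

# stack the frames, then order by year (descending) and category (ascending)
df_concat = (pd.concat(frames)
               .sort_values(by=['Date', 'category'], ascending=[False, True])
               .reset_index(drop=True))
print(df_concat)
```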

How to duplicate entries in a dataframe
