Question

I have a data frame:

import numpy as np
import pandas as pd

arr = np.array([['john', 'm', 'accountant', 1, 2, 3, 4, 5],
                ['sara', 'f', 'doctor', 3, 4, 5, 6, 3],
                ['stephanie', 'f', 'photographer', 1, 4, 3, 2, 1]])
columns = ['name', 'sex', 'occupation', 'Jan', 'Feb', 'March', 'April', 'May']
df = pd.DataFrame(arr, columns=columns)


        name sex    occupation Jan Feb March April May
0       john   m    accountant   1   2     3     4   5
1       sara   f        doctor   3   4     5     6   3
2  stephanie   f  photographer   1   4     3     2   1

I want to convert all of these date columns into two columns, ['date', 'val_on_date']. The only way I can think of is:

newArr = []
for idx, row in df.iterrows():
    for col in ['Jan', 'Feb', 'March', 'April', 'May']:
        # id columns for this row, followed by the month name and its value
        newArr.append(row[['name', 'sex', 'occupation']].values.tolist() + [col, row[col]])

newDf = pd.DataFrame(newArr, columns=['name', 'sex', 'occupation', 'month', 'val'])

         name sex    occupation  month val
0        john   m    accountant    Jan   1
1        john   m    accountant    Feb   2
2        john   m    accountant  March   3
3        john   m    accountant  April   4
4        john   m    accountant    May   5
5        sara   f        doctor    Jan   3
6        sara   f        doctor    Feb   4
7        sara   f        doctor  March   5
8        sara   f        doctor  April   6
9        sara   f        doctor    May   3
10  stephanie   f  photographer    Jan   1
11  stephanie   f  photographer    Feb   4
12  stephanie   f  photographer  March   3
13  stephanie   f  photographer  April   2
14  stephanie   f  photographer    May   1

But I feel there must be a better way, or some simple function in pandas, to do this. Does anyone know of such a function or a better approach?

Answer 1

You can use melt:

pd.melt(df, id_vars=['name', 'sex', 'occupation'], value_vars=['Jan', 'Feb', 'March', 'April', 'May'], var_name='Month')\
.sort_values(by='name')

You get

    name        sex occupation      Month   value
0   john        m   accountant      Jan     1
3   john        m   accountant      Feb     2
6   john        m   accountant      March   3
9   john        m   accountant      April   4
12  john        m   accountant      May     5
1   sara        f   doctor          Jan     3
4   sara        f   doctor          Feb     4
7   sara        f   doctor          March   5
10  sara        f   doctor          April   6
13  sara        f   doctor          May     3
2   stephanie   f   photographer    Jan     1
5   stephanie   f   photographer    Feb     4
8   stephanie   f   photographer    March   3
11  stephanie   f   photographer    April   2
14  stephanie   f   photographer    May     1
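
If you want the column names from the question ('month' and 'val') and a clean 0..14 index, something like the sketch below should work. It assumes the month columns are Jan through May and holds the values as plain integers rather than going through np.array; the names id_cols and long_df are just for illustration. kind='mergesort' is used because it is a stable sort, so the months stay in their original order within each name.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: month columns are Jan..May,
# values stored as integers instead of the strings np.array would produce).
df = pd.DataFrame(
    [['john', 'm', 'accountant', 1, 2, 3, 4, 5],
     ['sara', 'f', 'doctor', 3, 4, 5, 6, 3],
     ['stephanie', 'f', 'photographer', 1, 4, 3, 2, 1]],
    columns=['name', 'sex', 'occupation', 'Jan', 'Feb', 'March', 'April', 'May'],
)

id_cols = ['name', 'sex', 'occupation']  # illustrative name

# With value_vars omitted, melt unpivots every column not listed in id_vars.
long_df = (
    df.melt(id_vars=id_cols, var_name='month', value_name='val')
      .sort_values('name', kind='mergesort')  # stable: keeps month order per name
      .reset_index(drop=True)
)
print(long_df)
```

Leaving out value_vars means you do not have to retype the month list, at the cost of melting any extra non-id column that happens to be in the frame.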

Option 2:

df.set_index(['name', 'sex', 'occupation']).stack().reset_index()\
.rename(columns={'level_3': 'Month', 0: 'Val'})

This gives you the same output with a reset index.
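
A self-contained sketch of the stack approach follows, again assuming month columns Jan through May with integer values. Instead of renaming the auto-generated 'level_3' and 0 labels afterwards, this variant names the column axis before stacking and passes name= to reset_index. That is a substitution on my part, not the only way to do it, and id_cols and stacked are illustrative names.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: month columns are Jan..May).
df = pd.DataFrame(
    [['john', 'm', 'accountant', 1, 2, 3, 4, 5],
     ['sara', 'f', 'doctor', 3, 4, 5, 6, 3],
     ['stephanie', 'f', 'photographer', 1, 4, 3, 2, 1]],
    columns=['name', 'sex', 'occupation', 'Jan', 'Feb', 'March', 'April', 'May'],
)

id_cols = ['name', 'sex', 'occupation']  # illustrative name

stacked = (
    df.set_index(id_cols)             # id columns become the row index
      .rename_axis(columns='month')   # label the column axis so the stacked level is named
      .stack()                        # month columns -> innermost index level
      .reset_index(name='val')        # Series back to a DataFrame, values in 'val'
)
print(stacked)
```

Because stack walks the frame row by row, the result is already grouped by person with the months in column order, so no sort is needed.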
