I have a DataFrame:
import numpy as np
import pandas as pd

# Note: a mixed-type NumPy array stores every value as a string.
arr = np.array([['john', 'm', 'accountant', 1, 2, 3, 4, 5],
                ['sara', 'f', 'doctor', 3, 4, 5, 6, 3],
                ['stephanie', 'f', 'photographer', 1, 4, 3, 2, 1]])
columns = ['name', 'sex', 'occupation', 'Jan', 'Feb', 'March', 'April', 'May']
df = pd.DataFrame(arr, columns=columns)
        name  sex    occupation  Jan  Feb  March  April  May
0       john    m    accountant    1    2      3      4    5
1       sara    f        doctor    3    4      5      6    3
2  stephanie    f  photographer    1    4      3      2    1
I want to convert all of these date columns into two columns, ['date','val_on_date']. The only way I can think of is:
newArr = []
for idx, row in df.iterrows():
    for col in ['Jan', 'Feb', 'March', 'April', 'May']:
        newArr.append(row[['name', 'sex', 'occupation']].values.tolist() + [col, row[col]])
newDf = pd.DataFrame(newArr, columns=['name', 'sex', 'occupation', 'month', 'val'])
         name  sex    occupation  month  val
0        john    m    accountant    Jan    1
1        john    m    accountant    Feb    2
2        john    m    accountant  March    3
3        john    m    accountant  April    4
4        john    m    accountant    May    5
5        sara    f        doctor    Jan    3
6        sara    f        doctor    Feb    4
7        sara    f        doctor  March    5
8        sara    f        doctor  April    6
9        sara    f        doctor    May    3
10  stephanie    f  photographer    Jan    1
11  stephanie    f  photographer    Feb    4
12  stephanie    f  photographer  March    3
13  stephanie    f  photographer  April    2
14  stephanie    f  photographer    May    1
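But I feel there must be a better way, or some simple function in pandas, to do this. Does anyone know of such a function or a better approach?
Answer 0 (score: 3)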
You can use melt:
pd.melt(df, id_vars=['name', 'sex', 'occupation'], value_vars=['Jan', 'Feb', 'March', 'April', 'May'], var_name='Month')\
  .sort_values(by='name')
You get:
         name  sex    occupation  Month  value
0        john    m    accountant    Jan      1
3        john    m    accountant    Feb      2
6        john    m    accountant  March      3
9        john    m    accountant  April      4
12       john    m    accountant    May      5
1        sara    f        doctor    Jan      3
4        sara    f        doctor    Feb      4
7        sara    f        doctor  March      5
10       sara    f        doctor  April      6
13       sara    f        doctor    May      3
2   stephanie    f  photographer    Jan      1
5   stephanie    f  photographer    Feb      4
8   stephanie    f  photographer  March      3
11  stephanie    f  photographer  April      2
14  stephanie    f  photographer    May      1
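For anyone who wants to run the melt approach end to end, here is a minimal self-contained sketch. It assumes the five value columns are named Jan through May, as in the question's expected output, and it builds the frame from a dict so the values stay integers rather than strings. The names id_cols, month_cols and long_df are illustrative and not part of the original answer, and the mergesort setting is my addition to keep the months in order within each name.

```python
import pandas as pd

# Sample data from the question (assumption: month columns are Jan..May).
df = pd.DataFrame({
    'name': ['john', 'sara', 'stephanie'],
    'sex': ['m', 'f', 'f'],
    'occupation': ['accountant', 'doctor', 'photographer'],
    'Jan': [1, 3, 1],
    'Feb': [2, 4, 4],
    'March': [3, 5, 3],
    'April': [4, 6, 2],
    'May': [5, 3, 1],
})

id_cols = ['name', 'sex', 'occupation']                 # columns kept as identifiers
month_cols = ['Jan', 'Feb', 'March', 'April', 'May']    # columns unpivoted into rows

long_df = (
    pd.melt(df, id_vars=id_cols, value_vars=month_cols,
            var_name='month', value_name='val')
    # mergesort is a stable sort, so rows keep their month order within each name
    .sort_values(by='name', kind='mergesort')
    .reset_index(drop=True)
)
print(long_df)
```

Passing var_name and value_name sets the final column names directly, so no rename is needed afterwards.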
Option 2:
df.set_index(['name', 'sex', 'occupation']).stack().reset_index()\
  .rename(columns={'level_3': 'Month', 0: 'Val'})
This gives you the output with a reset index.
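A runnable sketch of the stack approach, under the same assumption that the value columns are Jan through May. Instead of renaming 'level_3' and 0 afterwards, this variant names the column axis and the stacked Series up front. That is a stylistic choice on my part, not what the answer does, and the name stacked is illustrative.

```python
import pandas as pd

# Sample data from the question (assumption: month columns are Jan..May).
df = pd.DataFrame({
    'name': ['john', 'sara', 'stephanie'],
    'sex': ['m', 'f', 'f'],
    'occupation': ['accountant', 'doctor', 'photographer'],
    'Jan': [1, 3, 1],
    'Feb': [2, 4, 4],
    'March': [3, 5, 3],
    'April': [4, 6, 2],
    'May': [5, 3, 1],
})

stacked = (
    df.set_index(['name', 'sex', 'occupation'])
    .rename_axis(columns='month')  # the stacked index level will be called 'month'
    .stack()                       # month columns become the innermost index level
    .rename('val')                 # name the resulting Series so its column is 'val'
    .reset_index()                 # turn all index levels back into ordinary columns
)
print(stacked)
```

Every column that is not moved into the index gets stacked, so you do not have to list the month columns explicitly.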