UPD :在页面底部粘贴了一个数据框作为示例。
我的原始xls文件如下:
我需要执行两个操作才能使其看起来如下所示:
首先,我需要使用在其上方的单元格中显示的值来填充空行值。这可以通过以下功能实现:
def get_csv():
#Read csv file
df = pd.read_excel('test.xls')
df = df.fillna(method='ffill')
return df
第二,我将stack
与set_index
一起使用:
df = (df.set_index(['Country', 'Gender', 'Arr-Dep'])
.stack()
.reset_index(name='Value')
.rename(columns={'level_3':'Year'}))
,我想知道是否有更简单的方法。是否有一个将数据框,Excel等转换为所需格式的库?
Excel导入后的原始数据框:
Country Gender Direction 1974 1975 1976
0 Austria Male IN 13728 8754 9695
1 NaN NaN OUT 17977 12271 9899
2 NaN Female IN 8541 6465 6447
3 NaN NaN OUT 8450 7190 6288
4 NaN Total IN 22269 15219 16142
5 NaN NaN OUT 26427 19461 16187
6 Belgium Male IN 2412 2245 2296
7 NaN NaN OUT 2800 2490 2413
8 NaN Female IN 2105 2022 2057
9 NaN NaN OUT 2100 2113 2004
10 NaN Total IN 4517 4267 4353
11 NaN NaN OUT 4900 4603 4417
答案 0 :(得分:2)
替代解决方案是使用melt
,但是如果需要像stacked
DataFrame
这样的列进行相同的排序,则需要添加sort_values
:
df1 = (df.ffill()
.melt(id_vars=['Country','Gender','Direction'],var_name="Date",value_name='Value')
)
print (df1)
Country Gender Direction Date Value
0 Austria Male IN 1974 13728
1 Austria Male OUT 1974 17977
2 Austria Female IN 1974 8541
3 Austria Female OUT 1974 8450
4 Austria Total IN 1974 22269
5 Austria Total OUT 1974 26427
6 Belgium Male IN 1974 2412
7 Belgium Male OUT 1974 2800
8 Belgium Female IN 1974 2105
9 Belgium Female OUT 1974 2100
10 Belgium Total IN 1974 4517
11 Belgium Total OUT 1974 4900
12 Austria Male IN 1975 8754
13 Austria Male OUT 1975 12271
14 Austria Female IN 1975 6465
15 Austria Female OUT 1975 7190
16 Austria Total IN 1975 15219
17 Austria Total OUT 1975 19461
18 Belgium Male IN 1975 2245
19 Belgium Male OUT 1975 2490
20 Belgium Female IN 1975 2022
21 Belgium Female OUT 1975 2113
22 Belgium Total IN 1975 4267
23 Belgium Total OUT 1975 4603
24 Austria Male IN 1976 9695
25 Austria Male OUT 1976 9899
26 Austria Female IN 1976 6447
27 Austria Female OUT 1976 6288
28 Austria Total IN 1976 16142
29 Austria Total OUT 1976 16187
30 Belgium Male IN 1976 2296
...
...
df1 = (df.ffill()
.melt(id_vars=['Country','Gender','Direction'],var_name="Date", value_name='Value')
.sort_values(['Country', 'Gender','Direction'])
.reset_index(drop=True))
print (df1)
Country Gender Direction Date Value
0 Austria Female IN 1974 8541
1 Austria Female IN 1975 6465
2 Austria Female IN 1976 6447
3 Austria Female OUT 1974 8450
4 Austria Female OUT 1975 7190
5 Austria Female OUT 1976 6288
6 Austria Male IN 1974 13728
7 Austria Male IN 1975 8754
8 Austria Male IN 1976 9695
9 Austria Male OUT 1974 17977
10 Austria Male OUT 1975 12271
11 Austria Male OUT 1976 9899
12 Austria Total IN 1974 22269
13 Austria Total IN 1975 15219
14 Austria Total IN 1976 16142
15 Austria Total OUT 1974 26427
16 Austria Total OUT 1975 19461
17 Austria Total OUT 1976 16187
18 Belgium Female IN 1974 2105
19 Belgium Female IN 1975 2022
20 Belgium Female IN 1976 2057
21 Belgium Female OUT 1974 2100
22 Belgium Female OUT 1975 2113
23 Belgium Female OUT 1976 2004
24 Belgium Male IN 1974 2412
25 Belgium Male IN 1975 2245
26 Belgium Male IN 1976 2296
27 Belgium Male OUT 1974 2800
28 Belgium Male OUT 1975 2490
29 Belgium Male OUT 1976 2413
30 Belgium Total IN 1974 4517
...
...
答案 1 :(得分:2)
stack
我喜欢你的方法。我会通过几种方式进行更改。
ffill
df.ffill().set_index(
['Country', 'Gender', 'Direction']
).rename_axis('Year', 1).stack().reset_index(name='Value')
Country Gender Direction Year Value
0 Austria Male IN 1974 13728
1 Austria Male IN 1975 8754
2 Austria Male IN 1976 9695
3 Austria Male OUT 1974 17977
4 Austria Male OUT 1975 12271
5 Austria Male OUT 1976 9899
...
我想提出一种自定义方法。这应该很快。
def cstm_ffill(s):
i = np.flatnonzero(s.notna())
i = np.concatenate([[0], i, [len(s)]])
d = np.diff(i)
a = s.values[i[:-1].repeat(d)]
return a
def cstm_melt(df):
c = cstm_ffill(df.Country)
g = cstm_ffill(df.Gender)
d = cstm_ffill(df.Direction)
y = df.columns[3:].values
k = len(y)
i = np.column_stack([c, g, d])
v = np.column_stack([*map(df.get, y)]).ravel()
df_ = pd.DataFrame(
np.column_stack([i.repeat(k, axis=0), v, np.tile(y, len(i))]),
columns=['Country', 'Gender', 'Direction', 'Year', 'Value']
)
return df_
cstm_melt(df)
Country Gender Direction Year Value
0 Austria Male IN 13728 1974
1 Austria Male IN 8754 1975
2 Austria Male IN 9695 1976
3 Austria Male OUT 17977 1974
4 Austria Male OUT 12271 1975
5 Austria Male OUT 9899 1976
...