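I'm trying to reorganize the following dataframe so that the [VAR] 1-3 columns are merged into a single column, in numerical order, within each [GROUP]: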
VAR 1  VAR 2  VAR 3  GROUP
              3      [0-10]
1             3      [0-10]
1             3      [0-10]
1      2             [0-10]
       2             [0-10]
3             3      [10-20]
3      1             [10-20]
       1             [10-20]
       2             [10-20]
              2      [10-20]
              2      [10-20]
This is the final result I'm trying to get:
VAR_MERGED GROUP
1 [0-10]
1 [0-10]
1 [0-10]
2 [0-10]
2 [0-10]
3 [0-10]
3 [0-10]
3 [0-10]
1 [10-20]
1 [10-20]
2 [10-20]
2 [10-20]
2 [10-20]
3 [10-20]
3 [10-20]
3 [10-20]
I tried df['VAR_MERGED'] = df[['VAR 1', 'VAR 2', 'VAR 3']].agg('-'.join, axis=1)
but I get an error saying it expected a str. The values in the VAR columns are all floats, so I'm not sure why this needs string values.
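The error comes from str.join itself: it only accepts an iterable of strings, and with axis=1 each row is handed to '-'.join as a Series of floats (NaN is a float too). A minimal sketch, using a hypothetical two-row frame shaped like the question's VAR columns:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with the question's VAR column names (float values, NaN for blanks)
df = pd.DataFrame({"VAR 1": [np.nan, 1.0], "VAR 2": [np.nan, np.nan], "VAR 3": [3.0, 3.0]})
cols = ["VAR 1", "VAR 2", "VAR 3"]

# str.join needs str items; each row here yields floats, so this raises TypeError
error = None
try:
    df[cols].agg("-".join, axis=1)
except TypeError as exc:
    error = str(exc)  # the message mentions "expected str instance"

# Casting to str first avoids the error, but only glues each row into one string
joined = df[cols].astype(str).agg("-".join, axis=1)
```

Even with astype(str), the result is one string per row (NaN becomes the text 'nan'), not one row per value, which is why the answers reshape the frame with melt or stack instead.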
Answer 0 (score: 1)
Input data:
>>> df
VAR 1 VAR 2 VAR 3 GROUP ANOTHER
0 NaN NaN 3.0 [0-10] another
1 1.0 NaN 3.0 [0-10] another
2 1.0 NaN 3.0 [0-10] another
3 1.0 2.0 NaN [0-10] another
4 NaN 2.0 NaN [0-10] another
5 3.0 NaN 3.0 [10-20] another
6 3.0 1.0 NaN [10-20] another
7 NaN 1.0 NaN [10-20] another
8 NaN 2.0 NaN [10-20] another
9 NaN NaN 2.0 [10-20] another
10 NaN NaN 2.0 [10-20] another
You can use melt. To understand it fully, run the code one step at a time (df.melt(...), df.melt(...).dropna(), df.melt(...).dropna().sort_values(...), and so on):
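# Every column that does not start with 'VAR' is kept as an identifier column
id_vars = df.columns[~df.columns.str.startswith('VAR')].tolist()
out = (
    df.melt(id_vars=id_vars, value_name='VAR_MERGED')  # VAR 1-3 -> one long column
    .dropna()                                          # drop the NaN placeholders
    .sort_values(['GROUP', 'VAR_MERGED'])              # by group, then by value
    .reset_index(drop=True)[['VAR_MERGED'] + id_vars]  # renumber, drop 'variable'
)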
Resulting output:
>>> out
VAR_MERGED GROUP ANOTHER
0 1.0 [0-10] another
1 1.0 [0-10] another
2 1.0 [0-10] another
3 2.0 [0-10] another
4 2.0 [0-10] another
5 3.0 [0-10] another
6 3.0 [0-10] another
7 3.0 [0-10] another
8 1.0 [10-20] another
9 1.0 [10-20] another
10 2.0 [10-20] another
11 2.0 [10-20] another
12 2.0 [10-20] another
13 3.0 [10-20] another
14 3.0 [10-20] another
15 3.0 [10-20] another
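To follow the step-by-step suggestion, here is a self-contained sketch that rebuilds the input frame shown above by hand and keeps each intermediate result in its own variable. The names step1, step2 and step3 are illustrative only, not part of the answer.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The answer's input frame, rebuilt by hand; ANOTHER stands in for any extra non-VAR column
df = pd.DataFrame({
    "VAR 1": [nan, 1, 1, 1, nan, 3, 3, nan, nan, nan, nan],
    "VAR 2": [nan, nan, nan, 2, 2, nan, 1, 1, 2, nan, nan],
    "VAR 3": [3, 3, 3, nan, nan, 3, nan, nan, nan, 2, 2],
    "GROUP": ["[0-10]"] * 5 + ["[10-20]"] * 6,
    "ANOTHER": ["another"] * 11,
})

id_vars = df.columns[~df.columns.str.startswith("VAR")].tolist()  # ['GROUP', 'ANOTHER']

# Step 1: melt stacks the three VAR columns into one long column (11 rows x 3 columns = 33 rows)
step1 = df.melt(id_vars=id_vars, value_name="VAR_MERGED")
# Step 2: drop the NaN placeholders, leaving only the real values
step2 = step1.dropna()
# Step 3: order by group first, then by value inside each group
step3 = step2.sort_values(["GROUP", "VAR_MERGED"])
# Step 4: renumber rows and keep the merged column plus the id columns
out = step3.reset_index(drop=True)[["VAR_MERGED"] + id_vars]
```

Printing step1, step2 and step3 in turn shows the frame growing to 33 rows, shrinking to the 16 non-missing values, and then being put in order.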
Answer 1 (score: 1)
One way, using set_index / stack:
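out = (
    df.set_index('GROUP')[['VAR 1', 'VAR 2', 'VAR 3']]  # keep only the VAR columns
    .stack()                                  # one entry per (GROUP, VAR column) pair
    .dropna()                                 # newer stack() implementations keep NaN
    .reset_index(level=-1, drop=True)         # drop the VAR-column level, keep GROUP
    .rename('VAR_MERGED')
    .reset_index()                            # back to a DataFrame: GROUP, VAR_MERGED
    .sort_values(['GROUP', 'VAR_MERGED'])     # sort by value, not by column name
    .reset_index(drop=True)[['VAR_MERGED', 'GROUP']]
)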
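Sorting a stacked Series by its index levels, as sort_index(level=[0, 1]) does, orders the values by (GROUP, source column name) rather than by the values themselves. That happens to match for [0-10], but not for [10-20], where VAR 1 holds the 3s. A sketch on the question's data (rebuilt by hand, no ANOTHER column) comparing index order with value order:

```python
import numpy as np
import pandas as pd

nan = np.nan
# The question's frame, rebuilt by hand for illustration
df = pd.DataFrame({
    "VAR 1": [nan, 1, 1, 1, nan, 3, 3, nan, nan, nan, nan],
    "VAR 2": [nan, nan, nan, 2, 2, nan, 1, 1, 2, nan, nan],
    "VAR 3": [3, 3, 3, nan, nan, 3, nan, nan, nan, 2, 2],
    "GROUP": ["[0-10]"] * 5 + ["[10-20]"] * 6,
})

# MultiIndex (GROUP, column name); dropna() covers stack() versions that keep NaN
stacked = df.set_index("GROUP").stack().dropna()

# Index order: within [10-20] the VAR 1 values (3.0, 3.0) come first, so it is not sorted by value
by_index_10_20 = stacked.sort_index(level=[0, 1]).loc["[10-20]"].tolist()

# Value order: move GROUP back into a column and sort on (GROUP, value)
merged = (
    stacked.reset_index(level=-1, drop=True)
    .rename("VAR_MERGED")
    .reset_index()
    .sort_values(["GROUP", "VAR_MERGED"])
    .reset_index(drop=True)[["VAR_MERGED", "GROUP"]]
)
```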