我有一个如下所示的df(列10_value
必须紧靠10_type
,不确定如何格式化):
0_0_type 0_0_value 0_1_type 0_1_value 0_firstname_value 0_lastname_value 10_0_type
uuid 1 ID 2 test1 test1 uuid
uuid 2 EMAIL email1 NaN NaN uuid
uuid 3 EMAIL email2 NaN NaN uuid
uuid 4 EMAIL email3 test2 test2 uuid
uuid 5 EMAIL email4 NaN NaN uuid
uuid 6 EMAIL email5 test3 test3 uuid
uuid 7 EMAIL email6 test4 test4 uuid
uuid 8 EMAIL email7 test5 test5 uuid
uuid 9 EMAIL email8 test6 test6 uuid
10_0_value 10_1_type 10_1_value 10_firstname_value 10_lastname_value
10 EMAIL email9 test7 test7
11 EMAIL email10 test8 test8
12 EMAIL email11 test9 test9
13 EMAIL email12 test10 test10
14 EMAIL email13 test11 test11
15 EMAIL email14 test12 test12
16 EMAIL email15 test13 test13
17 EMAIL email16 test14 test14
18 EMAIL email17 test15 test15
我有1500k列,具有这些类型的列名结构。唯一的不同是前面的数字在变化,在这种情况下为0
或10
。
我只需要四份数据,即uuid,email,first_name,last_name
。
我如何:
first_name
或last_name
并将记录堆叠在一起同时
EMAIL
和uuid
提取到自己的列中? 最终df应该如下所示:
uuid EMAIL first_name last_name
0 1 NaN test1 test1
1 2 email1 NaN NaN
2 3 email2 NaN NaN
3 4 email3 test2 test2
4 5 email4 NaN NaN
5 6 email5 test3 test3
6 7 email6 test4 test4
7 8 email7 test5 test5
8 9 email8 test6 test6
9 10 email9 test7 test7
10 11 email10 test8 test8
11 12 email11 test9 test9
12 13 email12 test10 test10
13 14 email13 test11 test11
14 15 email14 test12 test12
15 16 email15 test13 test13
16 17 email16 test14 test14
17 18 email17 test15 test15
答案 0 :(得分:1)
好的,让我们尝试一下这些列命名的代码。首先,重命名您的列,以将递增的索引移动到分隔符“我选择”之后。然后使用pd.wide_to_long
重塑数据框,最后使用filter
重塑数据框以仅选择“值”列。
df.columns = df.columns.str.replace(r'(\d+)_(\w+)',r'\2:\1')
pd.wide_to_long(df.reset_index(),
['0_type','1_type','0_value','1_value','firstname_value','lastname_value'],
'index',
'Num',
sep=':')\
.reset_index()\
.filter(like='value')
输出:
0_value 1_value firstname_value lastname_value
0 1 2 test1 test1
1 2 email1 NaN NaN
2 3 email2 NaN NaN
3 4 email3 test2 test2
4 5 email4 NaN NaN
5 6 email5 test3 test3
6 7 email6 test4 test4
7 8 email7 test5 test5
8 9 email8 test6 test6
9 10 email9 test7 test7
10 11 email10 test8 test8
11 12 email11 test9 test9
12 13 email12 test10 test10
13 14 email13 test11 test11
14 15 email14 test12 test12
15 16 email15 test13 test13
16 17 email16 test14 test14
17 18 email17 test15 test15