I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'id':['a','b','c','d','e'],
'XX_111_S5_R12_001_Mobile_05':[-14,-90,-90,-96,-91],
'YY_222_S00_R12_001_1-999_13':[-103,0,-110,-114,-114],
'ZZ_111_S00_R12_001_1-999_13':[1,2.3,3,5,6],
})
df.set_index('id',inplace=True)
df
It looks like this:
Out[6]:
XX_111_S5_R12_001_Mobile_05 YY_222_S00_R12_001_1-999_13 ZZ_111_S00_R12_001_1-999_13
id
a -14 -103 1.0
b -90 0 2.3
c -90 -110 3.0
d -96 -114 5.0
e -91 -114 6.0
I want to group the columns according to the following regular expression:
\w+_\w+_\w+_\d+_([\w\d-]+)_\d+
so that, in the end, the columns are grouped by Mobile and 1-999.
How can this be done? I tried the following, but it fails to group them:
import re
grouped = df.groupby(lambda x: re.search(r"\w+_\w+_\w+_\d+_([\w\d-]+)_\d+", x).group(), axis=1)
for name, group in grouped:
    print(name)
    print(group)
which prints:
XX_111_S5_R12_001_Mobile_05
YY_222_S00_R12_001_1-999_13
ZZ_111_S00_R12_001_1-999_13
What I want instead is for name to print:
Mobile
1-999
1-999
and for group to print the corresponding DataFrame.
Answer 0 (score: 6)
You can use .str.extract on the columns to build the keys for groupby:
# Performing the groupby.
pat = r'\w+_\w+_\w+_\d+_([\w\d-]+)_\d+'
grouped = df.groupby(df.columns.str.extract(pat, expand=False), axis=1)
# Showing group information.
for name, group in grouped:
    print(name)
    print(group, '\n')
This returns the expected groups:
1-999
YY_222_S00_R12_001_1-999_13 ZZ_111_S00_R12_001_1-999_13
id
a -103 1.0
b 0 2.3
c -110 3.0
d -114 5.0
e -114 6.0
Mobile
XX_111_S5_R12_001_Mobile_05
id
a -14
b -90
c -90
d -96
e -91
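For reference, here is a minimal Python 3 sketch of the same idea. It assumes a recent pandas release, where groupby(..., axis=1) is deprecated, so it swaps the column-wise groupby for a boolean column mask per key. That is a substitute technique, not the code from the answer, and the names keys and groups are only illustrative. It also shows why the attempt in the question returned the full column names: group() with no argument returns the whole match, while group(1) returns the captured part.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd', 'e'],
    'XX_111_S5_R12_001_Mobile_05': [-14, -90, -90, -96, -91],
    'YY_222_S00_R12_001_1-999_13': [-103, 0, -110, -114, -114],
    'ZZ_111_S00_R12_001_1-999_13': [1, 2.3, 3, 5, 6],
}).set_index('id')

pat = r'\w+_\w+_\w+_\d+_([\w\d-]+)_\d+'

# group() is the whole matched string; group(1) is the first capture group.
match = re.search(pat, 'XX_111_S5_R12_001_Mobile_05')
whole = match.group()
captured = match.group(1)

# One key per column, taken from the capture group of the pattern.
keys = df.columns.str.extract(pat, expand=False)

# Select the columns belonging to each distinct key (illustrative helper dict).
groups = {name: df.loc[:, keys == name] for name in keys.unique()}

for name, group in groups.items():
    print(name)
    print(group, '\n')
```

Selecting columns with a mask keeps each column's original dtype, which a transpose-based groupby (df.T.groupby(keys)) would not do for mixed integer and float columns.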
Answer 1 (score: 1)
After grouping, set the index of the new DataFrame to [re.findall(r'\w+_\w+_\w+_\d+_([\w\d-]+)_\d+', col)[0] for col in df.columns] (which evaluates to ['Mobile', '1-999', '1-999']).
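One possible reading of this suggestion, sketched in Python 3: compute the labels with the list comprehension and use them to collect the column names per label. The variable names labels and columns_by_label are assumptions made for illustration, and the answer itself does not spell out this step.

```python
import re
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd', 'e'],
    'XX_111_S5_R12_001_Mobile_05': [-14, -90, -90, -96, -91],
    'YY_222_S00_R12_001_1-999_13': [-103, 0, -110, -114, -114],
    'ZZ_111_S00_R12_001_1-999_13': [1, 2.3, 3, 5, 6],
}).set_index('id')

pat = r'\w+_\w+_\w+_\d+_([\w\d-]+)_\d+'

# With a single capture group, re.findall returns the captured strings.
labels = [re.findall(pat, col)[0] for col in df.columns]

# Collect the column names that share each label (hypothetical helper).
columns_by_label = defaultdict(list)
for col, label in zip(df.columns, labels):
    columns_by_label[label].append(col)

for name, cols in columns_by_label.items():
    print(name)
    print(df[cols], '\n')
```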
Answer 2 (score: 1)