Say I have the following (a tiny subset of data with many columns):
import pandas as pd
import numpy as np
df = pd.DataFrame({'A (quarterly) 2010': np.random.rand(3),
                   'A (quarterly) 2011': np.random.rand(3),
                   'B (quarterly) 2010': np.random.rand(3),
                   'B (quarterly) 2011': np.random.rand(3),
                   'X': np.random.randint(3, size=3)})
#Out[11]:
#   A (quarterly) 2010  A (quarterly) 2011  B (quarterly) 2010  \
#0            0.868228            0.300513            0.658819
#1            0.383907            0.496740            0.347421
#2            0.284787            0.795499            0.856398
#   B (quarterly) 2011  X
#0            0.374479  1
#1            0.812860  0
#2            0.604731  2
I want to extract the unique matches of a specific pattern in the column names, e.g. [A-B] \(.*\)\s.
I can do it, but it looks rather hairy:
stubs = set([match[0] for match in df.columns.str.findall(r'[A-B] \(.*\) ').values if match != []])
list(stubs)
#['B (quarterly) ', 'A (quarterly) ']
Is there a simpler way?
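For reference, here is one direction that might be simpler. It is only a sketch, assuming that wrapping the pattern in a single capture group and using `Index.str.extract` with `expand=False` is acceptable. The names `pattern` and `stubs` are illustrative, and the result comes back in order of first appearance rather than as a set.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A (quarterly) 2010': np.random.rand(3),
                   'A (quarterly) 2011': np.random.rand(3),
                   'B (quarterly) 2010': np.random.rand(3),
                   'B (quarterly) 2011': np.random.rand(3),
                   'X': np.random.randint(3, size=3)})

# Same pattern as above, wrapped in one capture group (raw string avoids escape warnings).
pattern = r'([A-B] \(.*\) )'

# With one group and expand=False, extract returns an Index;
# column names that do not match (here 'X') become NaN and are dropped.
stubs = df.columns.str.extract(pattern, expand=False).dropna().unique().tolist()
print(stubs)
```

This avoids the explicit list comprehension and the `match != []` check, at the cost of needing the capture group in the pattern.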