大家好,所以我的数据框看起来像这样:
A | B | C | D | E
'USD'
'trading expenses-total'
8.10 2.3 5.5
9.1 1.4 6.1
5.4 5.1 7.8
我还没有发现任何类似的内容,如果这是重复的话,请道歉。但基本上我试图找到包含字符串' total'的列。 (列B)及其相邻列(C和D)并将它们转换为数据帧。我觉得我很接近以下代码:
test.loc[:,test.columns.str.contains('total')]
隔离了正确的列,但我无法弄清楚如何抓住相邻的两列。我想要的输出是:
B | C | D
'USD'
'trading expenses-total'
8.10 2.3 5.5
9.1 1.4 6.1
5.4 5.1 7.8
答案 0 :(得分:3)
这是一种方法 -
from scipy.ndimage.morphology import binary_dilation as bind
mask = test.columns.str.contains('total')
test_out = test.iloc[:,bind(mask,[1,1,1],origin=-1)]
如果您无法访问SciPy
,也可以使用np.convolve
,就像这样 -
test_out = test.iloc[:,np.convolve(mask,[1,1,1])[:-2]>0]
示例运行
案例#1:
In [390]: np.random.seed(1234)
In [391]: test = pd.DataFrame(np.random.randint(0,9,(3,5)))
In [392]: test.columns = [['P','total001','g','r','t']]
In [393]: test
Out[393]:
P total001 g r t
0 3 6 5 4 8
1 1 7 6 8 0
2 5 0 6 2 0
In [394]: mask = test.columns.str.contains('total')
In [395]: test.iloc[:,bind(mask,[1,1,1],origin=-1)]
Out[395]:
total001 g r
0 6 5 4
1 7 6 8
2 0 6 2
案例#2:
如果您有多个匹配的列,并且如果您超出限制且匹配列的右侧没有两列,则此功能也有效 -
In [401]: np.random.seed(1234)
In [402]: test = pd.DataFrame(np.random.randint(0,9,(3,7)))
In [403]: test.columns = [['P','total001','g','r','t','total002','k']]
In [406]: test
Out[406]:
P total001 g r t total002 k
0 3 6 5 4 8 1 7
1 6 8 0 5 0 6 2
2 0 5 2 6 3 7 0
In [407]: mask = test.columns.str.contains('total')
In [408]: test.iloc[:,bind(mask,[1,1,1],origin=-1)]
Out[408]:
total001 g r total002 k
0 6 5 4 1 7
1 8 0 5 6 2
2 5 2 6 7 0
答案 1 :(得分:3)
OLD回答:
熊猫方法:
In [36]: df = pd.DataFrame(np.random.rand(3,5), columns=['A','total','C','D','E'])
In [37]: df
Out[37]:
A total C D E
0 0.789482 0.427260 0.169065 0.112993 0.142648
1 0.303391 0.484157 0.454579 0.410785 0.827571
2 0.984273 0.001532 0.676777 0.026324 0.094534
In [38]: idx = np.argmax(df.columns.str.contains('total'))
In [39]: df.iloc[:, idx:idx+3]
Out[39]:
total C D
0 0.427260 0.169065 0.112993
1 0.484157 0.454579 0.410785
2 0.001532 0.676777 0.026324
<强>更新强>
In [118]: df
Out[118]:
A B C D E
0 NaN USD NaN NaN NaN
1 NaN trading expenses-total NaN NaN NaN
2 A 8.10 2.3 5.5 10.0
3 B 9.1 1.4 6.1 11.0
4 C 5.4 5.1 7.8 12.0
In [119]: col = df.select_dtypes(['object']).apply(lambda x: x.str.contains('total').any()).idxmax()
In [120]: cols = df.columns.to_series().loc[col:].head(3).tolist()
In [121]: col
Out[121]: 'B'
In [122]: cols
Out[122]: ['B', 'C', 'D']
In [123]: df[cols]
Out[123]:
B C D
0 USD NaN NaN
1 trading expenses-total NaN NaN
2 8.10 2.3 5.5
3 9.1 1.4 6.1
4 5.4 5.1 7.8