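Suppose I have a list containing all the column names of a pandas DataFrame. How can I check whether the difference between any pair of columns is greater than 3?
Pseudocode: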
IF difference between df['T01'] and df['T02'] > 3 or difference between df['T03'] and df['T04'] > 3 or difference between df['T05'] and df['T06'] > 3 and so on... THEN
DO SOMETHING
Code:
df_column_names = ['T01', 'T02', 'T03', 'T04', 'T05', 'T06', 'T07', 'T08', 'T09', 'T10', 'T11', 'T12', 'T13', 'T14', 'T15', 'T16', 'T17', 'T18', 'T19', 'T20', 'T21', 'T22', 'T23', 'T24', 'T25', 'T26', 'T27', 'T28', 'T29', 'T30', 'T31', 'T32']
df
| T01 | T02 | T03 | T04 | ... |
|-----|--------|-----|-------|-----|
| 0.1 | 0.5685 | 1.4 | 0.333 | ... |
Answer 0 (score: 2)
If the columns come in pairs, select them by position and subtract:
df1 = df.iloc[:, ::2] - df.iloc[:, 1::2].values
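A general solution with DataFrameGroupBy.diff:
import numpy as np
# c labels consecutive column pairs 0, 0, 1, 1, ...; diff then yields second minus first (e.g. T02 - T01)
c = np.arange(len(df.columns)) // 2
df1 = df.groupby(c, axis=1).diff(axis=1).dropna(axis=1, how='all')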
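In recent pandas releases `groupby(..., axis=1)` is deprecated, so the groupby line above may emit a warning or fail depending on the installed version. One possible workaround, sketched here under the assumption that all columns involved are numeric, is to transpose, group the rows, and transpose back. The small frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not taken from the question.
df = pd.DataFrame({
    'T01': [5, 3],
    'T02': [1, 3],
    'T03': [7, 8],
    'T04': [4, 5],
})

# Label consecutive columns 0, 0, 1, 1, ... so each pair forms one group.
c = np.arange(len(df.columns)) // 2

# After transposing, the original columns are rows, so a plain row-wise
# groupby can be used. diff() leaves NaN in the first row of each group
# and (second - first) in the second row.
df1 = df.T.groupby(c).diff().dropna(how='all').T

print(df1)
```

As with the `axis=1` version, the result is named after the second column of each pair and has the opposite sign of the positional subtraction (`T02 - T01` rather than `T01 - T02`).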
Edit:
If you need to select by the column names in the list:
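# by position, after restricting to the listed columns
df1 = df[df_column_names].iloc[:, ::2] - df[df_column_names].iloc[:, 1::2].values
# or with groupby, after restricting to the listed columns
df = df[df_column_names]
c = np.arange(len(df.columns)) // 2
df1 = df.groupby(c, axis=1).diff(axis=1).dropna(axis=1, how='all')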
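Both variants rely on the list order to decide which columns belong together. A rough sketch that makes the pairing explicit might look like the following; `pairwise_differences` is a hypothetical helper name, the `"T01-T02"` style result labels are an assumption, and the data is made up for illustration.

```python
import pandas as pd


def pairwise_differences(df, names):
    """Hypothetical helper: subtract names[1] from names[0], names[3] from names[2], ..."""
    if len(names) % 2 != 0:
        raise ValueError("expected an even number of column names")
    left = names[::2]    # first column of every pair
    right = names[1::2]  # second column of every pair
    # Subtract the underlying arrays so pandas does not try to align on labels.
    diffs = df[left].to_numpy() - df[right].to_numpy()
    labels = [f"{a}-{b}" for a, b in zip(left, right)]
    return pd.DataFrame(diffs, index=df.index, columns=labels)


# Illustrative data only.
demo = pd.DataFrame({
    'T01': [5, 3],
    'T02': [1, 3],
    'T03': [7, 8],
    'T04': [4, 5],
})
result = pairwise_differences(demo, ['T01', 'T02', 'T03', 'T04'])
print(result)
```

Naming each result column after both inputs keeps it obvious which pair a large difference came from.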
Sample:
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'T04': [4, 5, 4, 5, 5, 4],
    'T03': [7, 8, 9, 4, 2, 3],
    'T02': [1, 3, 5, 7, 1, 0],
    'T01': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})
df_column_names = ['T01', 'T02', 'T03', 'T04']
# T01 - T02 and T03 - T04, taken in the order given by the list
df1 = df[df_column_names].iloc[:, ::2] - df[df_column_names].iloc[:, 1::2].values
print(df1)
   T01  T03
0    4    3
1    0    3
2    1    5
3    2   -1
4    1   -3
5    4   -1
# True where the difference of a pair is greater than 3
mask = df1 > 3
print(mask)
     T01    T03
0   True  False
1  False  False
2  False   True
3  False  False
4  False  False
5   True  False
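The mask still has to be reduced to a single yes/no to drive the IF ... THEN from the question. A possible sketch using `DataFrame.any`, rebuilt on the sample data above, is shown here. Taking the absolute value is an assumption about what "difference" means in the question, and the names `row_has_large_diff`, `any_large_diff`, and `flagged` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'T04': [4, 5, 4, 5, 5, 4],
    'T03': [7, 8, 9, 4, 2, 3],
    'T02': [1, 3, 5, 7, 1, 0],
    'T01': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})
df_column_names = ['T01', 'T02', 'T03', 'T04']

sub = df[df_column_names]
df1 = sub.iloc[:, ::2] - sub.iloc[:, 1::2].to_numpy()

# Assumption: the sign does not matter, so compare the absolute difference.
mask = df1.abs() > 3

# Per row: does at least one pair differ by more than 3?
row_has_large_diff = mask.any(axis=1)

# Whole frame: does any pair in any row differ by more than 3?
any_large_diff = bool(mask.to_numpy().any())

if any_large_diff:
    # DO SOMETHING, e.g. inspect the offending rows.
    flagged = df[row_has_large_diff]
    print(flagged)
```

Use `row_has_large_diff` when the action applies per row, and `any_large_diff` when a single check over the whole frame is enough.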