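I have a pandas DataFrame with more than 2,000 columns. All columns contain numeric values. I want to find the difference between the maximum and minimum value of each column, and then select the ten columns with the largest difference.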
Col1 Col2 Col3 ..... Col2500
4 1 3 ..... 6
7 5 10 ..... 17
1 22 4 ..... 2
I have tried several options, but none of them worked! Please suggest a solution.
Answer 0: (score: 1)
This will give you the result as a Series.
Example:
df.T.apply(lambda x: x.max() - x.min(), axis=1).nlargest(10)
Or just:
df
Col1 Col2 Col3 Col2500
0 4 1 3 6
1 7 5 10 17
2 1 22 4 2
df.T.apply(lambda x: x.max() - x.min(), axis=1).nlargest(3)
Col2 21
Col2500 15
Col3 7
dtype: int64
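As a cross-check on the transpose-and-apply approach above, here is a minimal sketch of a vectorized form that skips the transpose. It assumes every column is numeric, as the question states; the variable names `via_apply` and `vectorized` are only illustrative.

```python
import pandas as pd

# Sample frame taken from the example above.
df = pd.DataFrame(
    {"Col1": [4, 7, 1], "Col2": [1, 5, 22], "Col3": [3, 10, 4], "Col2500": [6, 17, 2]}
)

# The approach shown in the answer: one Python-level call per column.
via_apply = df.T.apply(lambda x: x.max() - x.min(), axis=1).nlargest(3)

# Column-wise max and min are already reductions over rows,
# so the same ranges can be computed without transposing.
vectorized = (df.max() - df.min()).nlargest(3)

print(vectorized)
# Col2       21
# Col2500    15
# Col3        7
# dtype: int64
```

On a frame with thousands of columns the vectorized form avoids calling a Python lambda per column, which is usually the slower part.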
Answer 1: (score: 0)
Here is my solution:
>>> import pandas as pd
>>> data = {'Col1':[4,7,1],'Col2':[1,5,22], 'Col3':[3,10,4], 'Col2500':[6,17,2]}
>>> df = pd.DataFrame(data)
>>> df
Col1 Col2 Col3 Col2500
0 4 1 3 6
1 7 5 10 17
2 1 22 4 2
>>> diff = df.max() - df.min()
>>> diff
Col1 6
Col2 21
Col3 7
Col2500 15
>>> pd.DataFrame(diff).sort_values(by=0, ascending=False)
0
Col2 21
Col2500 15
Col3 7
Col1 6
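The sorted table above ranks every column but stops short of the filtering step the question asks for. A possible continuation, sketched under the assumption that "filter" means keeping only the top columns of the original frame; `TOP_N`, `top_cols` and `filtered` are illustrative names, and `TOP_N` would be 10 for the real data.

```python
import pandas as pd

data = {"Col1": [4, 7, 1], "Col2": [1, 5, 22], "Col3": [3, 10, 4], "Col2500": [6, 17, 2]}
df = pd.DataFrame(data)

TOP_N = 2  # illustrative value; the question asks for 10

# Range of each column, largest first.
diff = df.max() - df.min()
top_cols = diff.sort_values(ascending=False).head(TOP_N).index

# Keep only the columns with the largest ranges, in ranked order.
filtered = df[top_cols]
print(filtered)
```

Selecting with the ranked index keeps the columns ordered by range rather than by their original position.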
Answer 2: (score: 0)
Hope this helps!
diff = df.max() - df.min()
diff.sort_values()
Example:
>>> df.values
array([[ 0, 12, 42],
[ 1, 13, 21],
[ 12, 1, 30],
[ 3, 45, -39],
[ 4, 1, 38]])
>>> diff = df.max() - df.min()
>>>
>>> diff.sort_values(ascending=False)
T3 81
T2 44
T1 12
dtype: int64
>>> diff.sort_values()
T1 12
T2 44
T3 81
dtype: int64
>>>
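For a wide, all-numeric frame, the same max-minus-min range can also be computed on the underlying array with NumPy's `np.ptp` ("peak to peak"). This is a sketch, not part of the answer above: the frame is rebuilt from the printed `df.values` and the `T1`..`T3` labels in the output, and it assumes there are no missing values, since `np.ptp` propagates NaN while `df.max()` and `df.min()` skip it by default.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the values and column labels printed above.
df = pd.DataFrame(
    [[0, 12, 42], [1, 13, 21], [12, 1, 30], [3, 45, -39], [4, 1, 38]],
    columns=["T1", "T2", "T3"],
)

# np.ptp(a, axis=0) returns max - min down each column of the array.
ranges = pd.Series(np.ptp(df.to_numpy(), axis=0), index=df.columns)

# Take the largest ranges directly instead of sorting everything.
top = ranges.nlargest(2)
print(top)
# T3    81
# T2    44
# dtype: int64
```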
Answer 3: (score: -1)
import pandas as pd
import numpy as np
#sample data
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
#transposing data so columns are now rows and column names are indices
df = df.transpose()
#Calculation of Max - Min per row
df['dif'] = df.max(axis=1) - df.min(axis = 1)
#Number of results at the end (10 in your case)
TOP_N = 2
#Resetting the index to get column names and sorting by difference high to low
result = df.reset_index().rename(columns={'index':'ColumnName'})[['ColumnName','dif']].sort_values(by=['dif'],ascending=[False]).head(TOP_N)
print(result)
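The script above overwrites `df` with its transpose and adds a `dif` column to it. If the original frame needs to stay intact, one option is to wrap the same idea in a small helper. This is only a sketch: the function name `top_range_columns` and the seeded random generator are assumptions made for illustration, while the output column names `ColumnName` and `dif` follow the script above.

```python
import numpy as np
import pandas as pd


def top_range_columns(df, top_n=10):
    """Return the top_n columns by (max - min) as a two-column DataFrame."""
    dif = df.max() - df.min()   # one range per column, df is not modified
    top = dif.nlargest(top_n)   # already ordered from high to low
    return top.rename_axis("ColumnName").reset_index(name="dif")


# Sample data shaped like the script above; the seed is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))

result = top_range_columns(df, top_n=2)
print(result)
```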