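I want to compare the summaries of two pandas DataFrames. One idea is to build tuples from the two DataFrames and look at the values side by side, but I am struggling to do that.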
import numpy as np
import pandas as pd
import seaborn as sns
df = sns.load_dataset('iris').iloc[:,:-1]
df1 = df.describe().T
df2 = df.sample(50).describe().T
df1
              count      mean       std  min  25%   50%  75%  max
sepal_length  150.0  5.843333  0.828066  4.3  5.1  5.80  6.4  7.9
sepal_width   150.0  3.057333  0.435866  2.0  2.8  3.00  3.3  4.4
petal_length  150.0  3.758000  1.765298  1.0  1.6  4.35  5.1  6.9
petal_width   150.0  1.199333  0.762238  0.1  0.3  1.30  1.8  2.5
df2
              count   mean       std  min    25%   50%    75%  max
sepal_length   50.0  5.884  0.804924  4.4  5.100  5.85  6.475  7.9
sepal_width    50.0  3.086  0.452661  2.2  2.825  3.00  3.375  4.4
petal_length   50.0  3.842  1.761967  1.2  1.600  4.60  5.100  6.9
petal_width    50.0  1.256  0.773320  0.1  0.400  1.40  1.975  2.4
I would like tuples like this, and so on, with a tuple in every cell:
                      count  mean  std  min  25%  50%  75%  max
sepal_length  (50.0, 150.0)
sepal_width
petal_length
petal_width
Answer 0 (score: 1)
You can do it like this:
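# For each statistic (column), pair the df2 value with the df1 value, row by row
data = [
    [(round(v2, 2), round(v1, 2)) for v1, v2 in zip(df1[c], df2[c])]
    for c in df1.columns
]
# data holds one list per statistic, so build the frame transposed and flip it back
comparisons = pd.DataFrame(data, columns=df1.index, index=df1.columns).T
comparisons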
import numpy as np
import pandas as pd
import seaborn as sns
df = sns.load_dataset('iris').iloc[:,:-1]
df1 = df.describe().T
df2 = df.sample(50,random_state=100).describe().T
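# Alternative: put both summaries side by side (df1 columns get a '_1' suffix)
# and highlight the smaller value within each pair of columns
(
    pd.concat([df1.rename(columns=lambda x: x + '_1'), df2], axis=1)
    [['mean_1', 'mean', '50%_1', '50%']]
    .style.highlight_min(subset=['mean_1', 'mean'], axis=1, color='gray')
    .highlight_min(subset=['50%_1', '50%'], axis=1, color='tomato')
)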
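A variant of the side-by-side idea, as a minimal sketch rather than a definitive recipe: passing a dict to pd.concat labels each summary with an outer column key, which avoids renaming columns by hand. The synthetic data (used so the example runs without downloading iris) and the names "full" and "sample" are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the iris measurements (an assumption, so no download is needed)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(150, 3)), columns=["a", "b", "c"])

df1 = df.describe().T
df2 = df.sample(50, random_state=100).describe().T

# The dict keys ("full" and "sample" are illustrative names) become the outer column level
side_by_side = pd.concat({"full": df1, "sample": df2}, axis=1)

# Select one statistic from both summaries and add the difference between them
means = side_by_side.xs("mean", axis=1, level=1).assign(
    diff=lambda m: m["sample"] - m["full"]
)
print(means)
```

Keeping the values numeric this way means they can still be subtracted, sorted, or styled, which is harder once they are packed into tuples.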
The comparisons DataFrame from the first snippet looks like this:
                      count          mean           std         min  \
sepal_length  (50.0, 150.0)  (5.88, 5.84)   (0.8, 0.83)  (4.4, 4.3)
sepal_width   (50.0, 150.0)  (3.09, 3.06)  (0.45, 0.44)  (2.2, 2.0)
petal_length  (50.0, 150.0)  (3.84, 3.76)  (1.76, 1.77)  (1.2, 1.0)
petal_width   (50.0, 150.0)   (1.26, 1.2)  (0.77, 0.76)  (0.1, 0.1)
                      25%          50%          75%         max
sepal_length   (5.1, 5.1)  (5.85, 5.8)  (6.47, 6.4)  (7.9, 7.9)
sepal_width   (2.82, 2.8)   (3.0, 3.0)  (3.38, 3.3)  (4.4, 4.4)
petal_length   (1.6, 1.6)  (4.6, 4.35)   (5.1, 5.1)  (6.9, 6.9)
petal_width    (0.4, 0.3)   (1.4, 1.3)  (1.98, 1.8)  (2.4, 2.5)
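To try the tuple layout without downloading iris, here is a self-contained sketch of the same pairing, built column by column with a dict comprehension. The synthetic data and the variable name sample_means are assumptions for illustration. The pairing is positional, so it assumes both summaries list their rows in the same order, which holds here because both come from describe() on the same columns.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the iris measurements (an assumption, so no download is needed)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(150, 3)), columns=["a", "b", "c"])

df1 = df.describe().T
df2 = df.sample(50, random_state=100).describe().T

# One column per statistic; each cell is a (sample value, full value) tuple.
# .tolist() turns NumPy scalars into plain Python floats so the tuples print cleanly.
comparisons = pd.DataFrame(
    {
        c: list(zip(df2[c].round(2).tolist(), df1[c].round(2).tolist()))
        for c in df1.columns
    },
    index=df1.index,
)
print(comparisons)

# Pull one side back out of the tuples (sample_means is a hypothetical name)
sample_means = comparisons["mean"].map(lambda pair: pair[0])
```

The cells are ordinary Python tuples stored in object columns, so they are convenient for reading but need to be unpacked, as in the last line, before any arithmetic.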