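I have two dataframes with the same format, labelled 'df' and 'df2', from which I generated the histograms below. I have a third dataframe called 'df_merged', which is 'df' and 'df2' combined row-wise.
I want the third histogram to show the sum of 'df' and 'df2'. I also want the 'df' and 'df2' histograms normalized to the combined 'df_merged' histogram, so that they sit inside the combined histogram. Is this possible?
My plot currently looks wrong: starting at about 200 on the x-axis, 'df2' is higher than the combined histogram, which makes no sense because the combined histogram is the sum of 'df2' and 'df'. I believe this happens because I weight each bin by the total number of values in each of the three histograms separately.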
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
df = df[['Column1']]
df2 = df2[['Column1']]
df_merged = pd.concat([df, df2], ignore_index=True)
df_weights = 100*np.ones_like(df.values) / float(len(df))
df2_weights = 100*np.ones_like(df2.values) / float(len(df2))
df_merged_weights = 100*np.ones_like(df_merged.values) / float(len(df_merged))
fig, ax = plt.subplots()
ax.hist(df.values, bins=25, weights=df_weights, color='black', histtype='step', label='df')
ax.hist(df2.values, bins=200, weights=df2_weights, color='green', histtype='step', label='df2')
ax.hist(df_merged.values, bins=200, weights=df_merged_weights, color='red', histtype='step', label='Combined')
ax.margins(0.05)
ax.set_ylim(bottom=0)
ax.set_xlim([0,1000])
plt.legend(loc='upper right')
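Answer 0 (score: 1)
You need to weight everything by the length of the concatenated array, not by the length of each individual dataframe. You should also use the same number of bins and the same histogram range for all three histograms, so that their bins line up.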
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import DataFrame
np.random.seed(0)
df = DataFrame(np.random.normal(300, 100, 2000)) # Two normal distributions
df2 = DataFrame(np.random.normal(700, 100, 1500))
df_merged = pd.concat([df, df2], ignore_index=True)
# Weights: every sample counts as 1 / len(df_merged), so all three histograms share one scale
df_weights = np.ones_like(df.values) / len(df_merged)
df2_weights = np.ones_like(df2.values) / len(df_merged)
df_merged_weights = np.ones_like(df_merged.values) / len(df_merged)
plt_range = (df_merged.values.min(), df_merged.values.max())
fig, ax = plt.subplots()
ax.hist(df.values, bins=100, weights=df_weights, color='black', histtype='step', label='df', range=plt_range)
ax.hist(df2.values, bins=100, weights=df2_weights, color='green', histtype='step', label='df2', range=plt_range)
ax.hist(df_merged.values, bins=100, weights=df_merged_weights, color='red', histtype='step', label='Combined', range=plt_range)
ax.margins(0.05)
ax.set_ylim(bottom=0)
ax.set_xlim([0, 1000])
plt.legend(loc='upper right')
# plt.savefig('output.png')
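weights : (n,) array_like or None, optional
An array of weights, of the same shape as x. Each value in x only contributes its associated weight towards the bin count (instead of 1).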
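To check the normalization without looking at a plot, the same idea can be sketched with `np.histogram`. This is only an illustration under assumptions: the arrays `a` and `b` are synthetic stand-ins for `df['Column1']` and `df2['Column1']`, and the names `edges`, `h_a`, `h_b` and `h_merged` are made up for this example. With shared bin edges and every sample weighted by one over the combined length, the two partial histograms should add up bin by bin to the combined one.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(300, 100, 2000)   # hypothetical stand-in for df['Column1']
b = rng.normal(700, 100, 1500)   # hypothetical stand-in for df2['Column1']
merged = np.concatenate([a, b])

# One set of bin edges, computed from the combined data and reused for all three histograms.
edges = np.histogram_bin_edges(merged, bins=100)

# Every sample is weighted by 1 / (combined length), regardless of which array it came from.
n_total = len(merged)
h_a, _ = np.histogram(a, bins=edges, weights=np.full(a.shape, 1.0 / n_total))
h_b, _ = np.histogram(b, bins=edges, weights=np.full(b.shape, 1.0 / n_total))
h_merged, _ = np.histogram(merged, bins=edges, weights=np.full(merged.shape, 1.0 / n_total))

# The parts sum to the whole in every bin, so neither part can rise above the combined curve.
print(np.allclose(h_a + h_b, h_merged))
# The combined histogram sums to 1; each part sums to its share of the samples.
print(h_a.sum(), h_b.sum(), h_merged.sum())
```

If each array were instead weighted by its own length, as in the question, each histogram would sum to the same total on its own, and a partial histogram could exceed the combined one in bins where it dominates.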