How to plot the distribution of two variables?

Time: 2019-11-27 11:35:44

Tags: python-3.x matplotlib seaborn distribution

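I want to plot the distribution of two variables whose domain is [-1, 0, 1]. I thought I could draw box plots, or distribution functions, along the x-axis and the y-axis. I would like something like this: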

enter image description here

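But that is a kernel smoothing technique for obtaining the probability density function (PDF) of a 2D space, which I found here. So I would also be happy with: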

enter image description here

So far I have:

import matplotlib.pyplot as plt
import seaborn as sns


def plot_mean(columns_x, columns_y):
    """Draw a perceptual map for one pair of question columns."""
    try:
        f, ax = plt.subplots(figsize=(6, 6))
        ax.axis([-1, 1, -1, 1])
        ax.grid(True)
        ax.set_xlabel(columns_x)
        ax.set_ylabel(columns_y)
        # Plot each party once; 'Party' repeats for every question, hence unique().
        # Note: the sample data further down shows this frame as df_party_means
        # with a lowercase 'mean' column, so adjust the names to match your data.
        for party in df_parties_means['Party'].unique():
            # Mean score of this party on each of the two questions
            party_x = df_parties_means.loc[
                (df_parties_means['Question'] == columns_x) & (df_parties_means['Party'] == party), 'Mean']
            party_y = df_parties_means.loc[
                (df_parties_means['Question'] == columns_y) & (df_parties_means['Party'] == party), 'Mean']
            # Plot and label the party at its position for the two questions
            ax.scatter(party_x.values[0], party_y.values[0],
                       alpha=0.4, edgecolors='w', label=party)
            ax.text(party_x.values[0], party_y.values[0], party, fontsize=10)
        # Plot the mean of the people's preferences
        ax.scatter(df_features[columns_x].mean(skipna=True), df_features[columns_y].mean(skipna=True),
                   alpha=0.4, edgecolors='w')

        # Plot the density estimate of the people's preferences
        # (the x=/y= keyword arguments need seaborn >= 0.11)
        sns.kdeplot(x=df_features[columns_x], y=df_features[columns_y], ax=ax)
        print("x values:", df_features[columns_x].value_counts())
        print("y values:", df_features[columns_y].value_counts())
        sns.rugplot(x=df_features[columns_x], color="g", ax=ax)
        sns.rugplot(y=df_features[columns_y], ax=ax)
        ax.set_title('Perceptual map', y=1.05)
        plt.show()
    except Exception as e:
        # Report the error itself; party_x and party_y may not exist if the failure came early
        print("error:", repr(e))
        print("columns_x: ", columns_x)
        print("columns_y: ", columns_y)

import itertools

# One plot per unordered pair of question columns
for column_x, column_y in itertools.combinations(df_features.columns, 2):
    plot_mean(column_x, column_y)

But this is what it draws:

enter image description here

Sometimes it does show me something of the distribution. I think that happens when the data is balanced enough?

enter image description here

Sample data

People's opinions of the parties:

>>> df_party_means

    mean    Question    Party
0   0.077083    Question1   Party1
1   -0.838896   Question1   Party2
2   0.931547    Question1   Party3
3   0.798064    Question1   Party4
4   -0.678798   Question1   Party5
5   0.960612    Question2   Party1
6   0.803926    Question2   Party2
7   0.586867    Question2   Party3
8   0.804372    Question2   Party4
9   0.346609    Question2   Party5
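
The lookup of each party's position can be made simpler by reshaping this table so that every party has one row and every question one column. A minimal sketch, assuming the frame is called `df_party_means` with a lowercase `mean` column exactly as printed above; `party_coords` is a name introduced here for illustration only:

```python
import pandas as pd

# Sample rows copied from the table above.
df_party_means = pd.DataFrame({
    "mean": [0.077083, -0.838896, 0.931547, 0.798064, -0.678798,
             0.960612, 0.803926, 0.586867, 0.804372, 0.346609],
    "Question": ["Question1"] * 5 + ["Question2"] * 5,
    "Party": ["Party1", "Party2", "Party3", "Party4", "Party5"] * 2,
})

# One row per party, one column per question.
party_coords = df_party_means.pivot(index="Party", columns="Question", values="mean")

# Coordinates of every party for one pair of questions.
for party, (x, y) in party_coords[["Question1", "Question2"]].iterrows():
    print(party, x, y)
```

With this layout a pair of questions selects two columns, and the per-party boolean filtering inside the plotting loop is no longer needed.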

People's answers to the questions:

>>> df_features

    Question1   Question2
0   0   1
1   1   1
2   0   1
3   -1  1
4   -1  -1
5   -1  0
...
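
Because each answer is one of only three values, a kernel density estimate, which assumes continuous data, may not be the most natural fit; that could be why the contours are only sometimes informative, though this is a guess and not something confirmed here. One possible alternative is a heatmap of the joint proportions with bar charts of the marginal distributions along the top and right edges. The sketch below is only an illustration under these assumptions: `plot_discrete_joint` is a hypothetical helper, and the data are just the six rows printed above, not the full (elided) data set.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_discrete_joint(x, y, levels=(-1, 0, 1)):
    """Heatmap of joint proportions with marginal bar charts (hypothetical helper)."""
    levels = list(levels)
    # Joint counts; reindex so that empty categories still appear as zeros.
    counts = pd.crosstab(x, y).reindex(index=levels, columns=levels, fill_value=0)
    joint = counts.to_numpy() / counts.to_numpy().sum()

    fig = plt.figure(figsize=(6, 6))
    grid = fig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(1, 4),
                            wspace=0.05, hspace=0.05)
    ax_joint = fig.add_subplot(grid[1, 0])
    ax_top = fig.add_subplot(grid[0, 0], sharex=ax_joint)
    ax_right = fig.add_subplot(grid[1, 1], sharey=ax_joint)

    # Rows of `joint` are x levels, so transpose to put x on the horizontal axis.
    ax_joint.imshow(joint.T, origin="lower", cmap="Blues", vmin=0, aspect="auto")
    positions = np.arange(len(levels))
    ax_joint.set_xticks(positions)
    ax_joint.set_xticklabels(levels)
    ax_joint.set_yticks(positions)
    ax_joint.set_yticklabels(levels)
    ax_joint.set_xlabel(getattr(x, "name", None) or "x")
    ax_joint.set_ylabel(getattr(y, "name", None) or "y")

    # Write the proportion inside each cell.
    for i in positions:
        for j in positions:
            ax_joint.text(i, j, f"{joint[i, j]:.2f}", ha="center", va="center")

    # Marginal distributions: sum over the other variable.
    ax_top.bar(positions, joint.sum(axis=1))
    ax_right.barh(positions, joint.sum(axis=0))
    ax_top.tick_params(labelbottom=False)
    ax_right.tick_params(labelleft=False)
    return fig, joint


# The six sample rows shown above.
df_features = pd.DataFrame({
    "Question1": [0, 1, 0, -1, -1, -1],
    "Question2": [1, 1, 1, 1, -1, 0],
})

fig, joint = plot_discrete_joint(df_features["Question1"], df_features["Question2"])
```

Calling `plt.show()` afterwards displays the figure. The party means and the overall mean could be drawn on the central axes as well, provided their coordinates are shifted from the [-1, 1] scale to the 0..2 cell positions used by the heatmap.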

0 answers:

No answers