我想绘制两个变量的域[-1,0,1]的分布。我以为我可以在横坐标轴和纵坐标轴上制作箱形图,或者分配功能。我想要类似的东西:
但这是一种内核平滑技术,用于获得我发现的here的2D空间的概率密度函数(PDF)。因此,我也会很满意:
直到现在我有了:
def plot_mean(columns_x, columns_y):
try:
f, ax = plt.subplots(figsize=(6, 6))
plt.axis([-1, 1, -1, 1])
plt.grid(True)
plt.xlabel(columns_x)
plt.ylabel(columns_y)
# We get all parties from df_parties_means
for party in df_parties_means['Party']:
# we get the probability distribution function
party_x = df_parties_means.loc[
((df_parties_means['Question'] == columns_x) & (df_parties_means['Party'] == party)), 'Mean']
party_y = df_parties_means.loc[
((df_parties_means['Question'] == columns_y) & (df_parties_means['Party'] == party)), 'Mean']
# we plot the party related to the questions
plt.scatter(party_x.values[0], party_y.values[0],
alpha=0.4, edgecolors='w',label = party)
plt.text(party_x.values[0], party_y.values[0], party, fontsize=10)
# We plot the people preferences
plt.scatter(df_features[columns_x].mean( skipna = True), df_features[columns_y].mean( skipna = True),
alpha=0.4, edgecolors='w')
# plot the density function for the people preferences
sns.kdeplot(df_features[columns_x], df_features[columns_y], ax=ax)
print("x values:", df_features[columns_x].value_counts())
print("y values:", df_features[columns_y].value_counts())
sns.rugplot(df_features[columns_x], color="g", ax=ax)
sns.rugplot(df_features[columns_y], vertical=True, ax=ax);
plt.title('Perceptual map',y=1.05)
plt.show()
except Exception as e:
print(len(party_x))
print(len(party_y))
print("columns_x: ", columns_x)
print("columns_y: ", columns_y)
import itertools
pairs = list(itertools.combinations(df_features.columns, 2))
[plot_mean(pair[0],pair[1]) for pair in pairs]
但这吸引了我:
有时它向我展示了发行版中的一些内容。我认为是数据足够平衡的时候了?
人们对政党的看法:
>>>df_party_means
mean Question Party
0 0.077083 Question1 Party1
1 -0.838896 Question1 Party2
2 0.931547 Question1 Party3
3 0.798064 Question1 Party4
4 -0.678798 Question1 Party5
5 0.960612 Question2 Party1
6 0.803926 Question2 Party2
7 0.586867 Question2 Party3
8 0.804372 Question2 Party4
9 0.346609 Question2 Party5
人们对问题的回答:
>>> df_features
Question1 Question2
0 0 1
1 1 1
2 0 1
3 -1 1
4 -1 -1
5 -1 0
...