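How can I use scatter plot marker size to represent the frequency of each (x, y) value as a percentage of the total count for each x value?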

Asked: 2019-03-17 22:00:24

Tags: python pandas matplotlib

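I am trying to use a scatter plot to show the popularity of certain streaming services by gender. So far, the solutions I have found show how to create a scatter plot in which the marker size represents the frequency of each choice. However, our survey received far more responses from women than from men, and I would like the marker size to represent the frequency of each choice as a percentage of all choices made by that gender. For example, suppose we have 60 male responses and 120 female responses, and 30 men and 60 women chose "Netflix" as their favorite service. Even though 50% of each gender chose "Netflix" as their favorite, the marker for the women's choice would be larger. I want the marker size to represent the number of men (30) or women (60) who chose "Netflix" as a percentage of the total number of men (60) or women (120).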
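To make the arithmetic in this example concrete, here is a minimal sketch that uses only the hypothetical numbers above (60 male and 120 female responses, 30 and 60 "Netflix" picks). The variable names are illustrative and do not come from the survey code, and the factor of 20 is the same scale factor used in the count-based sizing.

```python
# Hypothetical numbers from the example above, not real survey data.
responses = {'Male': 60, 'Female': 120}      # total responses per gender
netflix_picks = {'Male': 30, 'Female': 60}   # respondents who chose Netflix

# Size based on the raw count: the female marker ends up twice as large.
raw_sizes = {g: 20 * netflix_picks[g] for g in responses}

# Share of each gender that chose Netflix, in percent: identical for both.
shares = {g: 100 * netflix_picks[g] / responses[g] for g in responses}

print(raw_sizes)  # {'Male': 600, 'Female': 1200}
print(shares)     # {'Male': 50.0, 'Female': 50.0}
```

The goal is for the marker size to follow `shares` rather than `raw_sizes`.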

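I have tried manipulating the expression passed to plot.scatter() in several ways, but I can't seem to get it to work quite the way I want. I have included my code below. I can provide the .csv file if needed. Thank you.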

import pandas as pd
import numpy as np
import matplotlib.pyplot as plot
from collections import Counter

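# option_context() only has an effect inside a `with` block; set_option() applies globally
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)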
surveyData = pd.read_csv('Survey.csv')



# Rename columns and remove Timestamp column
surveyData.rename(columns={ 'How old are you?' : 'AGE',
                            'What is your gender?(Optional)' : 'GEN',
                            'Which devices do you use for streaming video? (Select all that apply)' : 'VD',
                            'Which video streaming services do you use? (Select all that apply)' : 'VS',
                            'If you chose more than one video streaming service, please select the one that is your favorite. (choose only one)' : 'FVS',
                            'On average, how many hours a day do you stream video? (Enter whole number)' : 'VH',
                            'Which do you prefer - Video Streaming Services or Cable Providers?' : 'VP',
                            'What would be the main reason for this? (optional)' : 'VR',
                            'Which devices do you use for streaming music? (Select all that apply)' : 'MD',
                            'Which music streaming services do you use? (Select all that apply)' : 'MS',
                            'If you chose more than one music streaming service, please select the one that is your favorite. (choose only one)' : 'MF',
                            'On average, how many hours a day do you stream music? (Enter whole number)' : 'MH',
                            'Which format do you prefer - streaming music, radio, or physical copies?           ' : 'MP',
                            'What would be the main reason for this? (optional).1' : 'MR'
                            },inplace=True)
surveyData.drop(['Timestamp'], axis=1, inplace=True)

# Convert strings within each column to numerical values for plotting
surveyData = surveyData.fillna('None')
gender = {'Female': 0, 'Male': 1, "None": 2}
svcs = {    'Netflix': 1, 'DirecTV Now': 2, 'Amazon Prime Video': 3, 'Sling TV': 4,
            'YouTube': 5, 'fuboTV': 6, 'PlayStation Vue': 7, 'Hulu': 8, 'YouTube TV': 9,
            'Twitch': 10, 'None': 13, 'Google Play Movies & TV': 11, 'HBO': 12}

surveyData["GEN"] = [gender[item] for item in surveyData["GEN"]]
surveyData["FVS"] = [svcs[item] for item in surveyData["FVS"]] 


#PLOT DATA
#Use Counter to set transformation of plot size relative to frequency of choice
x = surveyData['GEN']
y = surveyData['FVS']
c = Counter(zip(x, y))
s = [20*c[(xx,yy)] for xx,yy in zip(x,y)]

#plot with formatting
# Tick labels are indexed by the numeric codes in svcs (index 0 is unused)
xticks = ['', 'Netflix', 'DirecTV Now', 'Amazon Prime Video', 'Sling TV', 'YouTube', 'fuboTV', 'PlayStation Vue',
            'Hulu', 'YouTube TV', 'Twitch', 'Google Play Movies & TV', 'HBO', 'None']

surveyData.plot.scatter('FVS', 'GEN', s=s, c='blue')
yticks = ['Female', 'Male', 'Prefer Not To Say']
plot.xticks(np.arange(14), xticks, rotation=-90)
plot.yticks(np.arange(3), yticks)
plot.xlabel('Favorite Streaming Services')
plot.ylabel('Gender')
plot.show()
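One possible way to get sizes that are relative to each gender, sketched under stated assumptions rather than as a confirmed solution: count each (gender, service) pair as the code above does, count the responses per gender separately, and divide one by the other. The short `x` and `y` lists below are made-up stand-ins for surveyData['GEN'] and surveyData['FVS'], and the scale factor of 10 is arbitrary.

```python
from collections import Counter

# Made-up stand-in data (assumption): gender codes and favorite-service codes.
x = [0, 0, 0, 0, 1, 1]   # 0 = Female, 1 = Male
y = [1, 1, 8, 8, 1, 8]   # 1 = Netflix, 8 = Hulu

pair_counts = Counter(zip(x, y))   # responses per (gender, service) pair
gender_totals = Counter(x)         # responses per gender

# Percentage of each gender's responses that went to each service.
pct = {(g, svc): 100 * n / gender_totals[g]
       for (g, svc), n in pair_counts.items()}

# One size per row, so the list lines up with the rows being plotted.
scale = 10  # arbitrary scale factor (assumption)
s = [scale * pct[(xx, yy)] for xx, yy in zip(x, y)]

print(pct)
print(s)
```

In this stand-in data there are twice as many female responses as male ones, yet every pair accounts for 50% of its gender, so all markers get the same size. With the real data, the resulting `s` could be passed to surveyData.plot.scatter('FVS', 'GEN', s=s, c='blue') in place of the count-based list. Note that matplotlib interprets `s` as marker area in points squared, so the area, not the diameter, scales with the percentage.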

0 Answers:

No answers