This is some of the data in the Excel sheet.
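I want to select the musical theater shows (called "ID" in the code) whose casts had more minority members than Caucasians. Once those are determined, I want to put the information for the selected shows into a new data frame that holds only those shows, since that will be easier to work with. In the new data frame, I want the related ethnicity counts on the same row as the show so they can be compared with the audience ethnicity. Then I want to plot this information.
So, in general, I want to sum the values in a row if that row meets a specific criterion. All the data used in this project lives in an Excel sheet, which is converted to CSV and uploaded as a data frame. I then want to plot the totals for the cast and compare the ethnicity of the cast with the ethnicity of the audience.
I am using Python. I tried to remove the unwanted data by selecting columns with an if statement, so that the data frame would contain only the shows with more minority cast members than Caucasians, and then I tried to use that information in a plot. I am not sure whether I have to filter out all the unwanted columns if I am not using them in the calculation.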
import numpy as np
import pandas as pd
#first need to import numpy so that calculations can be made
from google.colab import files
uploaded = files.upload()
# df = pd.read_csv('/content/drive/My Drive/allTheaterDataV2.csv')
import io
df = pd.read_csv(io.BytesIO(uploaded['allTheaterDataV2.csv']))
# need to download excel sheet as csv and then upload into colab so that it can
# be manipulated as a dataframe
# want to select shows(ID) that had more minorities than Caucasians in the cast
# once determined, the selected shows should be placed into a new data frame that
# will only hold the shows and the related ethnicity, and compared to audience ethnicity
# this information should then be plotted
# first we will determine the shows that have a majority ethnic cast
minorcal = list(df)
minorcal.remove('CAU')
minoritycastSUM = df[minorcal].sum(axis=1)
# print(minorcal)
# next, we determine how many people in the cast were Caucasian, so remove all others
caucasiancal = list(df)
# i first wanted to do caucasiancal.remove('AFRAM', 'ASIAM', 'LAT', 'OTH')
# but got the statement I could only have 1 argument so i just put each on their own line
caucasiancal.remove('AFRAM')
caucasiancal.remove('ASIAM')
caucasiancal.remove('LAT')
caucasiancal.remove('OTH')
idrowcaucal = df[caucasiancal].sum(axis=1)
minoritycompare = old.filter(['idrowcaucal','minoritycastSUM'])
print(minoritycompare)
# now compare the two values per line
if minoritycastSUM < caucasiancal:
    minoritydf = pd.df.minorcal.append()
# plot new data frame per each show and compare to audience ethnicity
df.plot(x=['AFRAM', 'ASIAM', 'CAU', 'LAT', 'OTH', 'WHT', 'BLK', 'ASN', 'HSP', 'MRO'], y = [''])
# i am unsure how to call the specific value for each column
plt.title('ID Ethnicity Comparison')
# i am unsure how to call the specific show so that only one show is per plot so for now i just subbed in 'ID'
plt.xlabel('Ethnicity comparison')
plt.ylabel('Number of Cast Members/Audience Members')
plt.show()
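I would like to see the data frame with the specific shows that meet the criterion, followed by a plot for those shows, but right now I am getting errors when I try to build the new data frame, and Python says the if statement cannot be used.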
Answer 0 (score: 0)
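First of all, this will not be a complete answer.
Nevertheless, I built on the DataFrame from this answer, and maybe an initial plot of the "non-Caucasian / Caucasian ratio" per show can point you in the right direction. Perhaps you could build a similar set of sum and ratio columns for the audience columns, and then plot the cast ratio as a function of the audience ratio to see whether a whiter audience prefers a whiter cast or not (is that what you are after?).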
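The audience-side idea could look roughly like the sketch below. It assumes that `WHT`, `BLK`, `ASN`, `HSP` and `MRO` are audience percentages per show, which the question does not state explicitly. The show names and all numbers here are made up for illustration, and `plot_ratios` is a hypothetical helper name.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical values for illustration only; column names follow the question.
df = pd.DataFrame({
    'ID': ['show A', 'show B', 'show C'],
    # cast head counts
    'AFRAM': [2, 4, 0], 'ASIAM': [0, 1, 0], 'CAU': [48, 25, 28],
    'LAT': [1, 1, 18], 'OTH': [0, 0, 0],
    # audience percentages (assumed meaning of these columns)
    'WHT': [80.0, 75.0, 60.0], 'BLK': [5.0, 10.0, 10.0],
    'ASN': [5.0, 5.0, 5.0], 'HSP': [5.0, 5.0, 20.0], 'MRO': [5.0, 5.0, 5.0],
})

cast_cols = ['AFRAM', 'ASIAM', 'LAT', 'OTH']
audience_cols = ['BLK', 'ASN', 'HSP', 'MRO']

# Row-wise sums divided by the Caucasian / white column, one ratio per show.
df['cast_ratio'] = df[cast_cols].sum(axis=1) / df['CAU']
df['audience_ratio'] = df[audience_cols].sum(axis=1) / df['WHT']


def plot_ratios(frame):
    """Scatter the cast ratio against the audience ratio, one labelled point per show."""
    fig, ax = plt.subplots()
    ax.scatter(frame['audience_ratio'], frame['cast_ratio'])
    for _, row in frame.iterrows():
        ax.annotate(row['ID'], (row['audience_ratio'], row['cast_ratio']))
    ax.set_xlabel('audience: non-white / white')
    ax.set_ylabel('cast: non-Caucasian / Caucasian')
    fig.tight_layout()
    return fig
```

Calling `plot_ratios(df)` followed by `plt.show()` would display the figure. A show whose `CAU` count is zero would produce an infinite ratio, so real data may need that case handled first.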
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'ID': ['Billy Elliot', 'next to normal', 'shrek', 'guys and dolls',
                          'west side story', 'pal joey'],
                   'Season': [20082009, 20082009, 20082009,
                              20082009, 20082009, 20082009],
                   'AFRAM': [2, 0, 4, 4, 0, 1],
                   'ASIAM': [0, 0, 1, 0, 0, 0],
                   'CAU': [48, 10, 25, 24, 28, 20],
                   'LAT': [1, 0, 1, 3, 18, 0],
                   'OTH': [0, 0, 0, 0, 0, 0],
                   'WHT': [73.7, 73.7, 73.7, 73.7, 73.7, 73.7]})
## define a sum column for non caucasian actors (I suppose?)
df['non_cau']=df[['AFRAM','ASIAM','LAT','OTH']].sum(axis=1)
## build a ratio of non caucasian to caucasian
df['cau_ratio']=df['non_cau']/df['CAU']
## make a quick plot
fig,ax=plt.subplots()
ax.scatter(df['ID'],df['cau_ratio'])
ax.set_ylabel('non cau / cau ratio')
plt.tight_layout()
plt.show()
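As for the `if` error in the question: an `if` statement needs a single True/False value, while comparing two columns gives one value per row. A common pandas approach is to use that per-row result as a mask to pick rows instead. The following is a minimal sketch, not a full solution. It reuses the cast numbers from the example DataFrame above, and `select_shows` and its `min_ratio` parameter are hypothetical names introduced here.

```python
import pandas as pd

# Cast counts copied from the example DataFrame in this answer.
df = pd.DataFrame({
    'ID': ['Billy Elliot', 'next to normal', 'shrek',
           'guys and dolls', 'west side story', 'pal joey'],
    'AFRAM': [2, 0, 4, 4, 0, 1],
    'ASIAM': [0, 0, 1, 0, 0, 0],
    'CAU': [48, 10, 25, 24, 28, 20],
    'LAT': [1, 0, 1, 3, 18, 0],
    'OTH': [0, 0, 0, 0, 0, 0],
})

MINORITY_COLS = ['AFRAM', 'ASIAM', 'LAT', 'OTH']


def select_shows(frame, min_ratio=1.0):
    """Return the shows whose minority cast count exceeds min_ratio * CAU."""
    non_cau = frame[MINORITY_COLS].sum(axis=1)
    # Element-wise comparison: one True/False per row (a boolean Series).
    mask = non_cau > min_ratio * frame['CAU']
    # The mask selects rows; the list selects the columns to keep.
    return frame.loc[mask, ['ID'] + MINORITY_COLS + ['CAU']]


majority_minority = select_shows(df)        # strictly more minority than Caucasian
partly_minority = select_shows(df, 0.5)     # minority count above half the Caucasian count
print(majority_minority)
print(partly_minority)
```

With these six shows the first result is empty, because no cast has more minority than Caucasian members, while the looser 0.5 threshold keeps only 'west side story'. Columns that are not needed for the calculation do not have to be dropped beforehand, since only the listed columns are summed.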