I have this DataFrame:
df = pd.DataFrame(columns=["App","Feature1", "Feature2","Feature3",
"Feature4","Feature5",
"Feature6","Feature7","Feature8"],
data=[["SHA",0,0,1,1,1,0,1,0],
["LHA",1,0,1,1,0,1,1,0],
["DRA",0,0,0,0,0,0,1,0],
["FRA",1,0,1,1,1,0,1,1],
["BRU",0,0,1,0,1,0,0,0],
["PAR",0,1,1,1,1,0,1,0],
["AER",0,0,1,1,0,1,1,0],
["SHE",0,0,0,1,0,0,1,0]])
Update: (sorry, I stated the expected result incorrectly)
I want to count how many times each feature shows the value 1:
Features Count
Feature1 6
Feature2 7
...
I tried:
df.groupBy("App").count()
But I did not get the expected output.
Answer 0: (score: 1)
Drop the App column, compare the remaining values with 0, and sum the resulting booleans:
#remove column App, compare and get sum of Trues
a0 = df.drop(columns='App').eq(0).sum()
#a0 = df.set_index('App').eq(0).sum()
#alternative with select only Feature columns
#a0 = df.filter(like='Feature').eq(0).sum()
#alternative with select all columns without first
a0 = df.iloc[:, 1:].eq(0).sum()
print (a0)
Feature1 6
Feature2 7
Feature3 2
Feature4 2
Feature5 4
Feature6 6
Feature7 1
Feature8 7
dtype: int64
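If the two-column Features / Count layout from the question is needed, the Series above can be reshaped into a DataFrame. The following is a minimal sketch, assuming the same data as in the question; the name zero_counts is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    columns=["App", "Feature1", "Feature2", "Feature3", "Feature4",
             "Feature5", "Feature6", "Feature7", "Feature8"],
    data=[["SHA", 0, 0, 1, 1, 1, 0, 1, 0],
          ["LHA", 1, 0, 1, 1, 0, 1, 1, 0],
          ["DRA", 0, 0, 0, 0, 0, 0, 1, 0],
          ["FRA", 1, 0, 1, 1, 1, 0, 1, 1],
          ["BRU", 0, 0, 1, 0, 1, 0, 0, 0],
          ["PAR", 0, 1, 1, 1, 1, 0, 1, 0],
          ["AER", 0, 0, 1, 1, 0, 1, 1, 0],
          ["SHE", 0, 0, 0, 1, 0, 0, 1, 0]],
)

# Count the zeros in every feature column, then name the index and
# move it into a regular column so the result has two columns.
zero_counts = (
    df.drop(columns="App")
      .eq(0)
      .sum()
      .rename_axis("Features")
      .reset_index(name="Count")
)
print(zero_counts)
```

Replacing eq(0) with eq(1) should give the corresponding table for the value 1.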
Comparing with 1 works the same way:
a1 = df.drop(columns='App').eq(1).sum()
#a1 = df.set_index('App').eq(1).sum()
#alternative
#a1 = df.filter(like='Feature').eq(1).sum()
#alternative
a1 = df.iloc[:, 1:].eq(1).sum()
print (a1)
Feature1 2
Feature2 1
Feature3 6
Feature4 6
Feature5 4
Feature6 2
Feature7 7
Feature8 1
dtype: int64
Or get both counts at once with value_counts:
a = df.drop(columns='App').apply(pd.Series.value_counts).T.add_prefix('count_')
print (a)
count_0 count_1
Feature1 6 2
Feature2 7 1
Feature3 2 6
Feature4 2 6
Feature5 4 4
Feature6 6 2
Feature7 1 7
Feature8 7 1
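One caveat with the value_counts approach: if a column never contains one of the values, that cell comes back as NaN and the counts become floats. The sketch below uses a small hypothetical frame (not the data from the question) to show one way to handle this with fillna and astype.

```python
import pandas as pd

# Hypothetical data: column "B" never contains the value 0.
small = pd.DataFrame({"A": [0, 1, 1], "B": [1, 1, 1]})

counts = (
    small.apply(pd.Series.value_counts)  # NaN where a value never occurs
         .T                              # one row per original column
         .fillna(0)                      # a missing value means a count of 0
         .astype(int)                    # restore integer counts
         .add_prefix("count_")
)
print(counts)
```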
Answer 1: (score: 0)
Another way, using melt:
First, reshape the data to long format:
df_melt=pd.melt(df, id_vars='App', value_vars=['Feature%d'%(i) for i in range(1,9)], var_name='Features', value_name='value')
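Then group by Features and count the 1s (summing only the value column, so the App strings are not aggregated):
df_melt.groupby('Features')['value'].sum().reset_index().rename(columns={'value':'count'})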
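Summing the long-format values only counts the 1s. To get the counts of 0 and 1 side by side from the same melted data, pd.crosstab is one option. This is a sketch assuming the data from the question; the name table is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    columns=["App", "Feature1", "Feature2", "Feature3", "Feature4",
             "Feature5", "Feature6", "Feature7", "Feature8"],
    data=[["SHA", 0, 0, 1, 1, 1, 0, 1, 0],
          ["LHA", 1, 0, 1, 1, 0, 1, 1, 0],
          ["DRA", 0, 0, 0, 0, 0, 0, 1, 0],
          ["FRA", 1, 0, 1, 1, 1, 0, 1, 1],
          ["BRU", 0, 0, 1, 0, 1, 0, 0, 0],
          ["PAR", 0, 1, 1, 1, 1, 0, 1, 0],
          ["AER", 0, 0, 1, 1, 0, 1, 1, 0],
          ["SHE", 0, 0, 0, 1, 0, 0, 1, 0]],
)

# Long format: one row per (App, feature) pair.
df_melt = df.melt(id_vars="App", var_name="Features", value_name="value")

# Cross-tabulate feature name against value: one row per feature,
# one column per distinct value (0 and 1 here).
table = pd.crosstab(df_melt["Features"], df_melt["value"])
print(table)
```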