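I want to convert an entire dataset to percentages.
https://cocl.us/datascience_survey_data
For each row, I need the row total, and then each value expressed as a percentage of that total.
For example, Big Data (Spark / Hadoop) = 1332 + 729 + 127 = 2188
So the percentage for Very interested would be 1332 / 2188, roughly 60.88%
I want to do this automatically for all rows. How can I do that?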
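Answer 0: (score: 3)
You can divide every value by the sum of its row with DataFrame.div (passing axis=0 so the sums align with the rows), and then multiply by 100: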
import pandas as pd
df = pd.read_csv('Topic_Survey_Assignment.csv', index_col=0)
# divide each row by its own sum, then scale to percent
df1 = df.div(df.sum(axis=1), axis=0).mul(100)
print(df1)
                            Very interested  Somewhat interested  \
Big Data (Spark / Hadoop)         60.877514            33.318099
Data Analysis / Statistics        77.007299            20.255474
Data Journalism                   20.235849            50.990566
Data Visualization                61.580882            33.731618
Deep Learning                     58.229599            35.500231
Machine Learning                  74.724771            21.880734

                            Not interested
Big Data (Spark / Hadoop)         5.804388
Data Analysis / Statistics        2.737226
Data Journalism                  28.773585
Data Visualization                4.687500
Deep Learning                     6.270171
Machine Learning                  3.394495
Details (the row sums used as divisors):
print (df.sum(axis=1))
Big Data (Spark / Hadoop) 2188
Data Analysis / Statistics 2192
Data Journalism 2120
Data Visualization 2176
Deep Learning 2169
Machine Learning 2180
dtype: int64
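To make the role of axis=0 concrete, here is a minimal self-contained sketch that does not need the CSV file. The Big Data counts are the ones quoted in the question; the second row ("Hypothetical topic") is made up purely for illustration.

```python
import pandas as pd

# First row: counts quoted in the question. Second row: invented for illustration.
df = pd.DataFrame(
    {
        "Very interested": [1332, 50],
        "Somewhat interested": [729, 30],
        "Not interested": [127, 20],
    },
    index=["Big Data (Spark / Hadoop)", "Hypothetical topic"],
)

row_sums = df.sum(axis=1)  # one total per row, indexed by topic

# axis=0 matches row_sums against the row index, so every row is
# divided by its own total rather than by a column-wise value.
pct = df.div(row_sums, axis=0).mul(100)
print(pct.round(2))
```

Without axis=0, div would try to align the sums with the column labels instead of the row labels, which is not what is wanted here.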
The NumPy alternative is very similar:
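import numpy as np
import pandas as pd
df = pd.read_csv('Topic_Survey_Assignment.csv', index_col=0)
arr = df.values
# [:, None] turns the row sums into a column vector so the division broadcasts row-wise
df1 = pd.DataFrame(arr / np.sum(arr, axis=1)[:, None] * 100,
                   index=df.index,
                   columns=df.columns)
print(df1)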
                            Very interested  Somewhat interested  \
Big Data (Spark / Hadoop)         60.877514            33.318099
Data Analysis / Statistics        77.007299            20.255474
Data Journalism                   20.235849            50.990566
Data Visualization                61.580882            33.731618
Deep Learning                     58.229599            35.500231
Machine Learning                  74.724771            21.880734

                            Not interested
Big Data (Spark / Hadoop)         5.804388
Data Analysis / Statistics        2.737226
Data Journalism                  28.773585
Data Visualization                4.687500
Deep Learning                     6.270171
Machine Learning                  3.394495
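Answer 1: (score: 1)
The fastest option is to use NumPy. Because the arithmetic is vectorized over the whole array, the computation stays fast even when the data is large.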
import numpy as np
# `data` is assumed to be the survey DataFrame read without index_col
# (hence the 'Unnamed: 0' column in the printed result)
cols = ['Very interested', 'Somewhat interested', 'Not interested']
# get the values as a 2-D array
values = data[cols].values
# get the sum of each row
sums = values.sum(axis=1)
# reshape the sums into a column vector so the division broadcasts row-wise
sums = np.reshape(sums, (-1, 1))
# divide each value by its row sum and multiply by 100
percentages = (values / sums) * 100
# assign the calculation back to the original data
data[cols] = percentages
# print the data
print(data)
Unnamed: 0 Very interested Somewhat interested Not interested
0 Big Data (Spark / Hadoop) 60.877514 33.318099 5.804388
1 Data Analysis / Statistics 77.007299 20.255474 2.737226
2 Data Journalism 20.235849 50.990566 28.773585
3 Data Visualization 61.580882 33.731618 4.687500
4 Deep Learning 58.229599 35.500231 6.270171
5 Machine Learning 74.724771 21.880734 3.394495
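As a possible simplification of the reshape step, NumPy's sum accepts keepdims=True, which keeps the summed axis as length 1 so the result can be divided directly. The sketch below is self-contained and uses a small stand-in frame: the first row holds the counts quoted in the question, and the second row is hypothetical.

```python
import numpy as np
import pandas as pd

cols = ["Very interested", "Somewhat interested", "Not interested"]

# Stand-in for the survey data; the second row is invented for illustration.
data = pd.DataFrame(
    {
        "Unnamed: 0": ["Big Data (Spark / Hadoop)", "Hypothetical topic"],
        "Very interested": [1332, 50],
        "Somewhat interested": [729, 30],
        "Not interested": [127, 20],
    }
)

values = data[cols].to_numpy()

# keepdims=True gives shape (n_rows, 1), so no separate reshape is needed
sums = values.sum(axis=1, keepdims=True)

# broadcasting divides each row by its own total
data[cols] = values / sums * 100
print(data)
```

Whether this is measurably faster than DataFrame.div depends on the data size; for a six-row table the difference is unlikely to matter.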
Answer 2: (score: 0)
import pandas as pd
df = pd.read_csv('filename.csv')
total = df['Very interested'] + df['Somewhat interested'] + df['Not interested']
df['very_interested_pct'] = df['Very interested'] / total * 100
This creates a new column named very_interested_pct. You can do the same for the other two columns and then drop the original columns.
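The same idea can be applied to all three columns with a loop, followed by dropping the originals. This is only a sketch: the stand-in frame below replaces the CSV (the second row is hypothetical), and the naming scheme for the new columns is an assumption that extends very_interested_pct.

```python
import pandas as pd

cols = ["Very interested", "Somewhat interested", "Not interested"]

# Stand-in for the CSV contents; the second row is invented for illustration.
df = pd.DataFrame(
    {
        "Very interested": [1332, 50],
        "Somewhat interested": [729, 30],
        "Not interested": [127, 20],
    },
    index=["Big Data (Spark / Hadoop)", "Hypothetical topic"],
)

# compute the row totals once, before any columns are added or removed
total = df[cols].sum(axis=1)

for col in cols:
    # assumed naming scheme: "Very interested" -> "very_interested_pct"
    new_name = col.lower().replace(" ", "_") + "_pct"
    df[new_name] = df[col] / total * 100

# drop the raw count columns, keeping only the percentages
df = df.drop(columns=cols)
print(df)
```

Computing the total before the loop matters: if the sum were taken inside the loop over the whole frame, the newly added percentage columns would be included in it.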