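I'm currently working on a project that analyzes student loans. I'm comparing student loans by gender, but I seem to have run into a problem. When I sum the loan column of the dataset, I get a different number than the sum for the two genders combined. Here is my code.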
print(male, female)
286 70
f_sum = 0
m_sum = 0
for i in df['LoanAmount']:
    for x in df['Gender']:
        if x == 'Female':
            f_sum += i
        else:
            m_sum += i
print('Total Sum of LoanAmount:', df['LoanAmount'].sum())
print('Sum of Both Genders:', f_sum + m_sum)
Total Sum of LoanAmount: 49280.0
Sum of Both Genders: 128872
Am I doing something wrong here? I realize this may not be enough information; if you have any questions, I'm happy to answer them.
Answer 0 (score: 2)
What you need is to group by gender and then sum the loan amounts.
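A minimal sketch of grouping by gender and summing, assuming a DataFrame with the `Gender` and `LoanAmount` columns from the question. The sample rows below are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "LoanAmount": [100.0, 150.0, 200.0, 50.0, 120.0],
})

# One sum per distinct value in the Gender column.
totals = df.groupby("Gender")["LoanAmount"].sum()
print(totals)

# With no missing Gender values, the group sums add up to the column total.
print(totals.sum() == df["LoanAmount"].sum())
```

Note that `groupby` leaves out rows whose `Gender` is missing by default, so on real data the group sums can come out lower than `df['LoanAmount'].sum()` if some rows have no gender recorded.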
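Answer 1 (score: 0)
You are looping over the data frame twice: once over every LoanAmount, and then, for each of those, over every Gender. What you want instead is to apply a filter to the data frame. You can do that as follows: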
female_sum = df[df['Gender']=='Female']['LoanAmount'].sum()
male_sum = df[df['Gender']=='Male']['LoanAmount'].sum()
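To illustrate why the double loop inflates the total, here is a small sketch on made-up data (three hypothetical rows, not the asker's dataset). In the nested version every loan amount is paired with every row of `Gender`, so each amount is added once per row instead of once overall. Pairing each amount with the gender from the same row, here with `zip`, gives the expected result.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "LoanAmount": [100, 150, 200],
})

# Nested loops as in the question: each amount is added once per Gender row.
nested_total = 0
for amount in df["LoanAmount"]:
    for gender in df["Gender"]:
        nested_total += amount

# Row-wise pairing: each amount is matched with the gender of its own row.
f_sum = 0
m_sum = 0
for gender, amount in zip(df["Gender"], df["LoanAmount"]):
    if gender == "Female":
        f_sum += amount
    else:
        m_sum += amount

column_total = df["LoanAmount"].sum()
print(nested_total, column_total * len(df))  # the nested total is len(df) times too large
print(f_sum + m_sum, column_total)           # the row-wise totals match the column sum
```

The boolean filter shown in this answer does the same row-wise matching without an explicit loop. Also note that an `else` branch like the one above counts every row that is not `'Female'` as male, including rows with a missing gender.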
Answer 2 (score: 0)
To compute the total loan amount for each gender, you should use:
male_loans = df.loc[df.Gender == 'Male', 'LoanAmount'].sum()
female_loans = df.loc[df.Gender == 'Female', 'LoanAmount'].sum()
Answer 3 (score: 0)
You can filter the data by gender and sum it:
f_sum = df[df['Gender'] == 'Female']['LoanAmount'].sum()
m_sum = df[df['Gender'] == 'Male']['LoanAmount'].sum()