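Unable to iterate over two columns in a DataFrame

Time: 2017-04-10 18:33:14

Tags: python pandas

I am currently working on a project that analyzes student loans. I am comparing loan amounts by gender, but I seem to have run into a problem. When I sum the LoanAmount column over the whole dataset, I get a different number from the sum of the two genders' totals. Here is my code.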

print(male, female)
  

286 70

f_sum = 0
m_sum = 0

for i in df['LoanAmount']:
    for x in df['Gender']:
        if x == 'Female':
            f_sum += i
        else:
            m_sum += i

print('Total Sum of LoanAmount:', df['LoanAmount'].sum())
print('Sum of Both Genders:', f_sum + m_sum)
  

Total Sum of LoanAmount: 49280.0

Sum of Both Genders: 128872

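Am I doing something wrong here? I realize this may not be enough information, so if you have any questions I am happy to answer them.

4 Answers:

Answer 0: (score: 2)

What you need is to group by gender and then sum the loan amounts: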

df.groupby('Gender')['LoanAmount'].sum()
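To make the group-then-sum idea concrete, here is a minimal sketch. The sample rows are made up for illustration; only the column names `Gender` and `LoanAmount` come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "LoanAmount": [100.0, 150.0, 200.0, 50.0],
})

# One sum per gender: each row contributes its amount exactly once.
sums_by_gender = df.groupby("Gender")["LoanAmount"].sum()
print(sums_by_gender)

# With no missing values, the per-gender sums add back up to the column total.
print(sums_by_gender.sum() == df["LoanAmount"].sum())
```

The result is a Series indexed by gender, so `sums_by_gender["Female"]` gives the female total directly.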

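Answer 1: (score: 0)

You are looping over the DataFrame twice: the outer loop visits every LoanAmount, and for each of those the inner loop visits every Gender, so each amount is added once per row instead of once in total. What you want to do is apply a filter to the DataFrame. You can do this as follows: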

female_sum = df[df['Gender']=='Female']['LoanAmount'].sum()
male_sum = df[df['Gender']=='Male']['LoanAmount'].sum()
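If you do want to keep an explicit loop, the two columns have to be walked together rather than nested. The sketch below, on made-up sample rows, contrasts the nested loop from the question with a `zip`-based loop that pairs each amount with the gender on the same row.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "LoanAmount": [100.0, 150.0, 200.0],
})

# Nested loops visit every (amount, gender) combination: 3 x 3 = 9 additions.
nested_total = 0.0
for amount in df["LoanAmount"]:
    for gender in df["Gender"]:
        nested_total += amount

# zip walks both columns in lockstep: one addition per row.
f_sum = 0.0
m_sum = 0.0
for amount, gender in zip(df["LoanAmount"], df["Gender"]):
    if gender == "Female":
        f_sum += amount
    else:
        m_sum += amount

print(nested_total)   # 1350.0, i.e. the true total (450.0) times 3 rows
print(f_sum + m_sum)  # 450.0
```

The filter shown above is still the more idiomatic pandas approach; the loop is only here to show where the extra additions come from.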

Answer 2: (score: 0)

To compute the loan amount for each gender, you should use:

male_loans = df.loc[df.Gender == 'Male', 'LoanAmount'].sum()
female_loans = df.loc[df.Gender == 'Female', 'LoanAmount'].sum()

Answer 3: (score: 0)

You can filter the data by gender and sum it:

    f_sum = df[df['Gender'] == 'Female']['LoanAmount'].sum()
    m_sum = df[df['Gender'] == 'Male']['LoanAmount'].sum()