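ValueError when comparing two pandas Series

Time: 2020-10-09 11:38:31

Tags: python python-3.x pandas dataframe

Question

Hello, I am trying to compare two Series element-wise to get a Series of True and False values. These are the two columns I want to compare: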

    Loan        Date 1      Date2
405 1022    2020-02-29  2019-10-31
406 1022    2020-02-29  2019-11-30
407 1022    2020-02-29  2019-12-31
408 1022    2020-02-29  2020-01-31
405 1030    2020-05-31  2020-01-31
406 1030    2020-05-31  2020-02-29
407 1030    2020-05-31  2020-03-31
408 1030    2020-05-31  2020-04-30

What I want to achieve:

For each loan, take the last row; if "Date 1" equals "Date 2", keep "Date 2"; otherwise, set "Date 2" equal to "Date 1".

My attempt

a = df[["Loan","Date 1"]].groupby("Loan").tail(1)
b = df[["Loan","Date 2"]].groupby("Loan").tail(1)

df["new_date"] = np.where(a==b,b,a)

I also tried

(a==b).any() and (a==b).all()

Error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
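
For context, here is a minimal sketch of why this kind of error appears. It uses two small made-up Series (`a` and `b` here are illustrative, not the asker's data). An element-wise comparison returns a boolean Series, and Python's `and`, `or`, `if` and `bool()` all demand a single truth value, which pandas and NumPy refuse to guess. The exact wording of the message differs between NumPy arrays and pandas objects, but both say the truth value is ambiguous.

```python
import pandas as pd

# Illustrative data only (not the asker's frame)
a = pd.Series([1, 2, 3])
b = pd.Series([1, 0, 3])

eq = a == b  # element-wise: a boolean Series, not a single bool

try:
    # `and`, `or`, `if` and bool() all ask for ONE truth value
    bool(eq)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# Reduce explicitly when a single bool is really wanted
any_equal = bool(eq.any())
all_equal = bool(eq.all())

# Combine element-wise conditions with & / | instead of and / or
both = eq & (a > 1)
print(both.tolist())
```

So `(a==b).any() and (a==b).all()` can still fail whenever `.any()` or `.all()` returns more than one value (as it does for a DataFrame), because `and` then has to evaluate the truth of that result.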

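2 Answers:

Answer 0: (score: 1)

Group by Loan and take the last row of each group with tail, then use boolean indexing with loc to replace the values in Date2 where Date2 is not equal to Date1: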

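# Last row of each loan; copy so the assignment below works on an independent frame
d = df.groupby('Loan').tail(1).copy()
# Where Date2 differs from Date1, overwrite Date2 with Date1
d.loc[d['Date1'].ne(d['Date2']), 'Date2'] = d['Date1']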

     Loan      Date1      Date2
408  1022 2020-02-29 2020-02-29
408  1030 2020-05-31 2020-05-31
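
If a separate column is preferred over overwriting Date2 (the question's attempt assigned to `new_date`), the same rule can be sketched with `Series.where`. This is only one possible variant, assuming the columns are named `Date1` and `Date2` as in this answer and that the frame is rebuilt here with a default integer index:

```python
import pandas as pd

# Sample data rebuilt from the question (default 0..7 index is an assumption)
df = pd.DataFrame({
    "Loan": [1022] * 4 + [1030] * 4,
    "Date1": pd.to_datetime(["2020-02-29"] * 4 + ["2020-05-31"] * 4),
    "Date2": pd.to_datetime([
        "2019-10-31", "2019-11-30", "2019-12-31", "2020-01-31",
        "2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30",
    ]),
})

# Last row of each loan, as an independent copy
last = df.groupby("Loan").tail(1).copy()

# Keep Date2 where it equals Date1, otherwise fall back to Date1
same = last["Date1"].eq(last["Date2"])
last["new_date"] = last["Date2"].where(same, last["Date1"])

print(last)
```

Both Series passed to `where` come from the same frame, so they share an index and align row by row, which avoids comparing two differently shaped objects.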

Answer 1: (score: 0)

You can simply replace Date2 with Date1 to get rid of the error and get the data:

import pandas as pd
from io import StringIO

csv_string = StringIO("""Loan        Date1      Date2
1022    2020-02-29  2019-10-31
1022    2020-02-29  2019-11-30
1022    2020-02-29  2019-12-31
1022    2020-02-29  2020-01-31
1030    2020-05-31  2020-01-31
1030    2020-05-31  2020-02-29
1030    2020-05-31  2020-03-31
1030    2020-05-31  2020-04-30""" )

df = pd.read_csv(csv_string, sep=" ", skipinitialspace=True)

grp = df.groupby(["Loan", "Date1"]).tail(1)
grp["Date2"] = grp["Date1"]

print(grp)

Output:

   Loan       Date1       Date2
3  1022  2020-02-29  2020-02-29
7  1030  2020-05-31  2020-05-31
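
If the aim is instead to update the original frame in place rather than build a two-row result, one possible sketch is to mark the last row of each loan with a boolean mask and assign through `loc`. The mask name `is_last` is hypothetical, and the dates are kept as plain strings here, as `read_csv` would load them without date parsing:

```python
import pandas as pd

# Sample data rebuilt from the question; dates left as strings
df = pd.DataFrame({
    "Loan": [1022] * 4 + [1030] * 4,
    "Date1": ["2020-02-29"] * 4 + ["2020-05-31"] * 4,
    "Date2": [
        "2019-10-31", "2019-11-30", "2019-12-31", "2020-01-31",
        "2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30",
    ],
})

# duplicated(keep="last") flags every row except the last occurrence per Loan,
# so negating it leaves True only on each loan's last row
is_last = ~df.duplicated(subset="Loan", keep="last")

# Overwrite Date2 with Date1 on those rows only; other rows are untouched
df.loc[is_last, "Date2"] = df.loc[is_last, "Date1"]

print(df)
```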

See also: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()