Question

Hello, I am trying to compare the elements of two Series to get a Series of True and False values. Here are the two columns I want to compare:

     Loan       Date1       Date2
405  1022  2020-02-29  2019-10-31
406  1022  2020-02-29  2019-11-30
407  1022  2020-02-29  2019-12-31
408  1022  2020-02-29  2020-01-31
405  1030  2020-05-31  2020-01-31
406  1030  2020-05-31  2020-02-29
407  1030  2020-05-31  2020-03-31
408  1030  2020-05-31  2020-04-30

What I want to achieve:

For each loan, take the last row. If Date1 equals Date2, keep Date2; otherwise, set Date2 equal to Date1.

My attempt

a = df[["Loan","Date1"]].groupby("Loan").tail(1)
b = df[["Loan","Date2"]].groupby("Loan").tail(1)

df["new_date"] = np.where(a==b,b,a)

I also tried

(a==b).any() and (a==b).all()

Error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Answer 1

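Group by Loan and use tail to take the last row of each group, then use boolean indexing with loc to replace the values in Date2 wherever Date2 is not equal to Date1: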

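# Last row of each loan; copy() makes d independent of df before assigning
d = df.groupby('Loan').tail(1).copy()
# Where Date2 differs from Date1, overwrite Date2 with Date1
d.loc[d['Date1'].ne(d['Date2']), 'Date2'] = d['Date1']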

     Loan      Date1      Date2
408  1022 2020-02-29 2020-02-29
408  1030 2020-05-31 2020-05-31
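
For anyone who wants to run the approach above end to end, here is a minimal sketch. It assumes the sample data from the question with the column names Date1 and Date2, parsed as datetimes, and a default integer index (so the row labels come out as 3 and 7 rather than 408). The names `last_rows`, `mismatch`, `tail` and `via_where` are only illustrative. It also shows that the `np.where` idea from the question can work once both operands are columns of the same frame.

```python
import numpy as np
import pandas as pd

# Sample data from the question (the default RangeIndex is an assumption)
df = pd.DataFrame({
    "Loan": [1022] * 4 + [1030] * 4,
    "Date1": pd.to_datetime(["2020-02-29"] * 4 + ["2020-05-31"] * 4),
    "Date2": pd.to_datetime([
        "2019-10-31", "2019-11-30", "2019-12-31", "2020-01-31",
        "2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30",
    ]),
})

# Last row of each loan; copy() so the assignment below edits an independent frame
last_rows = df.groupby("Loan").tail(1).copy()

# Keep Date2 where it already equals Date1, otherwise overwrite it with Date1
mismatch = last_rows["Date1"].ne(last_rows["Date2"])
last_rows.loc[mismatch, "Date2"] = last_rows["Date1"]

# Same rule written with np.where on two columns of one frame
tail = df.groupby("Loan").tail(1)
via_where = np.where(tail["Date1"].eq(tail["Date2"]), tail["Date2"], tail["Date1"])

print(last_rows)
```

Comparing two columns of the same frame keeps the index and length aligned, which is likely what the separate `a` and `b` frames in the question were missing.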

Answer 2

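In the last row, Date2 ends up equal to Date1 in either case: it is kept when the two already match and overwritten when they do not. So you can simply assign Date1 to Date2, which avoids the comparison and therefore the error: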

import pandas as pd
from io import StringIO

csv_string = StringIO("""Loan        Date1      Date2
1022    2020-02-29  2019-10-31
1022    2020-02-29  2019-11-30
1022    2020-02-29  2019-12-31
1022    2020-02-29  2020-01-31
1030    2020-05-31  2020-01-31
1030    2020-05-31  2020-02-29
1030    2020-05-31  2020-03-31
1030    2020-05-31  2020-04-30""" )

df = pd.read_csv(csv_string, sep=" ", skipinitialspace=True)

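# Last row per (Loan, Date1) group; copy() makes grp independent of df
grp = df.groupby(["Loan", "Date1"]).tail(1).copy()
# Date2 always ends up equal to Date1, so assign it directly
grp["Date2"] = grp["Date1"]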

print(grp)

Output:

   Loan       Date1       Date2
3  1022  2020-02-29  2020-02-29
7  1030  2020-05-31  2020-05-31

See also: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
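
As background on that error: Python's `and` asks each operand for a single True/False value, and a multi-element NumPy array or pandas Series refuses to give one. The following is a minimal sketch, using two small hypothetical Series `a` and `b` that are not the asker's data, of how the error arises and how an elementwise operator or an explicit reduction avoids it.

```python
import pandas as pd

# Two small hypothetical Series, used only to illustrate the error
a = pd.Series(pd.to_datetime(["2020-02-29", "2020-05-31"]))
b = pd.Series(pd.to_datetime(["2020-01-31", "2020-05-31"]))

equal = a.eq(b)  # elementwise comparison, returns a boolean Series

# `and` needs one bool per operand, so a multi-element Series raises ValueError
try:
    equal and equal
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Reduce explicitly to a single bool when that is what you want
any_equal = bool(equal.any())  # at least one pair matches
all_equal = bool(equal.all())  # every pair matches

# Combine boolean Series elementwise with & (or |), not `and` / `or`
both = equal & a.notna()
```

Note that on a DataFrame, `.any()` and `.all()` reduce column by column and return a Series, so chaining them with `and`, as in the question, would likely still hit the same error.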

ValueError when comparing two pandas Series
