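I want to match two columns, flagging a row as True when both values agree. I need this to remove subsets from a very large dataset.
I've run into 2 problems:
This is the match condition I'm using: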
import numpy as np
import pandas as pd
combine_series = pd.DataFrame(dict(rowA=rowA, rowB=rowB))
combine_series['Matched'] = np.where(combine_series['rowA'] == combine_series['rowB'], True, False)
Here is the resulting Matched column:
                        rowA          rowB  Matched
baseCptyID           2231200          5900    False
extCptyID               5900       2231200    False
notional             3.4e+07       3.4e+07     True
startDate         2015-05-29    2015-05-29     True
expiryDate               NaN           NaN    False
settlementDate    2020-06-29    2020-06-29     True
rate                 0.03375       0.03375     True
spread                   NaN           NaN    False
paymentFreq               PA            PA     True
resetFreq                 PA            PA     True
modelUsed           FixedLeg      FixedLeg     True
PayoutCCY                AUD           AUD     True
DayCountConv    ACT/ACT ICMA  ACT/ACT ICMA     True
join_column          2231200       2231200     True
Answer 0 (score: 0)
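Let's try this logic, which relies on the fact that NaN != NaN evaluates to True: a value that is not equal to itself must be NaN, so a row counts as matched when the two values are equal or when both are NaN.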
df['Matched']=(df.rowA == df.rowB) | ((df.rowA != df.rowA) & (df.rowB != df.rowB))
Output:
                        rowA          rowB  Matched
baseCptyID           2231200          5900    False
extCptyID               5900       2231200    False
notional             3.4e+07       3.4e+07     True
startDate         2015-05-29    2015-05-29     True
expiryDate               NaN           NaN     True
settlementDate    2020-06-29    2020-06-29     True
rate                 0.03375       0.03375     True
spread                   NaN           NaN     True
paymentFreq               PA            PA     True
resetFreq                 PA            PA     True
modelUsed           FixedLeg      FixedLeg     True
PayoutCCY                AUD           AUD     True
DayCountConv    ACT/ACT ICMA  ACT/ACT ICMA     True
join_column          2231200       2231200     True
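For anyone who finds the self-inequality trick hard to read, the same NaN-aware comparison can probably be written with isna(). This is only a minimal sketch: the small all-numeric DataFrame below is hypothetical sample data mirroring a few rows of the table, and the names plain and both_missing are mine, not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring a few numeric rows of the table above.
df = pd.DataFrame(
    {
        "rowA": [2231200.0, 3.4e7, np.nan, 0.03375],
        "rowB": [5900.0, 3.4e7, np.nan, 0.03375],
    },
    index=["baseCptyID", "notional", "expiryDate", "rate"],
)

# Plain equality: NaN == NaN is False, so expiryDate looks like a mismatch.
plain = df["rowA"] == df["rowB"]

# NaN-aware equality: the values are equal, or both sides are missing.
both_missing = df["rowA"].isna() & df["rowB"].isna()
df["Matched"] = plain | both_missing

print(df)
```

Here both_missing plays the role of (df.rowA != df.rowA) & (df.rowB != df.rowB) in the one-liner above; it just names the intent explicitly.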