Question

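Edit: I have added more background on the problem to help clarify it.

I originally started with a df that has a 'Key' column and one column per month, labeled Month1 through Month12. Each cell for a given key and month holds a 1 or 0 indicating whether the patient (the 'Key') had insurance coverage in that month (1 = True, 0 = False). In another df I have roughly 105 columns, including a 'Key', a 'Date1', and a 'Date2'. My goal is to find the rows that have coverage between the provided dates (inclusive), and I want those rows specifically. The caveat is that if any row for a given patient ('Key') lacks coverage for the provided dates (and the time between them), then I want to drop all rows for that patient.

So initially I merged the two dataframes and created two additional columns, StartMonth and EndMonth, derived from Date1 and Date2 respectively. I now need to check whether the patient had coverage during that period.

For example, the dataframe below has 6 of the 12 months (so it isn't too large). The first row would be dropped because the patient has no coverage between StartMonth and EndMonth. The second row would be kept because there is coverage throughout its StartMonth and EndMonth. Rows 3 and 4 would be dropped because, even though row 3 does have coverage for its dates, row 4 does not, so all rows for that patient ('Key') are dropped.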

import pandas as pd

df = pd.DataFrame({'KEY': ['1312', '1345', '5555', '5555'],
                   'Month1': [1, 1, 1, 1],
                   'Month2': [1, 1, 1, 1],
                   'Month3': [0, 1, 1, 1],
                   'Month4': [0, 1, 0, 0],
                   'Month5': [0, 1, 0, 0],
                   'Month6': [0, 1, 0, 0],
                   'Date1': [20120304, 20120102, 20120203, 20120402],
                   'Date2': [20120405, 20120104, 20120502, 20120501],
                   'StartMonth': [3, 1, 1, 4],
                   'EndMonth': [4, 1, 3, 5]})
# Reorder the columns for display
df[['KEY', 'Date1', 'Date2', 'StartMonth', 'EndMonth', 'Month1', 'Month2', 'Month3', 'Month4', 'Month5', 'Month6']]

Original dataframe:

    KEY     Date1       Date2       StartMonth  EndMonth    Month1  Month2  Month3  Month4  Month5  Month6
0   1312    20120304    20120405    3           4           1       1       0       0       0       0
1   1345    20120102    20120104    1           1           1       1       1       1       1       1
2   5555    20120203    20120502    1           3           1       1       1       0       0       0
3   5555    20120402    20120501    4           5           1       1       1       0       0       0

Final result:

    KEY     Date1       Date2       StartMonth  EndMonth    Month1  Month2  Month3  Month4  Month5  Month6
1   1345    20120102    20120104    1           1           1       1       1       1       1       1

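My initial approach was to find the columns I need by concatenating the word 'Month' with the values found in StartMonth and EndMonth. After doing that, I thought I could create bounds for the coverage time frame, but I ran into an error with this approach. I'm very early in the process, but I suspect this may not be the best approach. Any help would be great; this is a tricky one.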

df.groupby('KEY').filter(lambda x: x['Month'+ x.iloc[x]['StartMonth']]==1.0)

IndexError: positional indexers are out-of-bounds

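Another approach I'm looking into is building lists of column names derived from each row's StartMonth and EndMonth. I then thought I could pass those column names into .filter() to see whether any column in the range is 0.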

MonthRange = []
StartMonthStr = []
EndMonthStr = []
StartMonthInt = df['StartMonth'].tolist()
EndMonthInt = df['EndMonth'].tolist()

for x,y in zip(StartMonthInt, EndMonthInt):
    sm = 'Month' + str(x)
    em = 'Month' + str(y)
    diff = y - x
    MonthRange.append(diff)
    StartMonthStr.append(sm)
    EndMonthStr.append(em)

Answer 1

This may be what you're looking for.

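def condition(row):
    # Return the row's KEY when any month from aStartMonth through aEndMonth
    # (inclusive) lacks coverage; return None when the row is fully covered.
    return row['KEY'] if not all(row['Month' + str(i)]
           for i in range(row['aStartMonth'], row['aEndMonth'] + 1)) else None

# Drop every row whose KEY was flagged by at least one of its rows.
df = df[~df['KEY'].isin(df.apply(condition, axis=1))]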

#       Date1     Date2        KEY  Month1  Month2  Month3  Month4  Month5  \
# 0  20120304  20120405  100000003       1       1       1       1       1   

#    Month6  aEndMonth  aStartMonth  
# 0       1          4            3
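
The snippet above uses the column names `aStartMonth` and `aEndMonth`, which appear in its printed output but not in the question's example frame. As a rough sketch of the same idea against the question's data, assuming the columns are named `StartMonth` and `EndMonth` as shown there (the names `uncovered_key`, `bad_keys`, and `result` are made up for illustration), it might look like this:

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({'KEY': ['1312', '1345', '5555', '5555'],
                   'Month1': [1, 1, 1, 1],
                   'Month2': [1, 1, 1, 1],
                   'Month3': [0, 1, 1, 1],
                   'Month4': [0, 1, 0, 0],
                   'Month5': [0, 1, 0, 0],
                   'Month6': [0, 1, 0, 0],
                   'StartMonth': [3, 1, 1, 4],
                   'EndMonth': [4, 1, 3, 5]})

def uncovered_key(row):
    # Hypothetical helper: returns the KEY of a row that has a coverage gap,
    # or None when every month in StartMonth..EndMonth (inclusive) is 1.
    months = range(row['StartMonth'], row['EndMonth'] + 1)
    covered = all(row['Month' + str(m)] == 1 for m in months)
    return None if covered else row['KEY']

# Keys with at least one uncovered row
bad_keys = df.apply(uncovered_key, axis=1).dropna()

# Keep only patients with no flagged rows
result = df[~df['KEY'].isin(bad_keys)]
print(result[['KEY', 'StartMonth', 'EndMonth']])
```

Because the check is row by row and the removal is by key, a single uncovered row is enough to drop all rows for that patient, which is the behavior the question asks for.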

Answer 2

First, define a function that checks the logic:
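
A minimal sketch of what such a check could look like, assuming the question's column names (`StartMonth`, `EndMonth`, `Month1` through `Month6`). The function name `has_full_coverage` is hypothetical and is not taken from this answer:

```python
import pandas as pd

def has_full_coverage(group):
    """Return True only if every row in one patient's group is covered
    for each month from StartMonth through EndMonth (inclusive)."""
    for _, row in group.iterrows():
        months = range(int(row['StartMonth']), int(row['EndMonth']) + 1)
        if not all(row['Month' + str(m)] == 1 for m in months):
            return False
    return True

# The two rows for KEY 5555 from the question's example
patient = pd.DataFrame({'KEY': ['5555', '5555'],
                        'Month1': [1, 1], 'Month2': [1, 1], 'Month3': [1, 1],
                        'Month4': [0, 0], 'Month5': [0, 0], 'Month6': [0, 0],
                        'StartMonth': [1, 4],
                        'EndMonth': [3, 5]})

# The second row needs Month4 and Month5, which are 0, so the group fails
print(has_full_coverage(patient))
```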


Then apply this function to each group and filter the data:
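
One way this step might be written is with `groupby('KEY').filter(...)`, which keeps a group only when the function returns True for it. This is a sketch under the assumption that the frame has the question's columns; `has_full_coverage` is a hypothetical per-group check, not code from this answer:

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({'KEY': ['1312', '1345', '5555', '5555'],
                   'Month1': [1, 1, 1, 1],
                   'Month2': [1, 1, 1, 1],
                   'Month3': [0, 1, 1, 1],
                   'Month4': [0, 1, 0, 0],
                   'Month5': [0, 1, 0, 0],
                   'Month6': [0, 1, 0, 0],
                   'StartMonth': [3, 1, 1, 4],
                   'EndMonth': [4, 1, 3, 5]})

def has_full_coverage(group):
    # Hypothetical check: True only if every row of the group has a 1 in
    # each Month column from StartMonth through EndMonth (inclusive).
    return all(
        all(row['Month' + str(m)] == 1
            for m in range(int(row['StartMonth']), int(row['EndMonth']) + 1))
        for _, row in group.iterrows()
    )

# Keep only the patients whose rows all pass the check
result = df.groupby('KEY').filter(has_full_coverage)
print(result)
```

Filtering by group means the decision is made once per patient, so a patient with one failing row loses all of their rows without a separate lookup of flagged keys.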
