Creating a filter in a pandas groupby

Time: 2019-01-17 19:09:51

Tags: python pandas pandas-groupby

I get the following result:

[4 rows x 10 columns]
          id  ID_ENTIDADE                       ENTIDADE     CHAMADO     ...                 DATA_ALT VALOR_OLD           VALOR_NEW  PRIORIDADE
695  6802505          136 Professional Services > Ser...  2019000518     ...      2019-01-14 15:21:01       NaN             N1 (20)           0
698  6804412          136 Professional Services > Ser...  2019000518     ...      2019-01-14 15:52:46       NaN  Contrato 157 (198)           0
699  6804413          136 Professional Services > Ser...  2019000518     ...      2019-01-14 15:52:46       1.0                   2          14
700  6804415          136 Professional Services > Ser...  2019000518     ...      2019-01-14 15:52:46       3.0                   1           3
701  6804650          136 Professional Services > Ser...  2019000518     ...      2019-01-14 15:53:32       NaN  N1 - Security (25)           0

[5 rows x 10 columns]
          id  ID_ENTIDADE                       ENTIDADE     CHAMADO     ...                 DATA_ALT VALOR_OLD           VALOR_NEW  PRIORIDADE
705  6805135          136 Professional Services > Ser...  2019000519     ...      2019-01-14 16:02:01       NaN             N1 (20)           0
711  6806934          136 Professional Services > Ser...  2019000519     ...      2019-01-14 16:33:41       NaN  N1 - Security (25)           0
712  6806936          136 Professional Services > Ser...  2019000519     ...      2019-01-14 16:33:41       1.0                   2          14
713  6806938          136 Professional Services > Ser...  2019000519     ...      2019-01-14 16:33:41       3.0                   1           3
710  6806932          136 Professional Services > Ser...  2019000519     ...      2019-01-14 16:33:41       NaN  Contrato 157 (198)           0

[5 rows x 10 columns]
          id  ID_ENTIDADE                       ENTIDADE     CHAMADO     ...                 DATA_ALT VALOR_OLD           VALOR_NEW  PRIORIDADE
717  6808869          105 Professional Services > Sup...  2019000523     ...      2019-01-14 17:05:35       NaN  Contrato 135 (136)           0
718  6808870          105 Professional Services > Sup...  2019000523     ...      2019-01-14 17:05:35       NaN        N2 - DC (28)           0
757  6810787          105 Professional Services > Sup...  2019000523     ...      2019-01-14 17:41:31       3.0                   2           3

[3 rows x 10 columns]
          id  ID_ENTIDADE                       ENTIDADE     CHAMADO     ...                 DATA_ALT VALOR_OLD           VALOR_NEW  PRIORIDADE
719  6808990          136 Professional Services > Ser...  2019000524     ...      2019-01-14 17:10:02       NaN             N1 (20)           0
720  6809088          136 Professional Services > Ser...  2019000524     ...      2019-01-14 17:12:59       NaN  Contrato 157 (198)           0
721  6809090          136 Professional Services > Ser...  2019000524     ...      2019-01-14 17:12:59       NaN  N1 - Security (25)           0
722  6809092          136 Professional Services > Ser...  2019000524     ...      2019-01-14 17:12:59       1.0                   2          14
723  6809094          136 Professional Services > Ser...  2019000524     ...      2019-01-14 17:12:59       3.0                   1           3

[5 rows x 10 columns]

I have the following code:

import pandas as pd

df = pd.read_csv("csv3.csv", sep=";", encoding="ISO-8859-1")
df2 = df.sort_values(['CHAMADO', 'id'])

# Re-sort by modification timestamp before grouping
g1 = df2.sort_values(['DATA_ALT'], ascending=True)

ret_group = g1.groupby(['CHAMADO'])

for key, group in ret_group:
    # Print only the groups that have at least one non-zero PRIORIDADE
    if group['PRIORIDADE'].any():
        print(group)

But I need a filter that checks whether the first 3 rows of the "VALOR_NEW" column contain the word "CONTRATO".

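I have not been able to write a filter that does this. Every filter I try returns only the groups where "CONTRATO" appears in the first row, as in the example below: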

          id  ID_ENTIDADE                       ENTIDADE     CHAMADO     ...                 DATA_ALT VALOR_OLD           VALOR_NEW  PRIORIDADE
717  6808869          105 Professional Services > Sup...  2019000523     ...      2019-01-14 17:05:35       NaN  Contrato 135 (136)           0
718  6808870          105 Professional Services > Sup...  2019000523     ...      2019-01-14 17:05:35       NaN        N2 - DC (28)           0
757  6810787          105 Professional Services > Sup...  2019000523     ...      2019-01-14 17:41:31       3.0                   2           3

[3 rows x 10 columns]

3 Answers:

Answer 0 (score: 1)

You can use the following:

my_list = list(df.groupby('CHAMADO').apply(lambda x: x[:3][x[:3]['VALOR_NEW'].str.contains('Contrato',na=False)])['CHAMADO'].values)
#[2019000518, 2019000523, 2019000524]

This gives a list of the CHAMADO groups that contain the word Contrato in their first three rows.

>>df[df.CHAMADO.isin(my_list)]

This gives you every row of the dataframe whose CHAMADO value belongs to a group with Contrato in its first three rows.
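A variation on the same idea, as a minimal sketch: `groupby(...).filter(...)` keeps whole groups that pass a test, so the intermediate list is not needed. The toy frame and the helper name `contrato_in_first_three` are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Made-up sample reusing the question's column names (not the real CSV).
df = pd.DataFrame({
    "CHAMADO": [518, 518, 518, 518, 519, 519, 519, 519],
    "VALOR_NEW": ["N1 (20)", "Contrato 157 (198)", "2", "1",
                  "N1 (20)", "N1 - Security (25)", "2", "Contrato 157 (198)"],
})

def contrato_in_first_three(group: pd.DataFrame) -> bool:
    """True when any of the group's first three VALOR_NEW values contains 'Contrato'."""
    first_three = group["VALOR_NEW"].head(3)
    return bool(first_three.str.contains("Contrato", na=False).any())

# Keep every row of the groups that pass the check.
filtered = df.groupby("CHAMADO").filter(contrato_in_first_three)
print(filtered)
```

In this sample, group 519 is dropped because its only Contrato entry sits in the fourth row.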

To store the values in separate dataframes, you can create a dictionary:

# Map a name such as 'df_2019000518' to the rows of that CHAMADO group
dicdf = {'df_' + str(x): df[df['CHAMADO'] == x] for x in my_list}
print(dicdf)

You can store them this way, or with any other method you prefer or have seen on SO.

Answer 1 (score: 0)

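If the column is a string column, just filter the values with str.contains. Assume your df is group. Since you want all the results to be shown, you can use a less elegant solution that builds 3 dataframes: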

First, the rows that contain CONTRATO:

df1 = group.loc[group['VALOR_NEW'].fillna('nada').str.contains('CONTRATO')].head(3)

Then another with the rows where it does not appear (note the "~" at the start of the .loc condition, which negates it):

df2 = group.loc[~group['VALOR_NEW'].fillna('nada').str.contains('CONTRATO')]

Finally, concatenate the two, with the CONTRATO rows first:

df3 = pd.concat([df1,df2])

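Note that the match is case-sensitive, so the pattern 'CONTRATO' will not match the 'Contrato' values in the sample data unless you adjust the spelling. I added fillna() because you cannot search in NaN values.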
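To illustrate the case-sensitivity point, here is a small sketch using the `case` and `na` parameters of `Series.str.contains`. The series below is hypothetical, shaped like the VALOR_NEW column; passing `na=False` is an alternative to the `fillna('nada')` step.

```python
import pandas as pd

# Hypothetical values shaped like the VALOR_NEW column in the question.
valor_new = pd.Series(["Contrato 157 (198)", "N1 (20)", None, "2"])

# Case-sensitive by default: 'CONTRATO' does not match 'Contrato'.
exact_case = valor_new.str.contains("CONTRATO", na=False)

# case=False ignores letter case; na=False turns missing values into False.
any_case = valor_new.str.contains("CONTRATO", case=False, na=False)

print(exact_case.tolist())
print(any_case.tolist())
```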

Edit: added .head(3) at the end, per your request, to get only the first 3 rows.

Edit 2: edited the original solution.

vlwflws

Answer 2 (score: 0)

IIUC, you need to check each grouped dataframe for 'Contrato' in the first three rows of the 'VALOR_NEW' column. This check should do it:

if (group['VALOR_NEW'].head(3) == 'Contrato').any():
    print(group)
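One caveat worth checking against your data: `==` compares the whole cell, so it is True only for a value that is exactly 'Contrato'. The values in the question look like 'Contrato 157 (198)', so a substring test may be what is needed. A minimal sketch on a made-up group:

```python
import pandas as pd

# Made-up group reusing VALOR_NEW values shown in the question.
group = pd.DataFrame({
    "VALOR_NEW": ["N1 (20)", "Contrato 157 (198)", "2", "1"],
})

first_three = group["VALOR_NEW"].head(3)

# Equality only matches cells that are exactly 'Contrato'.
equals_check = (first_three == "Contrato").any()

# Substring search also matches 'Contrato 157 (198)'.
contains_check = first_three.str.contains("Contrato", na=False).any()

print(equals_check, contains_check)
```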