Grouping values in pandas based on a column and a condition

Time: 2019-11-22 07:05:39

Tags: python pandas

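I want to group a pandas DataFrame column by a condition: values that lie within a range of +20 of each other should fall into the same group. Here is the DataFrame: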

{'Name': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F'},
 'ID': {0: 100, 1: 23, 2: 19, 3: 42, 4: 11, 5: 78},
 'Left': {0: 70, 1: 70, 2: 70, 3: 70, 4: 66, 5: 66},
 'Top': {0: 10, 1: 26, 2: 26, 3: 35, 4: 60, 5: 71}}

Here I want to group on the 'Left' and 'Top' columns. This is what I tried:

df.groupby(['Top'],as_index=False).agg(lambda x: list(x))

This is the result I get:

    {'Top': {0: 10, 1: 26, 2: 35, 3: 60, 4: 71},
 'Name': {0: ['A'], 1: ['B', 'C'], 2: ['D'], 3: ['E'], 4: ['F']},
 'ID': {0: [100], 1: [23, 19], 2: [42], 3: [11], 4: [78]},
 'Left': {0: [70], 1: [70, 70], 2: [43], 3: [66], 4: [66]}}

Desired output:

{'Top': {0: [10, 26], 2: 35, 3: [60,71]},
 'Name': {0: ['A', 'B', 'C'], 2: ['D'], 3: ['E', 'F']},
 'ID': {0: [100, 23, 19], 2: [42], 3: [11, 78]},
 'Left': {0: [70, 50, 87], 2: [43], 3: [66, 99]}}

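Note:

The important point is that the Top values 10 and 26 are within a range of 20 of each other, so they form one group. Even though the difference between 35 and 26 is also within 20, 35 should not be added to that group: 10 and 26 are already grouped together, and the difference between 10 (the minimum of that group) and 35 is not within 20.

Is there a way to do this?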
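As a minimal sketch of this rule (not part of the original post), the plain-Python helper below, with the hypothetical name `anchored_groups`, compares each value against the smallest member of the current group rather than against the previous value. It assumes the values are already sorted in ascending order and that the range of 20 is inclusive.

```python
def anchored_groups(values, limit):
    """Split ascending values into groups anchored at each group's minimum.

    Assumptions (not stated in the post): `values` is sorted ascending
    and `limit` is inclusive.
    """
    groups = []
    for value in values:
        # Compare against the first (smallest) member of the current group,
        # not against the previous value.
        if groups and value - groups[-1][0] <= limit:
            groups[-1].append(value)
        else:
            groups.append([value])
    return groups


print(anchored_groups([10, 26, 26, 35, 60, 71], 20))
# [[10, 26, 26], [35], [60, 71]]
```

Here 35 starts a new group because 35 - 10 = 25 exceeds 20, even though 35 - 26 = 9 does not.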

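Edit:

I have a different use case in which the Top value keeps increasing; when the content moves to a new page, the Top value resets and starts increasing again. This repeats for different inputs. In the end I want to group by input file name, page number, and the Top-based group. How can I group them?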

{'Input File Name': {0: 268441,
  1: 268441,
  2: 268441,
  3: 268441,
  4: 268441,
  5: 268441,
  6: 268441,
  7: 268441,
  8: 268441,
  9: 268441,
  10: 268441,
  11: 268441,
  12: 268441,
  13: 268441,
  14: 268441,
  15: 268441,
  16: 268441,
  17: 268441,
  18: 268441,
  19: 268441,
  20: 268441,
  21: 268441,
  22: 268441,
  23: 268441,
  24: 268441,
  25: 268441,
  26: 268441,
  27: 268441,
  28: 268441,
  29: 268441,
  30: 268441,
  31: 268441,
  32: 268441,
  33: 268441,
  34: 268441,
  35: 268441,
  36: 268441,
  37: 268441,
  38: 268441,
  39: 268441},
 'Page Number': {0: 1,
  1: 1,
  2: 1,
  3: 1,
  4: 1,
  5: 1,
  6: 1,
  7: 1,
  8: 1,
  9: 1,
  10: 1,
  11: 1,
  12: 1,
  13: 1,
  14: 1,
  15: 1,
  16: 1,
  17: 1,
  18: 1,
  19: 1,
  20: 2,
  21: 2,
  22: 2,
  23: 2,
  24: 2,
  25: 2,
  26: 2,
  27: 2,
  28: 2,
  29: 2,
  30: 2,
  31: 2,
  32: 2,
  33: 2,
  34: 2,
  35: 2,
  36: 2,
  37: 2,
  38: 2,
  39: 2},
 'Content': {0: '3708 Forestview Road',
  1: 'AvailableForLease&Sale',
  2: '1,700± SFMedicalOffice',
  3: '3708ForestviewRoad',
  4: 'Suite107',
  5: 'Raleigh,NC27612',
  6: 'BuildingDescription',
  7: '22,278± SFClassAOfficeBuilding',
  8: 'OnlyOneSuiteLeft toLeaseand/orPurchase',
  9: '(1)1,700± SFShell',
  10: 'FlexibleLeaseTerms',
  11: '2Floorsw/Elevator&Stairsto2',
  12: 'Level',
  13: 'nd',
  14: 'ClassAFinishes',
  15: 'On-SitePropertyManagement',
  16: 'LargeGlass Windows',
  17: '5:1Parking',
  18: 'Formoreinformation,contact:',
  19: 'OtherTenants: PivotPhysicalTherapy,TheLundy',
  20: 'LeasingDetails',
  21: 'SpaceDescription',
  22: 'LeaseRate',
  23: 'CompetitiveNNN+$5.50TICAM',
  24: 'Tenant',
  25: 'Suite107:1,700± SF',
  26: 'Janitorial&Electric',
  27: 'Responsibilities',
  28: 'ShellSpacew/TIAllowance&Architecturals',
  29: 'ClassABuilding',
  30: 'SalePrice',
  31: '$374,000or$220PSF',
  32: 'BeautifulDouble-DoorEntry',
  33: '1,700',
  34: '± SF',
  35: 'Size',
  36: 'LargeGlassWindows',
  37: 'ColdDarkShellw/TIAllowance',
  38: '5:1Parking',
  39: 'Upfit'},
 'Top': {0: 6,
  1: 6,
  2: 49,
  3: 103,
  4: 103,
  5: 103,
  6: 590,
  7: 637,
  8: 656,
  9: 676,
  10: 695,
  11: 716,
  12: 716,
  13: 717,
  14: 736,
  15: 755,
  16: 775,
  17: 794,
  18: 813,
  19: 835,
  20: 111,
  21: 138,
  22: 142,
  23: 142,
  24: 169,
  25: 174,
  26: 179,
  27: 190,
  28: 195,
  29: 216,
  30: 217,
  31: 217,
  32: 238,
  33: 247,
  34: 247,
  35: 248,
  36: 259,
  37: 274,
  38: 282,
  39: 285}}

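1 Answer:

Answer 0 (score: 2)

You can write a function that first assigns a group label to each value in the Top column, and then use groupby on that label column: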

import pandas as pd
df = pd.DataFrame({'Name': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F'},
 'ID': {0: 100, 1: 23, 2: 19, 3: 42, 4: 11, 5: 78},
 'Left': {0: 70, 1: 70, 2: 70, 3: 70, 4: 66, 5: 66},
 'Top': {0: 10, 1: 26, 2: 26, 3: 35, 4: 60, 5: 71}})

def group(values, group_range):
    """Return a 1-based group label for each value.

    Assumes `values` is sorted in ascending order. A value joins the
    current group while it is within `group_range` (inclusive) of the
    first value of that group; otherwise it starts a new group.
    """
    groups = []
    group_count = 0
    current_group_start = None
    for a in values:
        # Start a new group for the first value, or when the value lies
        # beyond the current group's start + range.
        if current_group_start is None or a > current_group_start + group_range:
            group_count += 1
            current_group_start = a
        groups.append(group_count)
    return groups
# group(df['Top'], 20) -> [1, 1, 1, 2, 3, 3]

df['group'] = group(df['Top'],20)
df.groupby(['group'],as_index=False).agg(list)

Output:

    group   ID              Left            Name        Top
0   1       [100, 23, 19]   [70, 70, 70]    [A, B, C]   [10, 26, 26]
1   2       [42]            [70]            [D]         [35]
2   3       [11, 78]        [66, 66]        [E, F]      [60, 71]
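The edit in the question (Top restarting on every page) is not covered by the code above. As a minimal sketch, assuming the same labelling rule should simply restart for each (Input File Name, Page Number) pair, the labels can be computed per page with `groupby(...).transform` and then used as a third grouping key. The helper name `group_labels` and the six-row sample (the first three rows of each page from the question) are illustrative choices, not part of the original answer.

```python
import pandas as pd


def group_labels(values, group_range):
    """1-based labels; a new group starts when a value exceeds start + range."""
    labels = []
    group_count = 0
    group_start = None
    for value in values:
        if group_start is None or value > group_start + group_range:
            group_count += 1
            group_start = value
        labels.append(group_count)
    return labels


# Shortened sample: the first three rows of page 1 and page 2 from the question.
pages = pd.DataFrame({
    "Input File Name": [268441] * 6,
    "Page Number": [1, 1, 1, 2, 2, 2],
    "Content": ["3708 Forestview Road", "AvailableForLease&Sale",
                "1,700± SFMedicalOffice", "LeasingDetails",
                "SpaceDescription", "LeaseRate"],
    "Top": [6, 6, 49, 111, 138, 142],
})

keys = ["Input File Name", "Page Number"]

# The labelling rule assumes ascending Top values within each page.
pages = pages.sort_values(keys + ["Top"])

# Restart the labelling for every file/page combination.
pages["group"] = pages.groupby(keys)["Top"].transform(
    lambda s: pd.Series(group_labels(s, 20), index=s.index)
)

result = pages.groupby(keys + ["group"], as_index=False).agg(list)
print(result)
```

Because the labels restart at 1 on each page, `group` is only meaningful together with the file name and page number, which is why all three columns are used as keys in the final groupby.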