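I want to group a pandas DataFrame column by a condition: values that fall within a range of +20 of each other should end up in the same group. Here is the DataFrame: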
{'Name': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F'},
'ID': {0: 100, 1: 23, 2: 19, 3: 42, 4: 11, 5: 78},
'Left': {0: 70, 1: 70, 2: 70, 3: 70, 4: 66, 5: 66},
'Top': {0: 10, 1: 26, 2: 26, 3: 35, 4: 60, 5: 71}}
Here, I want to group on the 'Left' and 'Top' columns. This is what I did:
df.groupby(['Top'],as_index=False).agg(lambda x: list(x))
This is the result I get:
{'Top': {0: 10, 1: 26, 2: 35, 3: 60, 4: 71},
'Name': {0: ['A'], 1: ['B', 'C'], 2: ['D'], 3: ['E'], 4: ['F']},
'ID': {0: [100], 1: [23, 19], 2: [42], 3: [11], 4: [78]},
'Left': {0: [70], 1: [70, 70], 2: [70], 3: [66], 4: [66]}}
Desired output:
{'Top': {0: [10, 26], 2: 35, 3: [60,71]},
'Name': {0: ['A', 'B', 'C'], 2: ['D'], 3: ['E', 'F']},
'ID': {0: [100, 23, 19], 2: [42], 3: [11, 78]},
'Left': {0: [70, 70, 70], 2: [70], 3: [66, 66]}}
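Note:
The important point is that the Top values 10 and 26 are within a range of 20 of each other, so they form one group. Even though the difference between 35 and 26 is also within 20, 35 should not be added to that group: 10 and 26 are already in the group, and the difference between 10 (the smallest value in the group) and 35 is greater than 20.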
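To make the rule in the note concrete, here is a minimal sketch contrasting two possible readings of "within 20". The names `chained` and `anchored` are introduced only for illustration. The chained reading compares each value with its predecessor, while the anchored reading compares each value with the first (smallest) value of the current group, which is what the note asks for. The sketch assumes the Top values are sorted in ascending order.

```python
import pandas as pd

top = pd.Series([10, 26, 26, 35, 60, 71])

# Chained reading: open a new group whenever the gap to the
# previous value is greater than 20.
chained = top.diff().gt(20).cumsum() + 1

# Anchored reading: open a new group whenever the value is more
# than 20 above the first value of the current group.
anchored = []
label, start = 0, None
for value in top:
    if start is None or value - start > 20:
        label += 1
        start = value
    anchored.append(label)

print(chained.tolist())  # [1, 1, 1, 1, 2, 2] -> 35 joins the first group
print(anchored)          # [1, 1, 1, 2, 3, 3] -> 35 starts its own group
```

Only the anchored labels match the desired output, which is why a plain `diff`-based one-liner is not enough here.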
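Is there any other way to solve this?
Edit:
I have a different use case in which the Top value keeps increasing; when the content moves to a new page, the Top value resets and starts increasing again. This repeats for different inputs. In the end, I want to group by input file name, page number, and group. How can I group them?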
{'Input File Name': {0: 268441,
1: 268441,
2: 268441,
3: 268441,
4: 268441,
5: 268441,
6: 268441,
7: 268441,
8: 268441,
9: 268441,
10: 268441,
11: 268441,
12: 268441,
13: 268441,
14: 268441,
15: 268441,
16: 268441,
17: 268441,
18: 268441,
19: 268441,
20: 268441,
21: 268441,
22: 268441,
23: 268441,
24: 268441,
25: 268441,
26: 268441,
27: 268441,
28: 268441,
29: 268441,
30: 268441,
31: 268441,
32: 268441,
33: 268441,
34: 268441,
35: 268441,
36: 268441,
37: 268441,
38: 268441,
39: 268441},
'Page Number': {0: 1,
1: 1,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 1,
9: 1,
10: 1,
11: 1,
12: 1,
13: 1,
14: 1,
15: 1,
16: 1,
17: 1,
18: 1,
19: 1,
20: 2,
21: 2,
22: 2,
23: 2,
24: 2,
25: 2,
26: 2,
27: 2,
28: 2,
29: 2,
30: 2,
31: 2,
32: 2,
33: 2,
34: 2,
35: 2,
36: 2,
37: 2,
38: 2,
39: 2},
'Content': {0: '3708 Forestview Road',
1: 'AvailableForLease&Sale',
2: '1,700± SFMedicalOffice',
3: '3708ForestviewRoad',
4: 'Suite107',
5: 'Raleigh,NC27612',
6: 'BuildingDescription',
7: '22,278± SFClassAOfficeBuilding',
8: 'OnlyOneSuiteLeft toLeaseand/orPurchase',
9: '(1)1,700± SFShell',
10: 'FlexibleLeaseTerms',
11: '2Floorsw/Elevator&Stairsto2',
12: 'Level',
13: 'nd',
14: 'ClassAFinishes',
15: 'On-SitePropertyManagement',
16: 'LargeGlass Windows',
17: '5:1Parking',
18: 'Formoreinformation,contact:',
19: 'OtherTenants: PivotPhysicalTherapy,TheLundy',
20: 'LeasingDetails',
21: 'SpaceDescription',
22: 'LeaseRate',
23: 'CompetitiveNNN+$5.50TICAM',
24: 'Tenant',
25: 'Suite107:1,700± SF',
26: 'Janitorial&Electric',
27: 'Responsibilities',
28: 'ShellSpacew/TIAllowance&Architecturals',
29: 'ClassABuilding',
30: 'SalePrice',
31: '$374,000or$220PSF',
32: 'BeautifulDouble-DoorEntry',
33: '1,700',
34: '± SF',
35: 'Size',
36: 'LargeGlassWindows',
37: 'ColdDarkShellw/TIAllowance',
38: '5:1Parking',
39: 'Upfit'},
'Top': {0: 6,
1: 6,
2: 49,
3: 103,
4: 103,
5: 103,
6: 590,
7: 637,
8: 656,
9: 676,
10: 695,
11: 716,
12: 716,
13: 717,
14: 736,
15: 755,
16: 775,
17: 794,
18: 813,
19: 835,
20: 111,
21: 138,
22: 142,
23: 142,
24: 169,
25: 174,
26: 179,
27: 190,
28: 195,
29: 216,
30: 217,
31: 217,
32: 238,
33: 247,
34: 247,
35: 248,
36: 259,
37: 274,
38: 282,
39: 285}}
Answer 0 (score: 2)
You can write a function that first assigns a group number to each value in the Top column, and then use groupby on that new column:
import pandas as pd

df = pd.DataFrame({'Name': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F'},
                   'ID': {0: 100, 1: 23, 2: 19, 3: 42, 4: 11, 5: 78},
                   'Left': {0: 70, 1: 70, 2: 70, 3: 70, 4: 66, 5: 66},
                   'Top': {0: 10, 1: 26, 2: 26, 3: 35, 4: 60, 5: 71}})

def group(l, group_range):
    # Assign a group number to each value of l (assumed sorted ascending).
    # A value stays in the current group while it is at most group_range
    # above the first (smallest) value of that group.
    groups = []
    group_count = 0
    current_group_start = None
    for a in l:
        if current_group_start is None or a > current_group_start + group_range:
            # a is out of range of the current group: open a new one
            group_count += 1
            current_group_start = a
        groups.append(group_count)
    return groups

# group(df['Top'], 20) -> [1, 1, 1, 2, 3, 3]
df['group'] = group(df['Top'], 20)
df.groupby(['group'], as_index=False).agg(list)
Output:
group ID Left Name Top
0 1 [100, 23, 19] [70, 70, 70] [A, B, C] [10, 26, 26]
1 2 [42] [70] [D] [35]
2 3 [11, 78] [66, 66] [E, F] [60, 71]
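For the use case in the question's edit, where Top resets on every new page, one possible approach is to compute the group numbers separately inside each (Input File Name, Page Number) pair and then group by all three keys. The following is only a sketch under assumptions: `anchored_groups` is a hypothetical helper that applies the same rule as above, the sample is a six-row excerpt of the question's data (rows 0, 1, 3, 20, 21, 22), and Top is assumed to be ascending within each page.

```python
import pandas as pd

def anchored_groups(values, group_range):
    """Number the values so that a new group starts when a value is more
    than group_range above the first value of the current group."""
    labels = []
    label, start = 0, None
    for value in values:
        if start is None or value > start + group_range:
            label += 1
            start = value
        labels.append(label)
    return labels

# Excerpt of the question's data: three rows from each page.
pages = pd.DataFrame({
    'Input File Name': [268441] * 6,
    'Page Number': [1, 1, 1, 2, 2, 2],
    'Content': ['3708 Forestview Road', 'AvailableForLease&Sale',
                '3708ForestviewRoad', 'LeasingDetails',
                'SpaceDescription', 'LeaseRate'],
    'Top': [6, 6, 103, 111, 138, 142],
})

keys = ['Input File Name', 'Page Number']

# Restart the numbering inside every file/page pair.
pages['group'] = pages.groupby(keys)['Top'].transform(
    lambda s: pd.Series(anchored_groups(s, 20), index=s.index)
)

# Collect the remaining columns into lists per file, page and group.
result = pages.groupby(keys + ['group'], as_index=False).agg(list)
print(result)
```

Because the numbering restarts on each page, group 1 on page 1 and group 1 on page 2 stay separate only when the page number is part of the final groupby keys.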