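I am currently working on a d3 treemap, which needs nested JSON as input. I managed to organize my df and generate the JSON, but some of my treemap rectangles are 30 times bigger than others, so I decided to drop the rows that generate the small rectangles.
My function dropSmall() iterates over my columns and rows to check, for each group-by, whether its sum is less than 1/30 of the largest sum.
I am struggling to update the df, either by dropping or by assigning the matching values.
Here is my code: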
def dropSmall(df):
    list = []
    for i in df.columns:  # b, c, z ..
        if i != 'valeur' and i != 'unite':
            list.append(i)
    # iterating on rows
    for j in range(df.groupby(list).sum().shape[0]):
        myMax = df.groupby(list).sum().iloc[:, 0].max() / 30
        myJ = df.groupby(list).sum().iloc[:, 0][j]
        myDf = df.groupby(list).sum().iloc[:, 0]
        if myJ <= myMax:
            df = df[myDf['value'] >= myMax]
And my groupby looks like this:
name b c z l sL value unit
3099 Myindicator 1 1 3 NA NA 129.74 kg
3100 1 44929.74 kg
3101 2 5174.74 kg
3110 3 1 3 1 NA 2497.66 kg
3156 2 NA 29.43 kg
3222 3 NA 304.81 kg
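For example, for the first rows, where b = 1, c = 1, z = 3 and l = NA, I want to iterate over the 3 sL values and check that each value is greater than the maximum of these values divided by 30; in this case that means dropping the row with value = 129.
My function checks the condition, but I don't know how to drop the rows from my initial df rather than from df.groupby('list').sum().
Example of the first row of the ungrouped df: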
name Continent Region Country State City Borough Value Unit
1000 Myindicator 1 1 3 1 1 1 53.86 kg
[Edit from here]
My cutoff multiplier is 2. Each level of the hierarchy has its own maximum:
Value
name Continent Region Country State
Myindicator 1 1 1 7 50[MAX]
8 30
2 5 70[MAX]
6 30 *
3 1 50[MAX]
4 5 200[MAX]
6 150
5 1 300[MAX]
6 160
7 100*
8 50*
9 50*
2 4 9 100[MAX]
10 40 *
5 3 80[MAX]
11 20 *
6 2 10[MAX]
3 7 12 100[MAX]
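In this example, you would not drop Region 2, Country 6, State 2, because it is the only row for that region > country > state and is therefore also the maximum.
I hope this is clearer.
Answer 0 (score: 0)
It is not 100% clear to me what your input is or what you want returned, but if I understand correctly, I think the following approach will work.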
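Edited here.
EDIT2: Added asterisks (*) to indicate the rows that get dropped.
EDIT3: Changed because of how assignment and copying work with pandas.DataFrame.
Function that performs this process: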
def drop_small(dfcop, cutoff_multiplier):
    # Work on a copy so the original DataFrame is not altered
    df = dfcop.copy(deep=True)
    # Group on all columns except 'Value' and 'Unit'
    grp_cols = [i for i in df.columns if i not in ['Value', 'Unit']]
    # Build the hierarchical groupings: [name], [name, Continent], ...
    groupers = [grp_cols[:i+1] for i in range(len(grp_cols))]
    print(groupers)
    # Loop through all hierarchical groupings
    for grp in groupers:
        print(f"Grouping on {grp}")
        # Add a column with the group sums to the dataframe
        df['gsum'] = df.groupby(grp)['Value'].transform('sum')
        # Cutoff = largest sum among groups sharing the same parent, divided by the multiplier.
        # With a single grouping field there is no parent, so use the overall max.
        if len(grp) > 1:
            df['gmax'] = df.groupby(grp[:-1])['gsum'].transform(lambda x: max(x)/cutoff_multiplier)
        else:
            df['gmax'] = df.gsum.max()/cutoff_multiplier
        print("Grouped sums and cutoffs for this hierarchy:")
        print(df)
        # Drop all rows where the group sum is below the cutoff
        idexs = df[df.gsum < df.gmax].index
        df = df[df.gsum >= df.gmax].copy()
        print('Indexes dropped:')
        print(','.join([str(i) for i in idexs]))
    # Remove the helper columns
    df = df.drop(['gsum', 'gmax'], axis=1)
    return df
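The part that ties the grouped sums back to the original rows is `groupby(...).transform('sum')`, which returns one value per original row instead of one row per group. A minimal sketch of that idea on a made-up three-row frame (the data and the names `toy`, `per_group`, `per_row` and `kept` are illustrative assumptions, not taken from the question):

```python
import pandas as pd

# Hypothetical toy data, not taken from the question
toy = pd.DataFrame({"Region": [1, 1, 2], "Value": [10, 30, 5]})

# One row per group: indexed by Region, so it cannot be used directly as a row mask on toy
per_group = toy.groupby("Region")["Value"].sum()

# One value per original row: each group sum is broadcast back onto toy's index
per_row = toy.groupby("Region")["Value"].transform("sum")

# Keep rows whose group sum is at least half of the largest group sum
kept = toy[per_row >= per_row.max() / 2]
print(kept)
```

Because `per_row` shares the index of `toy`, it can be compared and used as a boolean mask on the ungrouped frame, which is what the aggregated `sum()` result does not allow.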
Here is how it works on a sample table.
name Continent Region Country State Value Unit
0 Myindicator 1 1 3 1 50 kg
1 Myindicator 1 1 3 4 50 kg
2 Myindicator 1 1 2 5 20 kg
3 Myindicator 1 1 2 5 50 kg
4 Myindicator 1 1 2 6 30 kg
5 Myindicator 1 1 1 7 50 kg
6 Myindicator 1 1 1 8 20 kg
7 Myindicator 1 2 4 9 50 kg
8 Myindicator 1 2 4 9 50 kg
9 Myindicator 1 2 4 10 40 kg
10 Myindicator 1 2 5 11 20 kg
11 Myindicator 1 2 5 3 40 kg
12 Myindicator 1 2 5 3 40 kg
13 Myindicator 1 2 6 2 10 kg
14 Myindicator 1 3 7 12 50 kg
15 Myindicator 1 3 7 12 50 kg
16 Myindicator 1 3 8 14 15 kg
17 Myindicator 1 3 8 14 15 kg
18 Myindicator 1 3 8 13 15 kg
19 Myindicator 1 3 8 13 1 kg
20 Myindicator 1 4 9 15 10 kg
21 Myindicator 1 4 9 16 10 kg
Grouping on ['name']
Grouped sums and cutoffs for this hierarchy:
name Continent Region Country State Value Unit gsum gmax
0 Myindicator 1 1 3 1 50 kg 686 343
1 Myindicator 1 1 3 4 50 kg 686 343
2 Myindicator 1 1 2 5 20 kg 686 343
3 Myindicator 1 1 2 5 50 kg 686 343
4 Myindicator 1 1 2 6 30 kg 686 343
5 Myindicator 1 1 1 7 50 kg 686 343
6 Myindicator 1 1 1 8 20 kg 686 343
7 Myindicator 1 2 4 9 50 kg 686 343
8 Myindicator 1 2 4 9 50 kg 686 343
9 Myindicator 1 2 4 10 40 kg 686 343
10 Myindicator 1 2 5 11 20 kg 686 343
11 Myindicator 1 2 5 3 40 kg 686 343
12 Myindicator 1 2 5 3 40 kg 686 343
13 Myindicator 1 2 6 2 10 kg 686 343
14 Myindicator 1 3 7 12 50 kg 686 343
15 Myindicator 1 3 7 12 50 kg 686 343
16 Myindicator 1 3 8 14 15 kg 686 343
17 Myindicator 1 3 8 14 15 kg 686 343
18 Myindicator 1 3 8 13 15 kg 686 343
19 Myindicator 1 3 8 13 1 kg 686 343
20 Myindicator 1 4 9 15 10 kg 686 343
21 Myindicator 1 4 9 16 10 kg 686 343
Indexes dropped: None
Grouping on ['name', 'Continent']
Grouped sums and cutoffs for this hierarchy:
name Continent Region Country State Value Unit gsum gmax
0 Myindicator 1 1 3 1 50 kg 686 343
1 Myindicator 1 1 3 4 50 kg 686 343
2 Myindicator 1 1 2 5 20 kg 686 343
3 Myindicator 1 1 2 5 50 kg 686 343
4 Myindicator 1 1 2 6 30 kg 686 343
5 Myindicator 1 1 1 7 50 kg 686 343
6 Myindicator 1 1 1 8 20 kg 686 343
7 Myindicator 1 2 4 9 50 kg 686 343
8 Myindicator 1 2 4 9 50 kg 686 343
9 Myindicator 1 2 4 10 40 kg 686 343
10 Myindicator 1 2 5 11 20 kg 686 343
11 Myindicator 1 2 5 3 40 kg 686 343
12 Myindicator 1 2 5 3 40 kg 686 343
13 Myindicator 1 2 6 2 10 kg 686 343
14 Myindicator 1 3 7 12 50 kg 686 343
15 Myindicator 1 3 7 12 50 kg 686 343
16 Myindicator 1 3 8 14 15 kg 686 343
17 Myindicator 1 3 8 14 15 kg 686 343
18 Myindicator 1 3 8 13 15 kg 686 343
19 Myindicator 1 3 8 13 1 kg 686 343
20 Myindicator 1 4 9 15 10 kg 686 343
21 Myindicator 1 4 9 16 10 kg 686 343
Indexes dropped: None
Grouping on ['name', 'Continent', 'Region']
Grouped sums and cutoffs for this hierarchy:
name Continent Region Country State Value Unit gsum gmax
0 Myindicator 1 1 3 1 50 kg 270 135
1 Myindicator 1 1 3 4 50 kg 270 135
2 Myindicator 1 1 2 5 20 kg 270 135
3 Myindicator 1 1 2 5 50 kg 270 135
4 Myindicator 1 1 2 6 30 kg 270 135
5 Myindicator 1 1 1 7 50 kg 270 135
6 Myindicator 1 1 1 8 20 kg 270 135
7 Myindicator 1 2 4 9 50 kg 250 135
8 Myindicator 1 2 4 9 50 kg 250 135
9 Myindicator 1 2 4 10 40 kg 250 135
10 Myindicator 1 2 5 11 20 kg 250 135
11 Myindicator 1 2 5 3 40 kg 250 135
12 Myindicator 1 2 5 3 40 kg 250 135
13 Myindicator 1 2 6 2 10 kg 250 135
14 Myindicator 1 3 7 12 50 kg 146 135
15 Myindicator 1 3 7 12 50 kg 146 135
16 Myindicator 1 3 8 14 15 kg 146 135
17 Myindicator 1 3 8 14 15 kg 146 135
18 Myindicator 1 3 8 13 15 kg 146 135
19 Myindicator 1 3 8 13 1 kg 146 135
20 Myindicator 1 4 9 15 10 kg 20 135 *
21 Myindicator 1 4 9 16 10 kg 20 135 *
Indexes dropped: 20,21
Grouping on ['name', 'Continent', 'Region', 'Country']
Grouped sums and cutoffs for this hierarchy:
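The Region-level cutoff in the table above can be checked by hand. A small sketch that uses only the four Region sums from the gsum column (the variable names are introduced here for illustration):

```python
import pandas as pd

# Region sums copied from the gsum column above (index = Region)
region_sums = pd.Series({1: 270, 2: 250, 3: 146, 4: 20})
cutoff_multiplier = 2

# All regions share one parent (Continent 1), so there is a single cutoff
cutoff = region_sums.max() / cutoff_multiplier

# Regions whose sum falls below the cutoff
dropped = region_sums[region_sums < cutoff].index.tolist()
print(cutoff, dropped)
```

Region 4 is the only region below the cutoff of 135, which corresponds to rows 20 and 21.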
name Continent Region Country State Value Unit gsum gmax
0 Myindicator 1 1 3 1 50 kg 100 50
1 Myindicator 1 1 3 4 50 kg 100 50
2 Myindicator 1 1 2 5 20 kg 100 50
3 Myindicator 1 1 2 5 50 kg 100 50
4 Myindicator 1 1 2 6 30 kg 100 50
5 Myindicator 1 1 1 7 50 kg 70 50
6 Myindicator 1 1 1 8 20 kg 70 50
7 Myindicator 1 2 4 9 50 kg 140 70
8 Myindicator 1 2 4 9 50 kg 140 70
9 Myindicator 1 2 4 10 40 kg 140 70
10 Myindicator 1 2 5 11 20 kg 100 70
11 Myindicator 1 2 5 3 40 kg 100 70
12 Myindicator 1 2 5 3 40 kg 100 70
13 Myindicator 1 2 6 2 10 kg 10 70 *
14 Myindicator 1 3 7 12 50 kg 100 50
15 Myindicator 1 3 7 12 50 kg 100 50
16 Myindicator 1 3 8 14 15 kg 46 50 *
17 Myindicator 1 3 8 14 15 kg 46 50 *
18 Myindicator 1 3 8 13 15 kg 46 50 *
19 Myindicator 1 3 8 13 1 kg 46 50 *
Indexes dropped: 13,16,17,18,19
Grouping on ['name', 'Continent', 'Region', 'Country', 'State']
Grouped sums and cutoffs for this hierarchy:
name Continent Region Country State Value Unit gsum gmax
0 Myindicator 1 1 3 1 50 kg 50 25
1 Myindicator 1 1 3 4 50 kg 50 25
2 Myindicator 1 1 2 5 20 kg 70 35
3 Myindicator 1 1 2 5 50 kg 70 35
4 Myindicator 1 1 2 6 30 kg 30 35 *
5 Myindicator 1 1 1 7 50 kg 50 25
6 Myindicator 1 1 1 8 20 kg 20 25 *
7 Myindicator 1 2 4 9 50 kg 100 50
8 Myindicator 1 2 4 9 50 kg 100 50
9 Myindicator 1 2 4 10 40 kg 40 50 *
10 Myindicator 1 2 5 11 20 kg 20 40 *
11 Myindicator 1 2 5 3 40 kg 80 40
12 Myindicator 1 2 5 3 40 kg 80 40
14 Myindicator 1 3 7 12 50 kg 100 50
15 Myindicator 1 3 7 12 50 kg 100 50
Indexes dropped: 4,6,9,10
Final table:
name Continent Region Country State Value Unit
0 Myindicator 1 1 3 1 50 kg
1 Myindicator 1 1 3 4 50 kg
2 Myindicator 1 1 2 5 20 kg
3 Myindicator 1 1 2 5 50 kg
5 Myindicator 1 1 1 7 50 kg
7 Myindicator 1 2 4 9 50 kg
8 Myindicator 1 2 4 9 50 kg
11 Myindicator 1 2 5 3 40 kg
12 Myindicator 1 2 5 3 40 kg
14 Myindicator 1 3 7 12 50 kg
15 Myindicator 1 3 7 12 50 kg
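To reproduce the final table without the intermediate printing, the sketch below rebuilds the sample table and applies the same filtering logic. `drop_small_quiet` is a name introduced here for illustration; it is a condensed variant of the function above that keeps the sums in local Series rather than helper columns and uses `transform('max')` in place of the lambda, under the assumption that the grouping columns are already in hierarchical order.

```python
import pandas as pd

# (Region, Country, State, Value) for each row of the sample table, in index order
rows = [
    (1, 3, 1, 50), (1, 3, 4, 50), (1, 2, 5, 20), (1, 2, 5, 50), (1, 2, 6, 30),
    (1, 1, 7, 50), (1, 1, 8, 20), (2, 4, 9, 50), (2, 4, 9, 50), (2, 4, 10, 40),
    (2, 5, 11, 20), (2, 5, 3, 40), (2, 5, 3, 40), (2, 6, 2, 10), (3, 7, 12, 50),
    (3, 7, 12, 50), (3, 8, 14, 15), (3, 8, 14, 15), (3, 8, 13, 15), (3, 8, 13, 1),
    (4, 9, 15, 10), (4, 9, 16, 10),
]
sample = pd.DataFrame(rows, columns=["Region", "Country", "State", "Value"])
sample.insert(0, "Continent", 1)
sample.insert(0, "name", "Myindicator")
sample["Unit"] = "kg"


def drop_small_quiet(df, cutoff_multiplier):
    """Hypothetical condensed variant of drop_small, without the printing."""
    df = df.copy()
    grp_cols = [c for c in df.columns if c not in ("Value", "Unit")]
    for depth in range(1, len(grp_cols) + 1):
        grp = grp_cols[:depth]
        # Sum of Value within the current group, aligned with the original rows
        gsum = df.groupby(grp)["Value"].transform("sum")
        if depth > 1:
            # Largest sibling sum under the same parent group
            gmax = gsum.groupby([df[c] for c in grp[:-1]]).transform("max")
        else:
            # Top level has no parent, so compare against the overall max
            gmax = gsum.max()
        # Keep only rows whose group sum reaches the cutoff
        df = df[gsum >= gmax / cutoff_multiplier]
    return df


result = drop_small_quiet(sample, 2)
print(result.index.tolist())
```

With a cutoff multiplier of 2 this should print [0, 1, 2, 3, 5, 7, 8, 11, 12, 14, 15], the same indexes as the final table above.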