How to drop specific DataFrame rows to generate a nested JSON

Time: 2019-05-21 15:32:32

Tags: python python-3.x pandas dataframe

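I'm currently working on a d3 treemap, which needs a nested JSON as input. I managed to organize my df and generate the JSON, but some of my treemap rectangles are 30 times bigger than others, so I decided to drop the rows behind the undersized rectangles.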

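My function dropSmall() iterates over my columns and rows to check, for each groupby, whether a group's sum is more than 30 times smaller than the largest sum (that is, below max / 30). I'm struggling to update the df, either by dropping or by assigning the matching values. Here is my code: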

def dropSmall(df):
    list = []
    for i in df.columns: #b, c, z ..
        if i != 'valeur' and i!='unite':
            list.append(i)
            # iterating on rows
            for j in range(df.groupby(list).sum().shape[0]): 
                myMax = df.groupby(list).sum().iloc[:, 0].max() / 30
                myJ = df.groupby(list).sum().iloc[:, 0][j]
                myDf = df.groupby(list).sum().iloc[:, 0]
                if myJ <= myMax:
                    df = df[myDf['value']>=  myMax]

And my groupby looks like this:


          name          b   c   z   l   sL  value       unit
3099    Myindicator     1   1   3   NA  NA  129.74      kg
3100                                    1   44929.74    kg
3101                                    2   5174.74     kg
3110                    3   1   3   1   NA  2497.66     kg
3156                                2   NA  29.43       kg
3222                                3   NA  304.81      kg


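As an example, take the first rows, where b = 1, c = 1, z = 3 and l = NA. While iterating over the 3 sL entries, I want to check whether each value is more than 30 times smaller than the maximum of these sums and, if so, drop that row. Here that is the row with value = 129.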

My function checks the condition, but I don't know how to delete the rows from my initial df rather than from df.groupby('list').sum().
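One way to filter the original rows rather than the collapsed groupby result is `groupby(...).transform('sum')`, which returns a Series aligned with the original index. The sketch below is only an illustration: the miniature frame and its values are made up, and the cutoff (largest group sum divided by 30) is an assumption based on the description in the question.

```python
import pandas as pd

# Hypothetical miniature frame; column names follow the ungrouped example.
df = pd.DataFrame({
    "name": ["Myindicator"] * 4,
    "Continent": [1, 1, 1, 1],
    "Region": [1, 1, 2, 2],
    "Value": [600.0, 400.0, 20.0, 10.0],
    "Unit": ["kg"] * 4,
})

group_cols = ["name", "Continent", "Region"]

# transform('sum') returns one value per ORIGINAL row, so the result
# shares df's index instead of the collapsed groupby index.
group_sum = df.groupby(group_cols)["Value"].transform("sum")

# Assumed cutoff: the largest group sum divided by 30.
cutoff = group_sum.max() / 30

# Boolean mask aligned with df; keeps rows whose group is large enough.
filtered = df[group_sum >= cutoff]
print(filtered)
```

Because the mask has the same index as `df`, it can be used directly for boolean indexing, with no need to map grouped results back onto rows by hand.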

Example of the first row of the ungrouped df:

        name        Continent  Region   Country   State   City    Borough  Value       Unit
1000    Myindicator     1        1        3        1      1         1      53.86      kg

[EDIT from here]

My cutoff multiplier is 2. Every hierarchy level has its own max value:

                                            Value
name        Continent Region Country State       
Myindicator 1         1      1       7         50[MAX]
                                     8         30 
                             2       5         70[MAX]
                                     6         30 *
                             3       1         50[MAX]
                             4       5        200[MAX]
                                     6        150 
                             5       1        300[MAX]
                                     6        160
                                     7        100*
                                     8         50*
                                     9         50*
                      2      4       9        100[MAX]
                                     10        40 *
                             5       3         80[MAX]
                                     11        20 *
                             6       2         10[MAX]
                      3      7       12       100[MAX]


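In this example, you would not drop Region 2 > Country 6 > State 2, because it is the only row for that region > country > state and is therefore also the max.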

Hope this is clearer.
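The rule described in the edit can be sketched on a few rows taken from the table above. This is a minimal illustration, assuming each State appears on a single row (so its Value is already the State total) and that a row is kept when its value is at least the parent's max divided by the cutoff multiplier.

```python
import pandas as pd

# A few Region > Country > State rows copied from the table in the question.
df = pd.DataFrame({
    "Region":  [1, 1, 1, 1, 2],
    "Country": [1, 1, 2, 2, 6],
    "State":   [7, 8, 5, 6, 2],
    "Value":   [50, 30, 70, 30, 10],
})

cutoff_multiplier = 2
parent = ["Region", "Country"]

# Largest State value inside each Region > Country parent, repeated on every row.
parent_max = df.groupby(parent)["Value"].transform("max")

# Keep a row when it is at least max / cutoff_multiplier within its parent.
keep = df["Value"] >= parent_max / cutoff_multiplier
kept = df[keep]
print(kept)
```

Only State 6 of Country 2 is removed (30 < 70 / 2). State 2 of Country 6 survives because a lone row is always its own max.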

1 Answer:

Answer 0 (score: 0)

I'm not 100% clear on what your input is or what you want returned, but if I understand correctly, I think the following approach will work.

Edited here.

EDIT2: Added asterisks (*) to mark the rows to be dropped.

EDIT3: Changed the function because of how assignment and copying work with pandas.DataFrame.

The function that performs this process:

def drop_small(dfcop, cutoff_multiplier):
    # Create copy of dataframe so we don't alter the original
    df=dfcop.copy(deep=True)
    # Group on all columns except 'Value' and 'Unit'
    grp_cols = [i for i in df.columns if i not in ['Value', 'Unit']]
    groupers = [grp_cols[:i+1] for i in range(len(grp_cols))]
    print(groupers)
    #loop through all hierarchical groupings
    for grp in groupers:
        print(f"Grouping on {grp}")
        # Add a column with the group sums to the dataframe
        df['gsum'] = df.groupby(grp)['Value'].transform('sum')
        # Compute the max of the parent group - don't do this if we are grouping by a single field
        if len(grp) > 1:
            df['gmax'] = df.groupby(grp[:-1])['gsum'].transform(lambda x: max(x)/cutoff_multiplier)
        else:
            df['gmax'] = df.gsum.max()/cutoff_multiplier
        print("Grouped sums and cutoffs for this hierarchy:")
        print(df)
        # Drop all rows where the group sum is below the cutoff (parent max / cutoff_multiplier)
        idexs = df[df.gsum < df.gmax].index
        df = df[df.gsum >= df.gmax]
        print('Indexes dropped:')
        print(','.join([str(i) for i in idexs]))
        # Remove the helper columns; reassigning (instead of inplace=True on a
        # filtered slice) avoids pandas' SettingWithCopyWarning
        df = df.drop(['gsum', 'gmax'], axis=1)
    return df

Here is how it works on a sample table.

           name  Continent  Region  Country  State  Value Unit
0   Myindicator          1       1        3      1     50   kg
1   Myindicator          1       1        3      4     50   kg
2   Myindicator          1       1        2      5     20   kg
3   Myindicator          1       1        2      5     50   kg
4   Myindicator          1       1        2      6     30   kg
5   Myindicator          1       1        1      7     50   kg
6   Myindicator          1       1        1      8     20   kg
7   Myindicator          1       2        4      9     50   kg
8   Myindicator          1       2        4      9     50   kg
9   Myindicator          1       2        4     10     40   kg
10  Myindicator          1       2        5     11     20   kg
11  Myindicator          1       2        5      3     40   kg
12  Myindicator          1       2        5      3     40   kg
13  Myindicator          1       2        6      2     10   kg
14  Myindicator          1       3        7     12     50   kg
15  Myindicator          1       3        7     12     50   kg
16  Myindicator          1       3        8     14     15   kg
17  Myindicator          1       3        8     14     15   kg
18  Myindicator          1       3        8     13     15   kg
19  Myindicator          1       3        8     13      1   kg
20  Myindicator          1       4        9     15     10   kg
21  Myindicator          1       4        9     16     10   kg
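To reproduce the walkthrough, the sample table above can be rebuilt as a DataFrame. The sketch below only transcribes the values shown; the variable names `rows`, `sample` and `region_totals` are illustrative and not part of the original answer.

```python
import pandas as pd

rows = [
    # (Region, Country, State, Value), in the same order as the table
    (1, 3, 1, 50), (1, 3, 4, 50), (1, 2, 5, 20), (1, 2, 5, 50),
    (1, 2, 6, 30), (1, 1, 7, 50), (1, 1, 8, 20), (2, 4, 9, 50),
    (2, 4, 9, 50), (2, 4, 10, 40), (2, 5, 11, 20), (2, 5, 3, 40),
    (2, 5, 3, 40), (2, 6, 2, 10), (3, 7, 12, 50), (3, 7, 12, 50),
    (3, 8, 14, 15), (3, 8, 14, 15), (3, 8, 13, 15), (3, 8, 13, 1),
    (4, 9, 15, 10), (4, 9, 16, 10),
]
sample = pd.DataFrame(rows, columns=["Region", "Country", "State", "Value"])

# Constant columns, placed so the column order matches the table.
sample.insert(0, "name", "Myindicator")
sample.insert(1, "Continent", 1)
sample["Unit"] = "kg"

# Per-Region totals, handy for checking the gsum values in the output.
region_totals = sample.groupby("Region")["Value"].sum()
print(region_totals)
```

Passing `sample` and a cutoff multiplier of 2 to `drop_small` should then print the per-level tables that follow.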

Grouping on ['name']
Grouped sums and cutoffs for this hierarchy:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   686   343
1   Myindicator          1       1        3      4     50   kg   686   343
2   Myindicator          1       1        2      5     20   kg   686   343
3   Myindicator          1       1        2      5     50   kg   686   343
4   Myindicator          1       1        2      6     30   kg   686   343
5   Myindicator          1       1        1      7     50   kg   686   343
6   Myindicator          1       1        1      8     20   kg   686   343
7   Myindicator          1       2        4      9     50   kg   686   343
8   Myindicator          1       2        4      9     50   kg   686   343
9   Myindicator          1       2        4     10     40   kg   686   343
10  Myindicator          1       2        5     11     20   kg   686   343
11  Myindicator          1       2        5      3     40   kg   686   343
12  Myindicator          1       2        5      3     40   kg   686   343
13  Myindicator          1       2        6      2     10   kg   686   343
14  Myindicator          1       3        7     12     50   kg   686   343
15  Myindicator          1       3        7     12     50   kg   686   343
16  Myindicator          1       3        8     14     15   kg   686   343
17  Myindicator          1       3        8     14     15   kg   686   343
18  Myindicator          1       3        8     13     15   kg   686   343
19  Myindicator          1       3        8     13      1   kg   686   343
20  Myindicator          1       4        9     15     10   kg   686   343
21  Myindicator          1       4        9     16     10   kg   686   343

Indexes dropped: none

Grouping on ['name', 'Continent']
Grouped sums and cutoffs for this hierarchy:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   686   343
1   Myindicator          1       1        3      4     50   kg   686   343
2   Myindicator          1       1        2      5     20   kg   686   343
3   Myindicator          1       1        2      5     50   kg   686   343
4   Myindicator          1       1        2      6     30   kg   686   343
5   Myindicator          1       1        1      7     50   kg   686   343
6   Myindicator          1       1        1      8     20   kg   686   343
7   Myindicator          1       2        4      9     50   kg   686   343
8   Myindicator          1       2        4      9     50   kg   686   343
9   Myindicator          1       2        4     10     40   kg   686   343
10  Myindicator          1       2        5     11     20   kg   686   343
11  Myindicator          1       2        5      3     40   kg   686   343
12  Myindicator          1       2        5      3     40   kg   686   343
13  Myindicator          1       2        6      2     10   kg   686   343
14  Myindicator          1       3        7     12     50   kg   686   343
15  Myindicator          1       3        7     12     50   kg   686   343
16  Myindicator          1       3        8     14     15   kg   686   343
17  Myindicator          1       3        8     14     15   kg   686   343
18  Myindicator          1       3        8     13     15   kg   686   343
19  Myindicator          1       3        8     13      1   kg   686   343
20  Myindicator          1       4        9     15     10   kg   686   343
21  Myindicator          1       4        9     16     10   kg   686   343

Indexes dropped: none

Grouping on ['name', 'Continent', 'Region']
Grouped sums and cutoffs for this hierarchy:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   270   135
1   Myindicator          1       1        3      4     50   kg   270   135
2   Myindicator          1       1        2      5     20   kg   270   135
3   Myindicator          1       1        2      5     50   kg   270   135
4   Myindicator          1       1        2      6     30   kg   270   135
5   Myindicator          1       1        1      7     50   kg   270   135
6   Myindicator          1       1        1      8     20   kg   270   135
7   Myindicator          1       2        4      9     50   kg   250   135
8   Myindicator          1       2        4      9     50   kg   250   135
9   Myindicator          1       2        4     10     40   kg   250   135
10  Myindicator          1       2        5     11     20   kg   250   135
11  Myindicator          1       2        5      3     40   kg   250   135
12  Myindicator          1       2        5      3     40   kg   250   135
13  Myindicator          1       2        6      2     10   kg   250   135
14  Myindicator          1       3        7     12     50   kg   146   135
15  Myindicator          1       3        7     12     50   kg   146   135
16  Myindicator          1       3        8     14     15   kg   146   135
17  Myindicator          1       3        8     14     15   kg   146   135
18  Myindicator          1       3        8     13     15   kg   146   135
19  Myindicator          1       3        8     13      1   kg   146   135
20  Myindicator          1       4        9     15     10   kg    20   135 *
21  Myindicator          1       4        9     16     10   kg    20   135 *

Indexes dropped: 20,21

Grouping on ['name', 'Continent', 'Region', 'Country']
Grouped sums and cutoffs for this hierarchy:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg   100    50
1   Myindicator          1       1        3      4     50   kg   100    50
2   Myindicator          1       1        2      5     20   kg   100    50
3   Myindicator          1       1        2      5     50   kg   100    50
4   Myindicator          1       1        2      6     30   kg   100    50
5   Myindicator          1       1        1      7     50   kg    70    50
6   Myindicator          1       1        1      8     20   kg    70    50
7   Myindicator          1       2        4      9     50   kg   140    70
8   Myindicator          1       2        4      9     50   kg   140    70
9   Myindicator          1       2        4     10     40   kg   140    70
10  Myindicator          1       2        5     11     20   kg   100    70
11  Myindicator          1       2        5      3     40   kg   100    70
12  Myindicator          1       2        5      3     40   kg   100    70
13  Myindicator          1       2        6      2     10   kg    10    70 *
14  Myindicator          1       3        7     12     50   kg   100    50
15  Myindicator          1       3        7     12     50   kg   100    50
16  Myindicator          1       3        8     14     15   kg    46    50 *
17  Myindicator          1       3        8     14     15   kg    46    50 *
18  Myindicator          1       3        8     13     15   kg    46    50 *
19  Myindicator          1       3        8     13      1   kg    46    50 *

Indexes dropped: 13,16,17,18,19

Grouping on ['name', 'Continent', 'Region', 'Country', 'State']
Grouped sums and cutoffs for this hierarchy:

           name  Continent  Region  Country  State  Value Unit  gsum  gmax
0   Myindicator          1       1        3      1     50   kg    50    25
1   Myindicator          1       1        3      4     50   kg    50    25
2   Myindicator          1       1        2      5     20   kg    70    35
3   Myindicator          1       1        2      5     50   kg    70    35
4   Myindicator          1       1        2      6     30   kg    30    35 *
5   Myindicator          1       1        1      7     50   kg    50    25
6   Myindicator          1       1        1      8     20   kg    20    25 *
7   Myindicator          1       2        4      9     50   kg   100    50
8   Myindicator          1       2        4      9     50   kg   100    50
9   Myindicator          1       2        4     10     40   kg    40    50 *
10  Myindicator          1       2        5     11     20   kg    20    40 *
11  Myindicator          1       2        5      3     40   kg    80    40
12  Myindicator          1       2        5      3     40   kg    80    40
14  Myindicator          1       3        7     12     50   kg   100    50
15  Myindicator          1       3        7     12     50   kg   100    50

Indexes dropped: 4,6,9,10

Final table:

           name  Continent  Region  Country  State  Value Unit
0   Myindicator          1       1        3      1     50   kg
1   Myindicator          1       1        3      4     50   kg
2   Myindicator          1       1        2      5     20   kg
3   Myindicator          1       1        2      5     50   kg
5   Myindicator          1       1        1      7     50   kg
7   Myindicator          1       2        4      9     50   kg
8   Myindicator          1       2        4      9     50   kg
11  Myindicator          1       2        5      3     40   kg
12  Myindicator          1       2        5      3     40   kg
14  Myindicator          1       3        7     12     50   kg
15  Myindicator          1       3        7     12     50   kg
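Since the stated goal is a nested JSON for a d3 treemap, here is one possible way to turn the filtered table into a tree. It is a sketch, not part of the original answer: the helper `build_children` is hypothetical, and the `name` / `children` / `value` keys are an assumption based on the layout commonly used in d3 hierarchy examples, so adjust them to whatever your treemap code reads.

```python
import json
import pandas as pd

# The final table from above (hierarchy columns and Value only).
final = pd.DataFrame(
    [(1, 3, 1, 50), (1, 3, 4, 50), (1, 2, 5, 20), (1, 2, 5, 50),
     (1, 1, 7, 50), (2, 4, 9, 50), (2, 4, 9, 50), (2, 5, 3, 40),
     (2, 5, 3, 40), (3, 7, 12, 50), (3, 7, 12, 50)],
    columns=["Region", "Country", "State", "Value"],
)

def build_children(df, levels, value_col="Value"):
    """Hypothetical helper: recursively nest df by the given hierarchy levels."""
    level, rest = levels[0], levels[1:]
    nodes = []
    # sort=False keeps groups in order of first appearance.
    for key, part in df.groupby(level, sort=False):
        node = {"name": f"{level} {key}"}
        if rest:
            node["children"] = build_children(part, rest, value_col)
        else:
            # Leaf node: total of the rows sharing this full path.
            node["value"] = int(part[value_col].sum())
        nodes.append(node)
    return nodes

tree = {
    "name": "Myindicator",
    "children": build_children(final, ["Region", "Country", "State"]),
}
print(json.dumps(tree, indent=2))
```

Leaves carry the summed Value of their rows, so the two State 5 rows under Country 2 collapse into a single leaf of 70.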