Generate rows with all unique values of a given column for each group

Asked: 2018-03-30 14:48:37

Tags: pandas pandas-groupby

import pandas as pd

df = pd.DataFrame({'timePoint': [1,1,1,1,2,2,2,2,3,3,3,3],
                   'item': [1,2,3,4,3,4,5,6,1,3,7,2],
                   'value': [2,4,7,6,5,9,3,2,4,3,1,5]})

>>> df
    item  timePoint  value
0      1          1      2
1      2          1      4
2      3          1      7
3      4          1      6
4      3          2      5
5      4          2      9
6      5          2      3
7      6          2      2
8      1          3      4
9      3          3      3
10     7          3      1
11     2          3      5

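In this df, not every item appears at every timePoint. I want every item to have a row for each unique timePoint, and the newly inserted rows should have:

(i) a NaN value if the item did not appear at any earlier timePoint, or (ii) if it did, its most recent value.

The desired output should look like this (the rows marked with # are the inserted ones).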

>>> dfx
    item  timePoint  value
0      1          1    2.0
3      1          2    2.0 #
8      1          3    4.0
1      2          1    4.0
4      2          2    4.0 #
11     2          3    5.0
2      3          1    7.0
4      3          2    5.0
9      3          3    3.0
3      4          1    6.0
5      4          2    9.0
6      4          3    9.0 #
0      5          1    NaN #
6      5          2    3.0 
7      5          3    3.0 #
1      6          1    NaN #
7      6          2    2.0 
8      6          3    2.0 #
2      7          1    NaN #
5      7          2    NaN #
10     7          3    1.0

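For example, item 1 gets 2.0 at timePoint 2 because that is the value it had at timePoint 1; item 6 gets NaN at timePoint 1 because it has no earlier value.

Now, I know that if I manage to insert, into each timePoint group, a row for every unique item missing from it, i.e. reach this point: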

>>> dfx
    item  timePoint  value
0      1          1    2.0
1      2          1    4.0
2      3          1    7.0
3      4          1    6.0
4      3          2    5.0
5      4          2    9.0
6      5          2    3.0
7      6          2    2.0
8      1          3    4.0
9      3          3    3.0
10     7          3    1.0
11     2          3    5.0
0      5          1    NaN
1      6          1    NaN
2      7          1    NaN
3      1          2    NaN
4      2          2    NaN
5      7          2    NaN
6      4          3    NaN
7      5          3    NaN
8      6          3    NaN

then I can do:

# Put each item's rows in timePoint order, then forward-fill within each item
dfx = dfx.sort_values(by=['item', 'timePoint'])
dfx['value'] = dfx.groupby('item')['value'].ffill()

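which returns the desired output.

But how do I add, as rows, every item from df.item.unique() that is missing from each timePoint group?

Also, if you have a more efficient solution to suggest from scratch, then please be my guest.

2 Answers:

Answer 0: (score: 2)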

I think unstack followed by stack will get the data into that format; then we use groupby with ffill to forward-fill the NaN values.
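The answer's own code did not survive on this page, so the following is only a minimal sketch of the idea it describes, not the answerer's exact code. It assumes the df from the question. One substitution is mine: after unstack, it goes back to long format with melt rather than stack, because how stack treats NaN cells differs between pandas versions. The names wide, long_df and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'timePoint': [1,1,1,1,2,2,2,2,3,3,3,3],
                   'item': [1,2,3,4,3,4,5,6,1,3,7,2],
                   'value': [2,4,7,6,5,9,3,2,4,3,1,5]})

# Wide layout: one row per item, one column per timePoint.
# Pairs that never occurred show up as NaN cells.
wide = df.set_index(['item', 'timePoint'])['value'].unstack()

# Back to long format, keeping the NaN cells (assumption: melt is used
# here in place of stack so the NaN rows are kept on any pandas version).
long_df = wide.reset_index().melt(id_vars='item',
                                  var_name='timePoint',
                                  value_name='value')
long_df['timePoint'] = long_df['timePoint'].astype(int)

# Order each item's rows by timePoint, then forward-fill within each item.
long_df = long_df.sort_values(['item', 'timePoint']).reset_index(drop=True)
long_df['value'] = long_df.groupby('item')['value'].ffill()

out = long_df
print(out)
```

Items 5, 6 and 7 keep NaN at their leading timePoints because forward-filling has nothing earlier to copy from.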

Answer 1: (score: 2)

Use pd.MultiIndex.from_product, levels, and reindex

d = df.set_index(['item', 'timePoint'])
d.reindex(
    pd.MultiIndex.from_product(d.index.levels, names=d.index.names)
).groupby(level='item').ffill().reset_index()

    item  timePoint  value
0      1          1    2.0
1      1          2    2.0
2      1          3    4.0
3      2          1    4.0
4      2          2    4.0
5      2          3    5.0
6      3          1    7.0
7      3          2    5.0
8      3          3    3.0
9      4          1    6.0
10     4          2    9.0
11     4          3    9.0
12     5          1    NaN
13     5          2    3.0
14     5          3    3.0
15     6          1    NaN
16     6          2    2.0
17     6          3    2.0
18     7          1    NaN
19     7          2    NaN
20     7          3    1.0
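To make the intermediate step of this approach visible, here is a small sketch that splits the chained expression apart. It assumes the same df as in the question; the names full_index, expanded and n_added are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'timePoint': [1,1,1,1,2,2,2,2,3,3,3,3],
                   'item': [1,2,3,4,3,4,5,6,1,3,7,2],
                   'value': [2,4,7,6,5,9,3,2,4,3,1,5]})

d = df.set_index(['item', 'timePoint'])

# Cartesian product of the unique items and the unique timePoints:
# 7 items x 3 timePoints = 21 (item, timePoint) pairs.
full_index = pd.MultiIndex.from_product(d.index.levels, names=d.index.names)

# Reindexing keeps the existing rows and adds the missing pairs as NaN.
expanded = d.reindex(full_index)
n_added = len(expanded) - len(d)

print(len(full_index), n_added)
print(expanded.head(6))
```

This intermediate frame is the "insert all missing rows" stage the question asks about; the groupby(level='item').ffill() call in the answer then fills it. One caveat worth keeping in mind: index.levels can still hold values that no longer occur if d was filtered beforehand, in which case calling d.index.remove_unused_levels() first may be needed.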