Pandas: identify the first row with a given column value within each group

Time: 2019-06-21 19:08:56

Tags: python pandas

I have a data frame with three columns:

        ID       Date    Status
    0    1   1/1/2000  Complete
    1    1   1/4/2000  ReOpened
    2    1  1/10/2000  ReOpened
    3    1  1/11/2000    Closed
    4    1  1/15/2000  ReOpened
    5    2   1/2/2000  ReOpened
    6    2   1/4/2000  ReOpened
    7    2  1/10/2000    Closed
    8    3  1/20/2000    Closed
    9    3  1/22/2000    Closed
    10   4  1/25/2000  ReOpened

For each ID that has a "ReOpened" status, I need the row showing the first "ReOpened" by date. So my output should look like:

        ID       Date    Status
    1    1   1/4/2000  ReOpened
    5    2   1/2/2000  ReOpened
    10   4  1/25/2000  ReOpened

My own attempt did not work.

2 answers:

Answer 0 (score: 1)

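Use groupby and cumsum to build a mask. The running count of ReOpened rows within each ID stays at 1 from the first ReOpened row until the next one, so it is combined with the ReOpened mask itself to keep only that first row:

m = df['Status'].eq('ReOpened')
df[m & m.groupby(df['ID']).cumsum().eq(1)]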

    ID       Date    Status
1    1   1/4/2000  ReOpened
5    2   1/2/2000  ReOpened
10   4  1/25/2000  ReOpened 
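
To make the running count visible, here is a minimal, self-contained sketch. The frame `df` is rebuilt from the question's sample, and the two-row frame `extra` (ID 9) is a hypothetical addition, not part of the question, included only to show why the count is combined with the ReOpened mask. It assumes the rows are already in date order within each ID.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    "Date": ["1/1/2000", "1/4/2000", "1/10/2000", "1/11/2000", "1/15/2000",
             "1/2/2000", "1/4/2000", "1/10/2000", "1/20/2000", "1/22/2000",
             "1/25/2000"],
    "Status": ["Complete", "ReOpened", "ReOpened", "Closed", "ReOpened",
               "ReOpened", "ReOpened", "Closed", "Closed", "Closed",
               "ReOpened"],
})

is_reopened = df["Status"].eq("ReOpened")
# Per-ID running count of ReOpened rows seen so far.
running = is_reopened.astype(int).groupby(df["ID"]).cumsum()
first_reopened = df[is_reopened & running.eq(1)]
print(first_reopened)

# Hypothetical extra group: a ReOpened row followed by a Closed row.
extra = pd.DataFrame({
    "ID": [9, 9],
    "Date": ["2/1/2000", "2/3/2000"],
    "Status": ["ReOpened", "Closed"],
})
extra_mask = extra["Status"].eq("ReOpened")
extra_count = extra_mask.astype(int).groupby(extra["ID"]).cumsum()
count_only = extra[extra_count.eq(1)]               # also keeps the Closed row
count_and_mask = extra[extra_mask & extra_count.eq(1)]
```

On the question's sample the count alone happens to be enough, because no ID has a non-ReOpened row directly after its first ReOpened row; the hypothetical `extra` frame shows the case where it is not.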

You can also filter first and then use groupby and first to keep only the first row of each group:

df[df['Status'].eq('ReOpened')].groupby('ID', as_index=False).first()

   ID       Date    Status
0   1   1/4/2000  ReOpened
1   2   1/2/2000  ReOpened
2   4  1/25/2000  ReOpened
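
One detail that may matter: `first()` returns the first non-null value of each column per group, and the result above carries a fresh 0..n-1 index. If whole original rows with their original index labels are wanted, `head(1)` on the grouped, filtered frame is a possible alternative. A minimal sketch, with `df` rebuilt from the question's sample and assumed to be in date order within each ID:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    "Date": ["1/1/2000", "1/4/2000", "1/10/2000", "1/11/2000", "1/15/2000",
             "1/2/2000", "1/4/2000", "1/10/2000", "1/20/2000", "1/22/2000",
             "1/25/2000"],
    "Status": ["Complete", "ReOpened", "ReOpened", "Closed", "ReOpened",
               "ReOpened", "ReOpened", "Closed", "Closed", "Closed",
               "ReOpened"],
})

reopened = df[df["Status"].eq("ReOpened")]
# head(1) keeps the first row of every group as-is, original index included.
first_rows = reopened.groupby("ID").head(1)
print(first_rows)
```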

If performance matters, you can use eq and duplicated to reduce the above to a single boolean-indexing operation:

df[df['Status'].eq('ReOpened') & ~df.duplicated(['ID', 'Status'])]

    ID       Date    Status
1    1   1/4/2000  ReOpened
5    2   1/2/2000  ReOpened
10   4  1/25/2000  ReOpened
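
To see why this works, it may help to look at what `duplicated` flags: every row whose (ID, Status) pair has already appeared earlier in the frame. A small sketch on the question's sample (the column name `is_repeat` is only illustrative):

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    "Date": ["1/1/2000", "1/4/2000", "1/10/2000", "1/11/2000", "1/15/2000",
             "1/2/2000", "1/4/2000", "1/10/2000", "1/20/2000", "1/22/2000",
             "1/25/2000"],
    "Status": ["Complete", "ReOpened", "ReOpened", "Closed", "ReOpened",
               "ReOpened", "ReOpened", "Closed", "Closed", "Closed",
               "ReOpened"],
})

# True for every row whose (ID, Status) pair was seen in an earlier row.
dup = df.duplicated(["ID", "Status"])
print(df.assign(is_repeat=dup))

# ReOpened rows that are not repeats are the first ReOpened row of each ID.
first_reopened = df[df["Status"].eq("ReOpened") & ~dup]
```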

Answer 1 (score: 1)

drop_duplicates should be enough.

df[df.Status.eq('ReOpened')].drop_duplicates(['ID'])
#    ID       Date    Status
#1    1   1/4/2000  ReOpened
#5    2   1/2/2000  ReOpened
#10   4  1/25/2000  ReOpened
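
All of the approaches here pick the first ReOpened row in the frame's current row order, which matches "first by date" only if the rows are already sorted by date within each ID. If that is not guaranteed, one option is to parse the dates and sort before filtering. This is a sketch under assumptions: the Date strings are month/day/year (values such as 1/15/2000 suggest so), `_when` is a throwaway helper column, and the shuffle is there only to simulate unsorted input.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    "Date": ["1/1/2000", "1/4/2000", "1/10/2000", "1/11/2000", "1/15/2000",
             "1/2/2000", "1/4/2000", "1/10/2000", "1/20/2000", "1/22/2000",
             "1/25/2000"],
    "Status": ["Complete", "ReOpened", "ReOpened", "Closed", "ReOpened",
               "ReOpened", "ReOpened", "Closed", "Closed", "Closed",
               "ReOpened"],
})

# Simulate input that is not in date order.
shuffled = df.sample(frac=1, random_state=0)

# Parse the strings as real dates so that 1/10/2000 sorts after 1/4/2000.
ordered = (
    shuffled
    .assign(_when=pd.to_datetime(shuffled["Date"], format="%m/%d/%Y"))
    .sort_values(["ID", "_when"])
)

first_reopened = (
    ordered[ordered["Status"].eq("ReOpened")]
    .drop_duplicates("ID")       # keeps the earliest ReOpened row per ID
    .drop(columns="_when")
)
print(first_reopened)
```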