I have a data frame in pandas (Python) as follows:
<table style="width:100%">
<tr>
<th>ID</th>
<th>AGE</th>
<th>GENDER</th>
<th>TIME</th>
<th>CODE</th>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>3</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>3</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>5</td>
<td>1</td>
</tr>
</table>
ID AGE GENDER TIME CODE
1 66 M 1 1
1 66 M 2 1
1 66 M 3 1
2 20 F 1 0
2 20 F 2 0
2 20 F 3 0
2 20 F 4 0
3 18 F 1 1
3 18 F 2 1
3 18 F 3 1
3 18 F 4 1
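I need to change the last column as follows: wherever the "CODE" column is 1, keep the 1 in the last row for that ID and change the preceding rows to zero.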
<table style="width:100%">
<tr>
<th>ID</th>
<th>AGE</th>
<th>GENDER</th>
<th>TIME</th>
<th>CODE</th>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>66</td>
<td>M</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td>F</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>4</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>18</td>
<td>F</td>
<td>5</td>
<td>1</td>
</tr>
</table>
How can I do this with pandas?
After searching, I found this line of code, which removes the last row of each group:
dfnew = df.groupby('ID').apply(lambda x: x.iloc[:-1] if len(x) > 1 else x)
Thanks in advance.
Answer 0 (score: 1)
Filter the rows where CODE is 1, remove duplicates per ID with drop_duplicates (keeping the last one), and take the index:
i = df[df['CODE'] == 1].drop_duplicates(subset=['ID'], keep='last').index
First set the whole column to 0, then set the rows indexed by i back to 1:
df['CODE'] = 0
df.loc[i, 'CODE'] = 1
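The two steps above can be put together into a self-contained sketch. The data frame here is rebuilt by hand from the 11-row sample printed later in this answer, so the construction is an assumption about how df was created, not the asker's actual code.

```python
import pandas as pd

# Sample data rebuilt by hand from the 11-row output shown in this answer.
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "AGE":    [66, 66, 66, 20, 20, 20, 20, 18, 18, 18, 18],
    "GENDER": ["M", "M", "M", "F", "F", "F", "F", "F", "F", "F", "F"],
    "TIME":   [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4],
    "CODE":   [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
})

# Index labels of the last CODE == 1 row within each ID.
i = df[df["CODE"] == 1].drop_duplicates(subset=["ID"], keep="last").index

# Reset the whole column, then restore 1 only on those rows.
df["CODE"] = 0
df.loc[i, "CODE"] = 1

print(list(i))               # [2, 10]
print(df["CODE"].tolist())   # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```

IDs that never have CODE equal to 1 (ID 2 here) are absent from i, so they simply stay at 0.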
Another solution is to create a boolean mask and convert it to ints:
m = (df['CODE'] == 1) & ~df['ID'].duplicated(keep='last')
print (m)
0     False
1     False
2      True
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10     True
dtype: bool
df['CODE'] = m.astype(int)
print (df)
    ID  AGE GENDER  TIME  CODE
0    1   66      M     1     0
1    1   66      M     2     0
2    1   66      M     3     1
3    2   20      F     1     0
4    2   20      F     2     0
5    2   20      F     3     0
6    2   20      F     4     0
7    3   18      F     1     0
8    3   18      F     2     0
9    3   18      F     3     0
10   3   18      F     4     1
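Both solutions give the same result on the sample data because CODE is constant within each ID. As a hedged illustration of where they could diverge, the sketch below uses made-up data (not from the question) in which ID 1 has CODE equal to 1 on earlier rows but 0 on its last row. The names by_mask and by_index are only for this example.

```python
import pandas as pd

# Hypothetical data: ID 1 has CODE == 1 on earlier rows but 0 on its last row.
df = pd.DataFrame({"ID": [1, 1, 1, 2, 2], "CODE": [1, 1, 0, 1, 1]})

# Mask approach: 1 only where the last row of an ID already has CODE == 1.
mask = (df["CODE"] == 1) & ~df["ID"].duplicated(keep="last")
by_mask = mask.astype(int)

# Index approach: 1 on the last CODE == 1 row of each ID.
i = df[df["CODE"] == 1].drop_duplicates(subset=["ID"], keep="last").index
by_index = pd.Series(0, index=df.index)
by_index.loc[i] = 1

print(by_mask.tolist())    # [0, 0, 0, 0, 1]
print(by_index.tolist())   # [0, 1, 0, 0, 1]
```

If CODE can vary within an ID, the choice depends on whether "last row" means the last row of the ID or the last row where CODE is 1; the question's data does not distinguish the two.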