How do I increment the value of a new column when a repeated value is found in another column of a Python DataFrame?

Asked: 2019-06-21 21:59:09

Tags: python pandas dataframe pycharm

I have a CSV file that looks like this:

Timestamp  Status
1501       Normal
1501       Normal
1502       Delay
1503       Received
1504       Normal
1504       Delay
1505       Received
1506       Received
1507       Delay
1507       Received

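I want to add a new "Notif" column to the DataFrame that acts as a counter and increments each time the value "Received" appears in the "Status" column. I would like the output to look like this: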

Timestamp  Status     Notif
1501       Normal     N0
1501       Normal     N0
1502       Delay      N0
1503       Received   N1
1504       Normal     N1
1504       Delay      N1
1505       Received   N2
1506       Received   N3
1507       Delay      N3
1507       Received   N4

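I searched for a solution to this problem, and various sources suggested using the arcpy package. PyCharm does not seem to support arcpy, however, so I would like to do this without it.

I also tried using numpy as a conditional operator, but that did not seem to work.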

2 answers:

Answer 0: (score: 1)

Iterating over the rows with df.iterrows achieves this:

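# Assumes df has already been loaded from the CSV, e.g. with pd.read_csv.
df['Notif'] = None
counter = 0
for idx, row in df.iterrows():
    # Increment the counter each time a "Received" status is encountered.
    if row['Status'] == "Received":
        counter += 1
    # idx is an index label, so assign by label rather than by position.
    df.at[idx, 'Notif'] = "N" + str(counter)

print(df)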

Output

+----+------------+-----------+-------+
|    | Timestamp  |  Status   | Notif |
+----+------------+-----------+-------+
| 0  |      1501  | Normal    | N0    |
| 1  |      1501  | Normal    | N0    |
| 2  |      1502  | Delay     | N0    |
| 3  |      1503  | Received  | N1    |
| 4  |      1504  | Normal    | N1    |
| 5  |      1504  | Delay     | N1    |
| 6  |      1505  | Received  | N2    |
| 7  |      1506  | Received  | N3    |
| 8  |      1507  | Delay     | N3    |
| 9  |      1507  | Received  | N4    |
+----+------------+-----------+-------+

Answer 1: (score: 0)

iterrows should always be your last resort; it is not vectorized and is very slow. cumsum() is the ideal tool for this problem. The only wrinkle is that you want to accumulate over non-duplicate entries. For example:

cumsum()
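
A minimal sketch of how cumsum() might be applied here, assuming the DataFrame has the Status column shown in the question. The inline sample data and the name is_received are illustrative and not taken from the answer:

```python
import pandas as pd

# Sample data copied from the question; in practice df would come from pd.read_csv.
df = pd.DataFrame({
    "Timestamp": [1501, 1501, 1502, 1503, 1504, 1504, 1505, 1506, 1507, 1507],
    "Status": ["Normal", "Normal", "Delay", "Received", "Normal",
               "Delay", "Received", "Received", "Delay", "Received"],
})

# Boolean mask (illustrative name): True on rows whose Status is "Received".
is_received = df["Status"].eq("Received")

# cumsum() counts True as 1, so this yields a running count of "Received" rows.
df["Notif"] = "N" + is_received.cumsum().astype(str)

print(df)
```

Because the comparison and the cumulative sum both operate on the whole column at once, no explicit Python loop over the rows is needed.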