I have a CSV file that looks like this:
Timestamp Status
1501 Normal
1501 Normal
1502 Delay
1503 Received
1504 Normal
1504 Delay
1505 Received
1506 Received
1507 Delay
1507 Received
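I want to add a new "Notif" column to the DataFrame that acts as a counter and increments each time the value "Received" appears in the "Status" column. I would like the output to look like this: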
Timestamp Status Notif
1501 Normal N0
1501 Normal N0
1502 Delay N0
1503 Received N1
1504 Normal N1
1504 Delay N1
1505 Received N2
1506 Received N3
1507 Delay N3
1507 Received N4
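I searched for a solution to this problem, and various sources suggested using the arcpy package. Since PyCharm does not seem to support arcpy, I would like to do this without it.
I also tried using numpy as a conditional operator, but that did not seem to work.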
Answer 0 (score: 1)
Using df.iterrows
Iterating over the rows achieves this:
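df['Notif'] = None
counter = 0
notif_col = df.columns.get_loc('Notif')
# enumerate() yields row positions, which is what iloc expects, so this
# still works when the index is not the default 0..n-1.
for pos, (idx, row) in enumerate(df.iterrows()):
    if row['Status'] == "Received":
        counter += 1
    # Every row gets the current count, not only the "Received" rows.
    df.iloc[pos, notif_col] = "N" + str(counter)
print(df)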
Output
+----+------------+-----------+-------+
| | Timestamp | Status | Notif |
+----+------------+-----------+-------+
| 0 | 1501 | Normal | N0 |
| 1 | 1501 | Normal | N0 |
| 2 | 1502 | Delay | N0 |
| 3 | 1503 | Received | N1 |
| 4 | 1504 | Normal | N1 |
| 5 | 1504 | Delay | N1 |
| 6 | 1505 | Received | N2 |
| 7 | 1506 | Received | N3 |
| 8 | 1507 | Delay | N3 |
| 9 | 1507 | Received | N4 |
+----+------------+-----------+-------+
Answer 1 (score: 0)
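Looping over rows should always be your last resort; it is not vectorized and is very slow. cumsum() is the ideal tool for this problem. The only wrinkle is that the thing you want to accumulate is not a numeric column, so it first has to be expressed as one, e.g. as a boolean mask of the rows where Status equals "Received".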
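The answer's own example did not survive in this copy, so what follows is only a minimal sketch of how the cumsum() idea might look, not the original author's code. It assumes the sample data from the question and builds the DataFrame inline instead of reading the CSV; the name received_count is purely illustrative.

```python
import pandas as pd

# Sample data copied from the question (built inline for illustration).
df = pd.DataFrame({
    "Timestamp": [1501, 1501, 1502, 1503, 1504, 1504, 1505, 1506, 1507, 1507],
    "Status": ["Normal", "Normal", "Delay", "Received", "Normal",
               "Delay", "Received", "Received", "Delay", "Received"],
})

# Boolean mask: True where Status is "Received". cumsum() counts True as 1
# and False as 0, so the running total rises by one at each "Received" row.
received_count = df["Status"].eq("Received").cumsum()

# Prefix the running count with "N" to build labels such as N0, N1, ...
df["Notif"] = "N" + received_count.astype(str)

print(df)
```

For the sample data this should give the same Notif column as the loop in Answer 0, without iterating over rows in Python.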