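I have a pandas DataFrame in which I need to go through the values in two columns (Location and Event) and replace the strings "Gate-3" and "NO Access" with NaN.
Below is a sample DataFrame.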
Time Location Event Badge ID
18:28:59 Gate-2 Access Granted 81002
18:28:12 Gate-1 Access Granted 80557
18:27:55 Gate-3 Access Granted 80557
18:27:44 Gate-3 NO Access 80398
18:25:38 Gate-1 NO Access 80978
18:25:30 Gate-2 Access Granted 73680
18:23:56 Gate-1 Access Granted 73680
18:23:52 Gate-2 Access Granted 80557
18:23:19 Gate-2 NO Access 128
18:23:16 Gate-1 Access Granted 80557
The expected output is:
Time Location Event Badge ID
0 18:28:59 Gate-2 Access Granted 81002
1 18:28:12 Gate-1 Access Granted 80557
2 18:27:55 NaN Access Granted 80557
3 18:27:44 NaN NaN 80398
4 18:25:38 Gate-1 NaN 80978
5 18:25:30 Gate-2 Access Granted 73680
6 18:23:56 Gate-1 Access Granted 73680
7 18:23:52 Gate-2 Access Granted 80557
8 18:23:19 Gate-2 NaN 128
9 18:23:16 Gate-1 Access Granted 80557
Answer 0 (score: 2)
You can set this when loading the XLS file by specifying the na_values parameter.
df = pd.read_excel('file.xls', na_values=['Gate-3', 'NO Access'])
print(df)
Time Location Event Badge ID
0 18:28:59 Gate-2 Access Granted 81002
1 18:28:12 Gate-1 Access Granted 80557
2 18:27:55 NaN Access Granted 80557
3 18:27:44 NaN NaN 80398
4 18:25:38 Gate-1 NaN 80978
5 18:25:30 Gate-2 Access Granted 73680
6 18:23:56 Gate-1 Access Granted 73680
7 18:23:52 Gate-2 Access Granted 80557
8 18:23:19 Gate-2 NaN 128
9 18:23:16 Gate-1 Access Granted 80557
IMO, this is better than cleaning the data after it has been loaded.
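A minimal sketch of the same idea, using read_csv on an in-memory string as a stand-in for the XLS file (the sample rows and the names csv_text and df_na are assumptions for illustration). A plain list passed to na_values applies to every column, so a dict keyed by column name is used here to limit each sentinel to its own column. read_excel documents the same na_values parameter.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: a few rows from the question.
csv_text = """Time,Location,Event,Badge ID
18:27:55,Gate-3,Access Granted,80557
18:27:44,Gate-3,NO Access,80398
18:25:38,Gate-1,NO Access,80978
18:25:30,Gate-2,Access Granted,73680
"""

# A dict restricts each sentinel to one column; a list would apply to all columns.
df_na = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"Location": ["Gate-3"], "Event": ["NO Access"]},
)
print(df_na)
```

The per-column form matters if a sentinel string could legitimately appear in another column.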
Answer 1 (score: 2)
You can get a boolean mask for the rows where your condition is met:
mask = df.Location.eq('Gate-3') & df.Event.eq('NO Access') # df is your dataframe
You can then use that mask to set NaN in whichever columns you need, like this:
df.loc[mask, ['Location', 'Event']] = np.nan # imported numpy as np
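Edit:
It seems you have changed the spec. If you want to set NaN wherever the Location column or the Event column matches its sentinel value, use two masks.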
locmask = df.Location.eq('Gate-3')
df.loc[locmask, 'Location'] = np.nan
evmask = df.Event.eq('NO Access')
df.loc[evmask, 'Event'] = np.nan
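To make the difference between the two approaches concrete, here is a small sketch on made-up rows (the sample data and the name both are assumptions for illustration). The combined mask is True only where both columns hold their sentinel value, while the separate masks clean each column independently.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Location": ["Gate-2", "Gate-3", "Gate-3", "Gate-1"],
    "Event": ["Access Granted", "Access Granted", "NO Access", "NO Access"],
})

# Combined mask: True only where BOTH conditions hold on the same row.
both = df.Location.eq("Gate-3") & df.Event.eq("NO Access")
print(both.tolist())

# Separate masks: each column is cleaned on its own condition.
df.loc[df.Location.eq("Gate-3"), "Location"] = np.nan
df.loc[df.Event.eq("NO Access"), "Event"] = np.nan
print(df)
```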
Answer 2 (score: 1)
If I haven't misunderstood your question, how about this?
import pandas as pd
import numpy as np
df.loc[df.Location == 'Gate-3', 'Location'] = np.nan
df.loc[df.Event == 'NO Access', 'Event'] = np.nan
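As a possible alternative to the two loc assignments (a different technique, not what this answer uses), DataFrame.replace accepts a nested dict of the form {column: {old: new}}. The sketch below assumes a small made-up frame. It returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Location": ["Gate-2", "Gate-3", "Gate-1"],
    "Event": ["Access Granted", "NO Access", "NO Access"],
})

# Nested dict: in column "Location" replace "Gate-3", in column "Event" replace "NO Access".
cleaned = df.replace({
    "Location": {"Gate-3": np.nan},
    "Event": {"NO Access": np.nan},
})
print(cleaned)
```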
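Answer 3 (score: 0)
There is no need to iterate in order to set column values based on a condition. Instead, you would use boolean indexing.
Example: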
import numpy as np
import pandas as pd
data = {'Time': ['18:28:59', '18:28:59', '18:28:59'],
        'Location': ['Gate-2', 'Gate-3', 'Gate-1'],
        'Event': ['Access Granted', 'NO Access', 'NO Access'],
        'BadgeID': [81002, 80557, 80557]}
df = pd.DataFrame(data)
Time Location Event BadgeID
0 18:28:59 Gate-2 Access Granted 81002
1 18:28:59 Gate-3 NO Access 80557
2 18:28:59 Gate-1 NO Access 80557
The loc method is a label-based indexer that, among other options, also accepts a boolean array.
The conditional expression:
df.Location == 'Gate-3'
returns a boolean Series:
0 False
1 True
2 False
Name: Location, dtype: bool
You can check this with the built-in function type():
type(df.Location == 'Gate-3')
# pandas.core.series.Series
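This Series is used as the row indexer for the original DataFrame's loc method.
The loc method takes a row indexer and a column indexer. So the following statement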
df.loc[df.Location == 'Gate-3', 'Location'] = np.nan
translates to:
Set the intersection of the rows where Location is Gate-3 and the Location column to a null value.
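Putting the pieces of this answer together, a runnable sketch on the same three-row sample might look like this. The name row_indexer is introduced only for illustration, and the Event column is handled the same way with its own condition.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Time": ["18:28:59", "18:28:59", "18:28:59"],
    "Location": ["Gate-2", "Gate-3", "Gate-1"],
    "Event": ["Access Granted", "NO Access", "NO Access"],
    "BadgeID": [81002, 80557, 80557],
})

# Row indexer: a boolean Series with one entry per row.
row_indexer = df.Location == "Gate-3"

# Column indexer: the label of the column to modify.
df.loc[row_indexer, "Location"] = np.nan

# Same pattern for the Event column.
df.loc[df.Event == "NO Access", "Event"] = np.nan
print(df)
```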