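Using Python, set the values in all other columns to None for rows where a specific column has specific values

Asked: 2017-07-23 12:45:27

Tags: python pandas dataframe

I have a data frame like the following: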

time                 c1        c2
1 2017-07-23 11:39:10  3.385661  3.193302
2 2017-07-23 11:39:20  3.157000  2.912690
3 2017-07-23 11:39:30  3.277145  3.124290
4 2017-07-23 11:39:40  3.126075  2.982679
5 2017-07-23 11:39:50  3.135766  2.985840
6 2017-07-23 11:40:00  3.166134  3.016147
7 2017-07-23 11:40:10  2.487507  2.256214
8 2017-07-23 11:40:20  3.348368  3.158728
9 2017-07-23 11:40:30  3.219001  2.996357
10 2017-07-23 11:40:40  2.862558  2.711170
11 2017-07-23 11:40:50  2.558438  2.346303
12 2017-07-23 11:41:00  3.338989  3.192018
13 2017-07-23 11:41:10  2.674149  2.496557
14 2017-07-23 11:41:20  3.523231  3.315889
15 2017-07-23 11:41:30  2.931527  2.740840
16 2017-07-23 11:41:40  3.078464  2.938004

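Problem 1: I want to set the values in c1 and c2 to None for every row whose time column falls between two specific times.

For problem 1, what I tried was to get the index of all rows that fall between the two times and then change the values: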

index_list = df['time'][(df['time'] >= start_time) & (df['time'] <= end_time)].index.tolist()

I solved problem 1 using:

start_time = '2017-07-23 11:40:20'
end_time = '2017-07-23 11:40:50'

df.loc[(df['time'] >= start_time) & (df['time'] <= end_time), df.columns != 'time'] = None

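Please help me with problem 2.

Now, how can I use this index list (index_list) to change the values to None in all columns except the time column?

Problem 2: In addition, if 3.38 or any other specific number appears in any column (other than the time column), I want to set that value to None. What is the way to do these things? Please advise; I am having a hard time with this. Thanks.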

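2 answers:

Answer 0: (score: 0)

You can use boolean indexing with loc to set the values to NaN, and replace to change a specific value. With real floats, however, replace is problematic because it only matches the exact stored value: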

import numpy as np
import pandas as pd

# convert the column first if it is not already datetime
#df['time'] = pd.to_datetime(df['time'])

start_time = '2017-07-23 11:40:20'
end_time = '2017-07-23 11:40:50'
#select only column c1, c2
df.loc[(df.time >= start_time) & (df.time <= end_time), ['c1','c2']] = np.nan

#check floats
print (df.iloc[0].tolist())
[Timestamp('2017-07-23 11:39:10'), 3.3856610000000003, 3.1933020000000001]


df = df.replace(3.3856610000000003,np.nan)
print (df)
                  time        c1        c2
1  2017-07-23 11:39:10       NaN  3.193302
2  2017-07-23 11:39:20  3.157000  2.912690
3  2017-07-23 11:39:30  3.277145  3.124290
4  2017-07-23 11:39:40  3.126075  2.982679
5  2017-07-23 11:39:50  3.135766  2.985840
6  2017-07-23 11:40:00  3.166134  3.016147
7  2017-07-23 11:40:10  2.487507  2.256214
8  2017-07-23 11:40:20       NaN       NaN
9  2017-07-23 11:40:30       NaN       NaN
10 2017-07-23 11:40:40       NaN       NaN
11 2017-07-23 11:40:50       NaN       NaN
12 2017-07-23 11:41:00  3.338989  3.192018
13 2017-07-23 11:41:10  2.674149  2.496557
14 2017-07-23 11:41:20  3.523231  3.315889
15 2017-07-23 11:41:30  2.931527  2.740840
16 2017-07-23 11:41:40  3.078464  2.938004
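
The exact-match problem can be seen on two rows from the question. This is a minimal sketch, not part of the original answer: the tolerance `atol=0.01` is an assumed value chosen so that the rounded literal 3.38 matches 3.385661, and the names `exact` and `near` are only for illustration.

```python
import numpy as np
import pandas as pd

# Two rows taken from the question's data.
df = pd.DataFrame({
    "time": pd.to_datetime(["2017-07-23 11:39:10", "2017-07-23 11:39:20"]),
    "c1": [3.385661, 3.157000],
    "c2": [3.193302, 2.912690],
})
cols = ["c1", "c2"]

# replace() needs an exact match, so the rounded literal 3.38 hits nothing.
exact = df[cols].replace(3.38, np.nan)

# np.isclose() compares within a tolerance; atol=0.01 is an assumed value.
near = df[cols].mask(np.isclose(df[cols].to_numpy(), 3.38, atol=0.01))

print(exact)
print(near)
```

A wider tolerance matches more values, so it should be chosen to suit how the number being searched for was rounded.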

Solution using difference to select all columns except time, together with the np.isclose function:

start_time = '2017-07-23 11:40:20'
end_time = '2017-07-23 11:40:50'
cols = df.columns.difference(['time'])

df.loc[(df['time'] >= start_time) & (df['time'] <= end_time), cols] = None

df[cols] = df[cols].mask(np.isclose(df[cols].values, 3.38566), None)
print (df)
                   time       c1       c2
1   2017-07-23 11:39:10     None   3.1933
2   2017-07-23 11:39:20    3.157  2.91269
3   2017-07-23 11:39:30  3.27714  3.12429
4   2017-07-23 11:39:40  3.12608  2.98268
5   2017-07-23 11:39:50  3.13577  2.98584
6   2017-07-23 11:40:00  3.16613  3.01615
7   2017-07-23 11:40:10  2.48751  2.25621
8   2017-07-23 11:40:20      NaN      NaN
9   2017-07-23 11:40:30      NaN      NaN
10  2017-07-23 11:40:40      NaN      NaN
11  2017-07-23 11:40:50      NaN      NaN
12  2017-07-23 11:41:00  3.33899  3.19202
13  2017-07-23 11:41:10  2.67415  2.49656
14  2017-07-23 11:41:20  3.52323  3.31589
15  2017-07-23 11:41:30  2.93153  2.74084
16  2017-07-23 11:41:40  3.07846    2.938
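
The mix of None and NaN in the printed frame, along with the uneven float formatting, suggests the columns ended up with object dtype in the pandas version used at the time. A possible variant, sketched here under the assumption that float columns are preferred, passes np.nan everywhere instead of None. The five rows are taken from the question; `in_range` is a name introduced only for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-07-23 11:39:10", "2017-07-23 11:40:10", "2017-07-23 11:40:20",
        "2017-07-23 11:40:50", "2017-07-23 11:41:00",
    ]),
    "c1": [3.385661, 2.487507, 3.348368, 2.558438, 3.338989],
    "c2": [3.193302, 2.256214, 3.158728, 2.346303, 3.192018],
})

start_time = pd.Timestamp("2017-07-23 11:40:20")
end_time = pd.Timestamp("2017-07-23 11:40:50")

# Every column except 'time'.
cols = df.columns.difference(["time"])

# Rows whose timestamp lies inside the range (both ends included).
in_range = (df["time"] >= start_time) & (df["time"] <= end_time)
df.loc[in_range, cols] = np.nan

# mask() fills with NaN by default, so the columns stay float64.
df[cols] = df[cols].mask(np.isclose(df[cols].to_numpy(), 3.385661))

print(df)
print(df.dtypes)
```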

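Alternatively, you can create a DatetimeIndex with set_index, then select the rows with loc and set them to NaN.

Replacing float values is a bit problematic because of precision, so use numpy.isclose together with mask to replace the values selected by the boolean mask: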

#if necessary convert to datetime
#df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')

df.loc['2017-07-23 11:39:20':'2017-07-23 11:39:50'] = np.nan
df.loc['2017-07-23 11:40:20':'2017-07-23 11:40:50'] = np.nan
df = df.mask(np.isclose(df.values, 3.38566))
print (df)
                           c1        c2
time                                   
2017-07-23 11:39:10       NaN  3.193302
2017-07-23 11:39:20       NaN       NaN
2017-07-23 11:39:30       NaN       NaN
2017-07-23 11:39:40       NaN       NaN
2017-07-23 11:39:50       NaN       NaN
2017-07-23 11:40:00  3.166134  3.016147
2017-07-23 11:40:10  2.487507  2.256214
2017-07-23 11:40:20       NaN       NaN
2017-07-23 11:40:30       NaN       NaN
2017-07-23 11:40:40       NaN       NaN
2017-07-23 11:40:50       NaN       NaN
2017-07-23 11:41:00  3.338989  3.192018
2017-07-23 11:41:10  2.674149  2.496557
2017-07-23 11:41:20  3.523231  3.315889
2017-07-23 11:41:30  2.931527  2.740840
2017-07-23 11:41:40  3.078464  2.938004
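
The DatetimeIndex approach relies on label slicing with loc including both endpoints. A self-contained sketch on five rows from the question shows this; it assumes the index is sorted, as it is in the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-07-23 11:39:10", "2017-07-23 11:40:10", "2017-07-23 11:40:20",
        "2017-07-23 11:40:50", "2017-07-23 11:41:00",
    ]),
    "c1": [3.385661, 2.487507, 3.348368, 2.558438, 3.338989],
    "c2": [3.193302, 2.256214, 3.158728, 2.346303, 3.192018],
}).set_index("time")

# Label slices on a DatetimeIndex include both the start and the end label.
df.loc["2017-07-23 11:40:20":"2017-07-23 11:40:50"] = np.nan

# 'time' is now the index, so every remaining column is a data column.
df = df.mask(np.isclose(df.to_numpy(), 3.385661))

print(df)
```

Because time has moved into the index, no column needs to be excluded when the mask is applied.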

Answer 1: (score: 0)

I solved problem 1 using:

start_time = '2017-07-23 11:40:20'
end_time = '2017-07-23 11:40:50'

df.loc[(df['time'] >= start_time) & (df['time'] <= end_time), df.columns!= 'time'] = None
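
What makes this work is that `df.columns != 'time'` evaluates to a boolean array with one entry per column, and loc accepts such an array as a column selector. Here is a short sketch on three rows from the question; `row_mask` and `col_mask` are names introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-07-23 11:40:10", "2017-07-23 11:40:20", "2017-07-23 11:41:00",
    ]),
    "c1": [2.487507, 3.348368, 3.338989],
    "c2": [2.256214, 3.158728, 3.192018],
})

start_time = pd.Timestamp("2017-07-23 11:40:20")
end_time = pd.Timestamp("2017-07-23 11:40:50")

# One boolean per column: False for 'time', True for everything else.
col_mask = df.columns != "time"

# One boolean per row: True where the timestamp is inside the range.
row_mask = (df["time"] >= start_time) & (df["time"] <= end_time)

# Only cells where both masks are True are overwritten.
df.loc[row_mask, col_mask] = np.nan

print(col_mask)
print(df)
```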