I have a data frame like the following:
time c1 c2
1 2017-07-23 11:39:10 3.385661 3.193302
2 2017-07-23 11:39:20 3.157000 2.912690
3 2017-07-23 11:39:30 3.277145 3.124290
4 2017-07-23 11:39:40 3.126075 2.982679
5 2017-07-23 11:39:50 3.135766 2.985840
6 2017-07-23 11:40:00 3.166134 3.016147
7 2017-07-23 11:40:10 2.487507 2.256214
8 2017-07-23 11:40:20 3.348368 3.158728
9 2017-07-23 11:40:30 3.219001 2.996357
10 2017-07-23 11:40:40 2.862558 2.711170
11 2017-07-23 11:40:50 2.558438 2.346303
12 2017-07-23 11:41:00 3.338989 3.192018
13 2017-07-23 11:41:10 2.674149 2.496557
14 2017-07-23 11:41:20 3.523231 3.315889
15 2017-07-23 11:41:30 2.931527 2.740840
16 2017-07-23 11:41:40 3.078464 2.938004
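Question 1: If a row's time falls within a specific range of the time column, I want to set the values in c1 and c2 to None.
For question 1, what I tried was to get the indices of all rows that fall within the time range and then change the values:
index_list = df.time[(df.time >= start_time) & (df.time <= end_time)].index.tolist()
Now, how do I use this index list to change the values to None in all columns except the time column?
I have since solved question 1 using:
start_time = '2017-07-23 11:40:20'
end_time = '2017-07-23 11:40:50'
df.loc[(df['time'] >= start_time) & (df['time'] <= end_time), df.columns != 'time'] = None
Please help me with question 2.
Question 2: In addition, if 3.38 or any other specific number appears in any column (except the time column), I want to set that value to None. How can I do these things? Please suggest. I am having a hard time with this. Thanks.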
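Answer 0: (score: 0)
You can use boolean indexing with loc to set values to NaN, and replace to change a specific value, but replace is problematic for floats in practice, because the stored value has to match exactly (see the float check below):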
import numpy as np
import pandas as pd

#convert if not datetime
#df['time'] = pd.to_datetime(df['time'])
start_time = '2017-07-23 11:40:20'
end_time = '2017-07-23 11:40:50'
#select only column c1, c2
df.loc[(df.time >= start_time) & (df.time <= end_time), ['c1','c2']] = np.nan
#check floats
print (df.iloc[0].tolist())
[Timestamp('2017-07-23 11:39:10'), 3.3856610000000003, 3.1933020000000001]
df = df.replace(3.3856610000000003,np.nan)
print (df)
time c1 c2
1 2017-07-23 11:39:10 NaN 3.193302
2 2017-07-23 11:39:20 3.157000 2.912690
3 2017-07-23 11:39:30 3.277145 3.124290
4 2017-07-23 11:39:40 3.126075 2.982679
5 2017-07-23 11:39:50 3.135766 2.985840
6 2017-07-23 11:40:00 3.166134 3.016147
7 2017-07-23 11:40:10 2.487507 2.256214
8 2017-07-23 11:40:20 NaN NaN
9 2017-07-23 11:40:30 NaN NaN
10 2017-07-23 11:40:40 NaN NaN
11 2017-07-23 11:40:50 NaN NaN
12 2017-07-23 11:41:00 3.338989 3.192018
13 2017-07-23 11:41:10 2.674149 2.496557
14 2017-07-23 11:41:20 3.523231 3.315889
15 2017-07-23 11:41:30 2.931527 2.740840
16 2017-07-23 11:41:40 3.078464 2.938004
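The range selection above can also be written with Series.between, which includes both endpoints by default. The following is only a minimal sketch on a small, made-up subset of the question's data; the variable name in_range is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Small hypothetical sample, shaped like the frame in the question
df = pd.DataFrame({
    'time': pd.to_datetime(['2017-07-23 11:40:10',
                            '2017-07-23 11:40:20',
                            '2017-07-23 11:40:30',
                            '2017-07-23 11:41:00']),
    'c1': [2.487507, 3.348368, 3.219001, 3.338989],
    'c2': [2.256214, 3.158728, 2.996357, 3.192018],
})

start_time = pd.Timestamp('2017-07-23 11:40:20')
end_time = pd.Timestamp('2017-07-23 11:40:50')

# between() keeps rows with start_time <= time <= end_time
in_range = df['time'].between(start_time, end_time)

# Blank out only the value columns for the selected rows
df.loc[in_range, ['c1', 'c2']] = np.nan
print(df)
```

Keeping the boolean mask in its own variable makes it easy to reuse for several column selections.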
Solution for all columns except time, using difference and the np.isclose function:
start_time = '2017-07-23 11:40:20'
end_time = '2017-07-23 11:40:50'
cols = df.columns.difference(['time'])
df.loc[(df['time'] >= start_time) & (df['time'] <= end_time), cols] = None
df[cols] = df[cols].mask(np.isclose(df[cols].values, 3.38566), None)
print (df)
time c1 c2
1 2017-07-23 11:39:10 None 3.1933
2 2017-07-23 11:39:20 3.157 2.91269
3 2017-07-23 11:39:30 3.27714 3.12429
4 2017-07-23 11:39:40 3.12608 2.98268
5 2017-07-23 11:39:50 3.13577 2.98584
6 2017-07-23 11:40:00 3.16613 3.01615
7 2017-07-23 11:40:10 2.48751 2.25621
8 2017-07-23 11:40:20 NaN NaN
9 2017-07-23 11:40:30 NaN NaN
10 2017-07-23 11:40:40 NaN NaN
11 2017-07-23 11:40:50 NaN NaN
12 2017-07-23 11:41:00 3.33899 3.19202
13 2017-07-23 11:41:10 2.67415 2.49656
14 2017-07-23 11:41:20 3.52323 3.31589
15 2017-07-23 11:41:30 2.93153 2.74084
16 2017-07-23 11:41:40 3.07846 2.938
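Note that np.isclose uses tight default tolerances, so a rounded number such as the 3.38 from the question would not match 3.385661. A possible sketch, assuming that "within about 0.01" is an acceptable definition of a match (the atol value is a hypothetical choice, not something stated in the question):

```python
import numpy as np
import pandas as pd

# Small hypothetical sample: the first two rows of the question's frame
df = pd.DataFrame({
    'time': pd.to_datetime(['2017-07-23 11:39:10', '2017-07-23 11:39:20']),
    'c1': [3.385661, 3.157000],
    'c2': [3.193302, 2.912690],
})

cols = df.columns.difference(['time'])

# With the default tolerances, 3.385661 is not considered close to 3.38
default_hits = np.isclose(df[cols].to_numpy(), 3.38)

# Assumed tolerance: treat anything within roughly 0.01 of 3.38 as a match
loose_hits = np.isclose(df[cols].to_numpy(), 3.38, atol=0.01)

# mask() replaces the matching cells with NaN
df[cols] = df[cols].mask(loose_hits)
print(df)
```

A larger atol matches more values, so it is worth checking the boolean array before overwriting anything.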
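You can create a DatetimeIndex with set_index, then select rows with loc and set them to NaN.
Replacing float values is a bit problematic because of precision. So use numpy.isclose to build a boolean mask and pass it to mask, which replaces the matching values with NaN: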
#if necessary convert to datetime
#df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')
df.loc['2017-07-23 11:39:20':'2017-07-23 11:39:50'] = np.nan
df.loc['2017-07-23 11:40:20':'2017-07-23 11:40:50'] = np.nan
df = df.mask(np.isclose(df.values, 3.38566))
print (df)
c1 c2
time
2017-07-23 11:39:10 NaN 3.193302
2017-07-23 11:39:20 NaN NaN
2017-07-23 11:39:30 NaN NaN
2017-07-23 11:39:40 NaN NaN
2017-07-23 11:39:50 NaN NaN
2017-07-23 11:40:00 3.166134 3.016147
2017-07-23 11:40:10 2.487507 2.256214
2017-07-23 11:40:20 NaN NaN
2017-07-23 11:40:30 NaN NaN
2017-07-23 11:40:40 NaN NaN
2017-07-23 11:40:50 NaN NaN
2017-07-23 11:41:00 3.338989 3.192018
2017-07-23 11:41:10 2.674149 2.496557
2017-07-23 11:41:20 3.523231 3.315889
2017-07-23 11:41:30 2.931527 2.740840
2017-07-23 11:41:40 3.078464 2.938004
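If there are several ranges to blank out, the repeated loc lines can be driven from a list. This is only a sketch on a shortened, made-up sample; the ranges list is a hypothetical helper, and label slicing like this assumes the DatetimeIndex is sorted.

```python
import numpy as np
import pandas as pd

# Small hypothetical sample with time already set as a sorted DatetimeIndex
df = pd.DataFrame(
    {'c1': [3.385661, 3.157000, 3.277145, 3.126075, 3.135766],
     'c2': [3.193302, 2.912690, 3.124290, 2.982679, 2.985840]},
    index=pd.DatetimeIndex(['2017-07-23 11:39:10', '2017-07-23 11:39:20',
                            '2017-07-23 11:39:30', '2017-07-23 11:39:40',
                            '2017-07-23 11:39:50'], name='time'),
)

# Hypothetical list of (start, end) ranges; both endpoints are inclusive
ranges = [('2017-07-23 11:39:20', '2017-07-23 11:39:30'),
          ('2017-07-23 11:39:45', '2017-07-23 11:39:55')]

for start, end in ranges:
    # Label-based slice on the DatetimeIndex, applied to all columns
    df.loc[pd.Timestamp(start):pd.Timestamp(end)] = np.nan

print(df)
```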
Answer 1: (score: 0)
I solved question 1 using:
start_time = '2017-07-23 11:40:20'
end_time = '2017-07-23 11:40:50'
df.loc[(df['time'] >= start_time) & (df['time'] <= end_time), df.columns!= 'time'] = None
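For completeness, the index list from the question can also be passed straight to loc as the row selector. A minimal sketch on made-up sample rows, assuming the index labels are unique:

```python
import numpy as np
import pandas as pd

# Small hypothetical sample with a 1-based index, as in the question
df = pd.DataFrame({
    'time': pd.to_datetime(['2017-07-23 11:40:10',
                            '2017-07-23 11:40:20',
                            '2017-07-23 11:40:30',
                            '2017-07-23 11:41:00']),
    'c1': [2.487507, 3.348368, 3.219001, 3.338989],
    'c2': [2.256214, 3.158728, 2.996357, 3.192018],
}, index=[1, 2, 3, 4])

start_time = pd.Timestamp('2017-07-23 11:40:20')
end_time = pd.Timestamp('2017-07-23 11:40:50')

# Index labels of the rows inside the time range
in_range = (df['time'] >= start_time) & (df['time'] <= end_time)
index_list = df.index[in_range].tolist()

# Every column except time
cols = df.columns.difference(['time'])

# Row labels and column labels can both be given as lists to loc
df.loc[index_list, cols] = np.nan
print(df)
```

The boolean mask alone is enough for loc, so building the list is only useful if the labels are needed elsewhere.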