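The data collected from field measurements is somewhat messy and needs some cleaning before any further calculations.
Could you give me an example in R of how to find values in a given range and replace them with NA in a CSV file?
Plausible values lie between 1800 and 2200, or are NA. Any value outside this range should be replaced with NA.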
The dataset looks like this:
timestamp tr ts
1 2015-07-08 02:29:00 -40.5 1978.62
2 2015-07-08 02:30:00 1936.74 30.5
3 2015-07-08 02:31:00 1937.14 1978.99
4 2015-07-08 02:32:00 1937.66 1978.83
5 2015-07-08 02:33:00 402.4 1979.15
6 2015-07-08 02:45:00 1937.00 1979.00
7 2015-07-08 02:46:00 1937.75 1979.29
8 2015-07-08 02:47:00 1937.84 1978.44
9 2015-07-08 02:48:00 -30.23 3.5
10 2015-07-08 02:49:00 1937.82 1978.68
11 2015-07-08 02:50:00 1937.55 1979.60
12 2015-07-08 02:51:00 1937.55 1979.13
13 2015-07-08 02:52:00 1937.65 1979.12
14 2015-07-08 02:53:00 1937.56 1978.28
15 2015-07-08 02:54:00 1937.38 1978.99
16 2015-07-08 02:58:00 -22.34 1978.61
17 2015-07-08 02:59:00 1937.78 1978.85
18 2015-07-08 03:00:00 1937.71 100.42
19 2015-07-08 03:01:00 1937.14 1979.04
20 2015-07-08 03:02:00 2500.00 0.13
The dataset after filtering and replacement:
timestamp tr ts
1 2015-07-08 02:29:00 NA 1978.62
2 2015-07-08 02:30:00 1936.74 NA
3 2015-07-08 02:31:00 1937.14 1978.99
4 2015-07-08 02:32:00 1937.66 1978.83
5 2015-07-08 02:33:00 NA 1979.15
6 2015-07-08 02:45:00 1937.00 1979.00
7 2015-07-08 02:46:00 1937.75 1979.29
8 2015-07-08 02:47:00 1937.84 1978.44
9 2015-07-08 02:48:00 NA NA
10 2015-07-08 02:49:00 1937.82 1978.68
11 2015-07-08 02:50:00 1937.55 1979.60
12 2015-07-08 02:51:00 1937.55 1979.13
13 2015-07-08 02:52:00 1937.65 1979.12
14 2015-07-08 02:53:00 1937.56 1978.28
15 2015-07-08 02:54:00 1937.38 1978.99
16 2015-07-08 02:58:00 NA 1978.61
17 2015-07-08 02:59:00 1937.78 1978.85
18 2015-07-08 03:00:00 1937.71 NA
19 2015-07-08 03:01:00 1937.14 1979.04
20 2015-07-08 03:02:00 NA NA
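To check an approach against the expected result above, the question's table can be rebuilt inline and filtered. This is a minimal base-R sketch, not the asker's actual pipeline: the object names raw and cleaned are illustrative, and the timestamp is assumed to be stored as plain text. Values exactly equal to 1800 or 2200 are treated as plausible and kept.

```r
# Rebuild the 20-row example from the question.
# Assumption: timestamps are kept as plain character strings.
raw <- data.frame(
  timestamp = sprintf("2015-07-08 %s:00",
                      c("02:29", "02:30", "02:31", "02:32", "02:33", "02:45",
                        "02:46", "02:47", "02:48", "02:49", "02:50", "02:51",
                        "02:52", "02:53", "02:54", "02:58", "02:59", "03:00",
                        "03:01", "03:02")),
  tr = c(-40.5, 1936.74, 1937.14, 1937.66, 402.4, 1937.00, 1937.75, 1937.84,
         -30.23, 1937.82, 1937.55, 1937.55, 1937.65, 1937.56, 1937.38, -22.34,
         1937.78, 1937.71, 1937.14, 2500.00),
  ts = c(1978.62, 30.5, 1978.99, 1978.83, 1979.15, 1979.00, 1979.29, 1978.44,
         3.5, 1978.68, 1979.60, 1979.13, 1979.12, 1978.28, 1978.99, 1978.61,
         1978.85, 100.42, 1979.04, 0.13),
  stringsAsFactors = FALSE
)

# Replace every value outside [1800, 2200] with NA in both measurement columns
cleaned <- raw
for (col in c("tr", "ts")) {
  x <- cleaned[[col]]
  cleaned[[col]] <- replace(x, x < 1800 | x > 2200, NA)
}

cleaned
```

The NA positions in cleaned should match the filtered table in the question (tr in rows 1, 5, 9, 16, 20; ts in rows 2, 9, 18, 20).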
Many thanks.
Answer 0 (score: 1)
# simulate some data
set.seed(123)
ts = rnorm(15, 2000, 300)
data.frame(ts = ts)
#          ts
# 1  1831.857
# 2  1930.947
# 3  2467.612
# 4  2021.153
# 5  2038.786
# 6  2514.519
# 7  2138.275
# 8  1620.482
# 9  1793.944
# 10 1866.301
# 11 2367.225
# 12 2107.944
# 13 2120.231
# 14 2033.205
# 15 1833.248
# then convert all numbers less than 1800 or greater than 2200 to NA
ts[ts < 1800 | ts > 2200] = NA
as.data.frame(list(ts = ts))
#          ts
# 1  1831.857
# 2  1930.947
# 3        NA
# 4  2021.153
# 5  2038.786
# 6        NA
# 7  2138.275
# 8        NA
# 9        NA
# 10 1866.301
# 11       NA
# 12 2107.944
# 13 2120.231
# 14 2033.205
# 15 1833.248
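Or, in your case, if your data frame is called data, apply the same rule to both columns:
data$tr[data$tr < 1800 | data$tr > 2200] = NA
data$ts[data$ts < 1800 | data$ts > 2200] = NA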
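Since the question is about a CSV file, here is a sketch of the full round trip: read, clean, write. It assumes the file has a header row with the columns timestamp, tr and ts and uses commas as separators. The file paths are hypothetical; a temporary file with a few rows from the question stands in for the real file so the example runs on its own.

```r
# Hypothetical input file: replace csv_in with the path to your own CSV.
csv_in <- tempfile(fileext = ".csv")
writeLines(c("timestamp,tr,ts",
             "2015-07-08 02:29:00,-40.5,1978.62",
             "2015-07-08 02:30:00,1936.74,30.5",
             "2015-07-08 02:31:00,1937.14,1978.99",
             "2015-07-08 03:02:00,2500.00,0.13"),
           csv_in)

# Read the file; timestamps stay as character strings
data <- read.csv(csv_in, stringsAsFactors = FALSE)

# Apply the 1800-2200 rule to each measurement column
for (col in c("tr", "ts")) {
  data[[col]][data[[col]] < 1800 | data[[col]] > 2200] <- NA
}

# Write the cleaned table to a new (hypothetical) output file,
# with missing values written explicitly as "NA"
csv_out <- tempfile(fileext = ".csv")
write.csv(data, csv_out, row.names = FALSE, na = "NA")

data
```

Writing to a separate output file keeps the raw field data untouched, which makes it easy to rerun the cleaning with different bounds later.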
Answer 1 (score: 1)
This is similar to @Ranalyst's answer, but I use ifelse combined with sapply to update several columns at once.
dt = read.table(text = "timestamp tr ts
2015-07-08 -40.5 1978.62
2015-07-08 1936.74 30.5
2015-07-08 1937.14 1978.99
2015-07-08 1937.66 1978.83
2015-07-08 402.4 1979.15
2015-07-08 1937.00 1979.00", header = TRUE)
dt
# timestamp tr ts
# 1 2015-07-08 -40.50 1978.62
# 2 2015-07-08 1936.74 30.50
# 3 2015-07-08 1937.14 1978.99
# 4 2015-07-08 1937.66 1978.83
# 5 2015-07-08 402.40 1979.15
# 6 2015-07-08 1937.00 1979.00
# select positions of columns to update
cols_to_update = 2:3
# update those columns
dt[, cols_to_update] = sapply(cols_to_update, function(x) ifelse(dt[, x] < 1800 | dt[, x] > 2200, NA, dt[, x]))
dt
# timestamp tr ts
# 1 2015-07-08 NA 1978.62
# 2 2015-07-08 1936.74 NA
# 3 2015-07-08 1937.14 1978.99
# 4 2015-07-08 1937.66 1978.83
# 5 2015-07-08 NA 1979.15
# 6 2015-07-08 1937.00 1979.00
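A variant of the same idea swaps sapply for lapply and ifelse for replace. This is a swapped-in alternative, not the answerer's code: lapply returns one list element per column, which can be assigned straight back into the selected data frame columns without sapply's simplification to a matrix. The small data frame below is made up for illustration and includes the boundary values 1800 and 2200 to show that, with strict comparisons, they are kept.

```r
# Illustrative data (not from the question) with boundary values 1800 and 2200
dt = data.frame(timestamp = "2015-07-08",
                tr = c(-40.5, 1800, 1936.74, 2200.01),
                ts = c(1978.62, 2200, 30.5, 1979.15))

# Columns to update, selected by name rather than position
cols_to_update = c("tr", "ts")

# lapply() yields a list of cleaned columns; replace() sets out-of-range values to NA
dt[cols_to_update] = lapply(dt[cols_to_update], function(x) {
  replace(x, x < 1800 | x > 2200, NA)
})

dt
```

Selecting columns by name also keeps the code correct if the column order in the CSV changes.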