R: Find a range of values and replace them with NA in a CSV file

Time: 2015-11-05 16:48:56

Tags: r

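The data collected from field measurements is a bit messy and needs some cleaning before further calculation.

Could you give me an example in R that finds a range of values in a CSV file and replaces them with NA?

A valid value is either between 1800 and 2200, or NA. Any value outside this range should be replaced with NA.

The dataset looks like this: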

          timestamp        tr        ts
 1   2015-07-08 02:29:00   -40.5     1978.62
 2   2015-07-08 02:30:00   1936.74   30.5
 3   2015-07-08 02:31:00   1937.14   1978.99
 4   2015-07-08 02:32:00   1937.66   1978.83
 5   2015-07-08 02:33:00   402.4     1979.15
 6   2015-07-08 02:45:00   1937.00   1979.00
 7   2015-07-08 02:46:00   1937.75   1979.29
 8   2015-07-08 02:47:00   1937.84   1978.44
 9   2015-07-08 02:48:00   -30.23    3.5
 10  2015-07-08 02:49:00   1937.82   1978.68
 11  2015-07-08 02:50:00   1937.55   1979.60
 12  2015-07-08 02:51:00   1937.55   1979.13
 13  2015-07-08 02:52:00   1937.65   1979.12
 14  2015-07-08 02:53:00   1937.56   1978.28
 15  2015-07-08 02:54:00   1937.38   1978.99
 16  2015-07-08 02:58:00   -22.34    1978.61
 17  2015-07-08 02:59:00   1937.78   1978.85
 18  2015-07-08 03:00:00   1937.71   100.42
 19  2015-07-08 03:01:00   1937.14   1979.04
 20  2015-07-08 03:02:00   2500.00   0.13

The dataset after filtering and replacement:

          timestamp        tr        ts
 1   2015-07-08 02:29:00   NA        1978.62
 2   2015-07-08 02:30:00   1936.74   NA
 3   2015-07-08 02:31:00   1937.14   1978.99
 4   2015-07-08 02:32:00   1937.66   1978.83
 5   2015-07-08 02:33:00   NA        1979.15
 6   2015-07-08 02:45:00   1937.00   1979.00
 7   2015-07-08 02:46:00   1937.75   1979.29
 8   2015-07-08 02:47:00   1937.84   1978.44
 9   2015-07-08 02:48:00   NA        NA
 10  2015-07-08 02:49:00   1937.82   1978.68
 11  2015-07-08 02:50:00   1937.55   1979.60
 12  2015-07-08 02:51:00   1937.55   1979.13
 13  2015-07-08 02:52:00   1937.65   1979.12
 14  2015-07-08 02:53:00   1937.56   1978.28
 15  2015-07-08 02:54:00   1937.38   1978.99
 16  2015-07-08 02:58:00   NA        1978.61
 17  2015-07-08 02:59:00   1937.78   1978.85
 18  2015-07-08 03:00:00   1937.71   NA
 19  2015-07-08 03:01:00   1937.14   1979.04
 20  2015-07-08 03:02:00   NA        NA

Thank you very much.

2 answers:

Answer 0 (score: 1)

# simulate some data
set.seed(123)
ts = rnorm(15, 2000, 300)
# print it as a one-column data frame
as.data.frame(list(ts = ts))
      ts
1  1831.857
2  1930.947
3  2467.612
4  2021.153
5  2038.786
6  2514.519
7  2138.275
8  1620.482
9  1793.944
10 1866.301
11 2367.225
12 2107.944
13 2120.231
14 2033.205
15 1833.248

# then convert all numbers at or below 1800, or at or above 2200, to NA
# (the boundary values 1800 and 2200 are replaced too; use < and > to keep them)
ts[ts <= 1800 | ts >= 2200] = NA
as.data.frame(list(ts=ts))
      ts
1  1831.857
2  1930.947
3        NA
4  2021.153
5  2038.786
6        NA
7  2138.275
8        NA
9        NA
10 1866.301
11       NA
12 2107.944
13 2120.231
14 2033.205
15 1833.248

Or, in your case, if your data frame is called data:

data$ts[data$ts <= 1800 | data$ts >= 2200] = NA
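
Since the question asks about a CSV file, here is a minimal sketch of the full round trip: read the file, replace out-of-range values in both measurement columns, and write the result back. The file paths are temporary stand-ins, and the column names tr and ts are taken from the question. As an assumption, this sketch treats exactly 1800 and 2200 as valid (it uses < and >), unlike the <= and >= in the line above.

```r
# Build a small example CSV in a temporary file (a stand-in for the real file).
infile <- tempfile(fileext = ".csv")
writeLines(c(
  "timestamp,tr,ts",
  "2015-07-08 02:29:00,-40.5,1978.62",
  "2015-07-08 02:30:00,1936.74,30.5",
  "2015-07-08 02:31:00,1937.14,1978.99",
  "2015-07-08 03:02:00,2500.00,0.13"
), infile)

data <- read.csv(infile, stringsAsFactors = FALSE)

# Replace out-of-range values with NA, one measurement column at a time.
# Values that are already NA are left alone.
for (col in c("tr", "ts")) {
  out_of_range <- !is.na(data[[col]]) &
    (data[[col]] < 1800 | data[[col]] > 2200)
  data[[col]][out_of_range] <- NA
}

# Write the cleaned data to a new file; missing values are written as NA.
outfile <- tempfile(fileext = ".csv")
write.csv(data, outfile, row.names = FALSE)
```

Writing to a separate output file keeps the raw measurements intact in case the valid range needs to be revised later.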

Answer 1 (score: 1)

This is similar to @Ranalyst's answer, but I use ifelse combined with sapply to update multiple columns.

dt = read.table(text = "timestamp  tr  ts
2015-07-08   -40.5     1978.62
2015-07-08   1936.74   30.5
2015-07-08   1937.14   1978.99
2015-07-08   1937.66   1978.83
2015-07-08   402.4     1979.15
2015-07-08   1937.00   1979.00", header = TRUE)

dt

#    timestamp      tr      ts
# 1 2015-07-08  -40.50 1978.62
# 2 2015-07-08 1936.74   30.50
# 3 2015-07-08 1937.14 1978.99
# 4 2015-07-08 1937.66 1978.83
# 5 2015-07-08  402.40 1979.15
# 6 2015-07-08 1937.00 1979.00


# select positions of columns to update
cols_to_update = 2:3

# update those columns
dt[,cols_to_update] = sapply(cols_to_update, function(x) ifelse(dt[,x] <= 1800 | dt[,x] >= 2200, NA, dt[,x]))

dt

#    timestamp      tr      ts
# 1 2015-07-08      NA 1978.62
# 2 2015-07-08 1936.74      NA
# 3 2015-07-08 1937.14 1978.99
# 4 2015-07-08 1937.66 1978.83
# 5 2015-07-08      NA 1979.15
# 6 2015-07-08 1937.00 1979.00
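
A possible variation on the sapply call above, sketched with a small helper function and lapply. The helper name replace_out_of_range and its lower and upper parameters are hypothetical and not part of either answer. As an assumption, the helper keeps the boundary values 1800 and 2200 (it uses < and >) and leaves existing NA values untouched.

```r
# Hypothetical helper: set values outside [lower, upper] to NA.
replace_out_of_range <- function(x, lower = 1800, upper = 2200) {
  x[!is.na(x) & (x < lower | x > upper)] <- NA
  x
}

# A few rows from the question, rebuilt here so the example is self-contained.
dt <- data.frame(
  timestamp = rep("2015-07-08", 4),
  tr = c(-40.5, 1936.74, 1937.14, 402.4),
  ts = c(1978.62, 30.5, 1978.99, 1979.15)
)

# Select columns by name rather than by position.
cols_to_update <- c("tr", "ts")

# lapply returns one list element per column, which assigns back cleanly.
dt[cols_to_update] <- lapply(dt[cols_to_update], replace_out_of_range)
```

Because lapply always returns a list with one element per column, the assignment does not depend on how sapply happens to simplify its result, and selecting columns by name keeps working if the column order changes.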