I have a data.table object:
> Hydro_Sen
Index Date Obs_m3_s T_str P_factor Flow_m3_s Gauge Month Year T Normalised
1: 0 1/04/2000 13.37000 T_-1 0.95 28.987400 Aconcagua 4 2000 -1 0.08409943
2: 1 1/05/2000 9.94387 T_-1 0.95 15.542100 Aconcagua 5 2000 -1 -0.59053122
3: 2 1/06/2000 13.80530 T_-1 0.95 19.139900 Aconcagua 6 2000 -1 -0.41000821
---
165238: 165237 1/01/2018 NA T_4 1.40 0.593462 Juncal2 1 2018 4 -1.34059328
165239: 165238 1/02/2018 NA T_4 1.40 0.403063 Juncal2 2 2018 4 -1.35014673
165240: 165239 1/03/2018 NA T_4 1.40 0.252990 Juncal2 3 2018 4 -1.35767678
> str(Hydro_Sen)
Classes ‘data.table’ and 'data.frame': 165240 obs. of 11 variables:
$ Index : int 0 1 2 3 4 5 6 7 8 9 ...
$ Date : chr "1/04/2000" "1/05/2000" "1/06/2000" "1/07/2000" ...
$ Obs_m3_s : num 13.37 9.94 13.81 23.18 21.87 ...
$ T_str : chr "T_-1" "T_-1" "T_-1" "T_-1" ...
$ P_factor : num 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 ...
$ Flow_m3_s : num 29 15.5 19.1 20.8 18.5 ...
$ Gauge : chr "Aconcagua" "Aconcagua" "Aconcagua" "Aconcagua" ...
$ Month : int 4 5 6 7 8 9 10 11 12 1 ...
$ Year : int 2000 2000 2000 2000 2000 2000 2000 2000 2000 2001 ...
$ T : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Normalised: num 0.0841 -0.5905 -0.41 -0.3283 -0.4442 ...
- attr(*, ".internal.selfref")=<externalptr>
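I am trying to create a new column (Normalised2) by normalising an existing column (Flow_m3_s) with the mean and standard deviation of another column (Obs_m3_s). The mean and standard deviation are computed separately for each value of a third column (Gauge).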
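In formula form, the normalisation described above would be, for each row i belonging to gauge g (the symbols \mu_g and \sigma_g are shorthand introduced here, not column names):

```latex
\text{Normalised2}_{i} = \frac{\text{Flow\_m3\_s}_{i} - \mu_{g}}{\sigma_{g}},
\qquad
\mu_{g} = \operatorname{mean}\bigl(\text{Obs\_m3\_s} \mid \text{Gauge} = g\bigr),
\quad
\sigma_{g} = \operatorname{sd}\bigl(\text{Obs\_m3\_s} \mid \text{Gauge} = g\bigr)
```

Both statistics ignore missing values, since Obs_m3_s contains NA for some rows.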
I tried the following:
for(i in unique(Hydro_Sen$Gauge)){
Temp=Hydro_Sen[Gauge==i & T_str=="T_0" & P_factor==1]$Obs_m3_s
Hydro_Sen[Gauge==i,Normalised2:=(Flow_m3_s-mean(Temp, na.rm=T))/sd(Temp, na.rm=T)]
}
But I get the following warning:
Warning messages:
1: In if (na.rm) x <- x[!is.na(x)] :
the condition has length > 1 and only the first element will be used
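I looked at other posts ("the condition has length > 1 and only the first element will be used in if else statement", "The condition has length > 1 and only the first element will be used", and others) and found that the problem usually arises when R is forced to evaluate an if condition on a vector. However, I am not sure how that applies to my case. I checked that the mean and sd calculations work as expected: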
> mean(Temp, na.rm=T)
[1] 7.093052
> length(mean(Temp, na.rm=T))
[1] 1
> str(mean(Temp, na.rm=T))
num 7.09
If I remove na.rm=T from the mean and sd calls, the warning goes away, but then I get NA as the result. I found a workaround as follows:
for(i in unique(Hydro_Sen$Gauge)){
Temp=Hydro_Sen[Gauge==i & T_str=="T_0" & P_factor==1]$Obs_m3_s
Temp2=mean(Temp, na.rm=T)
Temp3=sd(Temp, na.rm=T)
Hydro_Sen[Gauge==i,Normalised2:=(Flow_m3_s-Temp2)/Temp3]
}
But I would like to understand why the first version produces the warning. Any ideas on how to deal with this?
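One possible explanation, offered as a sketch rather than a confirmed diagnosis: Hydro_Sen has a column named T, and data.table looks up symbols in j among the columns first, so `na.rm=T` inside `[...]` may be passing the whole T column instead of TRUE. The toy table below uses made-up values, and the names `dt`, `obs`, `len_T_in_j` and `Norm` are hypothetical.

```r
library(data.table)

# Toy data (made-up values) with a column named T, like Hydro_Sen
dt <- data.table(
  Gauge = c("A", "A", "B", "B"),
  Flow  = c(1, 2, 3, 4),
  T     = c(-1L, -1L, 4L, 4L)
)
obs <- c(1, NA, 3)  # stand-in for Temp

# Inside j, the symbol T resolves to the integer column,
# so its length is the number of rows rather than 1
len_T_in_j <- dt[, length(T)]

# Spelling out TRUE avoids the name clash with the column
dt[, Norm := (Flow - mean(obs, na.rm = TRUE)) / sd(obs, na.rm = TRUE)]
```

If this is the cause, it would also explain why the workaround is quiet: there, `mean(Temp, na.rm=T)` is evaluated outside the data.table call, where T is still the built-in shorthand for TRUE.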