Creating a column in an R data.table generates the warning "the condition has length > 1 and only the first element will be used"

Time: 2018-05-14 23:56:18

Tags: r data.table warnings

I have a data.table object:

> Hydro_Sen
         Index      Date Obs_m3_s T_str P_factor Flow_m3_s     Gauge Month Year  T  Normalised
     1:      0 1/04/2000 13.37000  T_-1     0.95 28.987400 Aconcagua     4 2000 -1  0.08409943
     2:      1 1/05/2000  9.94387  T_-1     0.95 15.542100 Aconcagua     5 2000 -1 -0.59053122
     3:      2 1/06/2000 13.80530  T_-1     0.95 19.139900 Aconcagua     6 2000 -1 -0.41000821
---
165238: 165237 1/01/2018       NA   T_4     1.40  0.593462   Juncal2     1 2018  4 -1.34059328
165239: 165238 1/02/2018       NA   T_4     1.40  0.403063   Juncal2     2 2018  4 -1.35014673
165240: 165239 1/03/2018       NA   T_4     1.40  0.252990   Juncal2     3 2018  4 -1.35767678

> str(Hydro_Sen)
Classes ‘data.table’ and 'data.frame':  165240 obs. of  11 variables:
 $ Index     : int  0 1 2 3 4 5 6 7 8 9 ...
 $ Date      : chr  "1/04/2000" "1/05/2000" "1/06/2000" "1/07/2000" ...
 $ Obs_m3_s  : num  13.37 9.94 13.81 23.18 21.87 ...
 $ T_str     : chr  "T_-1" "T_-1" "T_-1" "T_-1" ...
 $ P_factor  : num  0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 ...
 $ Flow_m3_s : num  29 15.5 19.1 20.8 18.5 ...
 $ Gauge     : chr  "Aconcagua" "Aconcagua" "Aconcagua" "Aconcagua" ...
 $ Month     : int  4 5 6 7 8 9 10 11 12 1 ...
 $ Year      : int  2000 2000 2000 2000 2000 2000 2000 2000 2000 2001 ...
 $ T         : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Normalised: num  0.0841 -0.5905 -0.41 -0.3283 -0.4442 ...
 - attr(*, ".internal.selfref")=<externalptr> 

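I am trying to create a new column (Normalised2) by normalising an existing column (Flow_m3_s) with the mean and standard deviation of another column (Obs_m3_s). The mean and standard deviation are computed separately for each value of a third column (Gauge).

I have tried the following: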

for(i in unique(Hydro_Sen$Gauge)){
  Temp=Hydro_Sen[Gauge==i & T_str=="T_0" & P_factor==1]$Obs_m3_s
  Hydro_Sen[Gauge==i,Normalised2:=(Flow_m3_s-mean(Temp, na.rm=T))/sd(Temp, na.rm=T)]
}

However, I get the following warning:

Warning messages:
1: In if (na.rm) x <- x[!is.na(x)] :
  the condition has length > 1 and only the first element will be used

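I have looked at other posts ("the condition has length > 1 and only the first element will be used in if else statement", "The condition has length > 1 and only the first element will be used", etc.) and found that the problem usually arises when R is forced to evaluate an if condition on a vector. However, I am not sure how that applies to my case. I checked that the mean and sd calculations work as expected: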

> mean(Temp, na.rm=T)
[1] 7.093052
> length(mean(Temp, na.rm=T))
[1] 1
> str(mean(Temp, na.rm=T))
 num 7.09

If I remove na.rm=T from the mean and sd calculations, the warning goes away, but in that case I get NA as the result. I found a workaround as follows:

for(i in unique(Hydro_Sen$Gauge)){
  Temp=Hydro_Sen[Gauge==i & T_str=="T_0" & P_factor==1]$Obs_m3_s
  Temp2=mean(Temp, na.rm=T)
  Temp3=sd(Temp, na.rm=T)
  Hydro_Sen[Gauge==i,Normalised2:=(Flow_m3_s-Temp2)/Temp3]
}

But I would like to understand why the first approach generates the warning message. Any ideas on how to deal with this?
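
A likely explanation, offered as a sketch rather than a confirmed diagnosis: Hydro_Sen has a column named T, and inside the j expression of a data.table, column names are looked up before variables in the calling environment. In the first loop, na.rm=T would therefore pass the whole integer column T to mean() instead of TRUE, so `if (na.rm)` sees a vector. The check at the prompt and the Temp2/Temp3 workaround both run outside the data.table, where T still means TRUE. The toy table dt and the vector obs below are made-up stand-ins for Hydro_Sen and Temp. In R 4.2.0 and later the same condition is reported as an error instead of a warning, so the example catches both.

```r
library(data.table)

# Toy data (hypothetical values) with a column named T, as in Hydro_Sen
dt <- data.table(
  Gauge     = c("A", "A", "B", "B"),
  Flow_m3_s = c(1, 2, 3, 4),
  T         = c(-1L, 0L, 1L, 2L)
)
obs <- c(5, NA, 7)  # stand-in for Temp

# Inside j, the symbol T resolves to the column, not to TRUE
t_in_j <- dt[Gauge == "A", T]

# na.rm = T hands that vector to mean(); capture the warning
# (older R) or the error (R >= 4.2.0) as a message string
problem <- tryCatch(
  dt[Gauge == "A", mean(obs, na.rm = T)],
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)

# TRUE is a reserved word and cannot be masked by a column name
dt[Gauge == "A",
   Normalised2 := (Flow_m3_s - mean(obs, na.rm = TRUE)) / sd(obs, na.rm = TRUE)]
```

If this is indeed the cause, writing na.rm=TRUE in the original loop should be enough, and spelling out TRUE and FALSE in full avoids this class of name clash in general.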

0 answers:

No answers