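How to find all numeric columns in data

Asked: 2014-12-15 00:36:00

Tags: r

I am trying to find the names of all columns that contain only numeric data. To do that, I use is.numeric and apply it to my data: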

> sapply(ds[vars], is.numeric)
      MinTemp       MaxTemp      Rainfall   Evaporation      Sunshine   WindGustDir WindGustSpeed    WindDir9am    WindDir3pm  WindSpeed9am 
         TRUE          TRUE          TRUE          TRUE          TRUE         FALSE          TRUE         FALSE         FALSE          TRUE 
 WindSpeed3pm   Humidity9am   Humidity3pm   Pressure9am   Pressure3pm      Cloud9am      Cloud3pm       Temp9am       Temp3pm     RainToday 
         TRUE          TRUE          TRUE          TRUE          TRUE          TRUE          TRUE          TRUE          TRUE         FALSE 
 RainTomorrow 
        FALSE 

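Given my data, the output above makes sense. For example, the WindGustDir and WindDir9am columns hold values such as NW, so they are FALSE.

When I apply this to my data to get the names of all numeric columns, I do not expect to see non-numeric columns such as WindGustDir or WindDir9am. However, the result contains WindDir9am but not WindGustDir, and I don't understand why. How can I fix this so that I get only the numeric columns?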

> numerics <- names(ds)[which(sapply(ds[vars], is.numeric))]
> numerics
 [1] "Date"         "Location"     "MinTemp"      "MaxTemp"      "Rainfall"     "Sunshine"     "WindDir9am"   "WindDir3pm"   "WindSpeed9am"
[10] "WindSpeed3pm" "Humidity9am"  "Humidity3pm"  "Pressure9am"  "Pressure3pm"  "Cloud9am"     "Cloud3pm"  

Here is a link to the data I am using: http://rattle.togaware.com/weather.csv

Edit

> vars
 [1] "MinTemp"       "MaxTemp"       "Rainfall"      "Evaporation"   "Sunshine"     
 [6] "WindGustDir"   "WindGustSpeed" "WindDir9am"    "WindDir3pm"    "WindSpeed9am" 
[11] "WindSpeed3pm"  "Humidity9am"   "Humidity3pm"   "Pressure9am"   "Pressure3pm"  
[16] "Cloud9am"      "Cloud3pm"      "Temp9am"       "Temp3pm"       "RainToday"    
[21] "RainTomorrow"

2 answers:

Answer 0: (score: 3)

When you do this:

which(sapply(ds[vars], is.numeric))

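you get the indices of the numeric columns of ds[vars], not of ds. So if you want the names back, apply those indices to names(ds[vars]) rather than to names(ds), which contains different columns in different positions.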

names(ds[vars])[which(sapply(ds[vars], is.numeric))]

You can also do:

vars[which(sapply(ds[vars], is.numeric))]

Or even use logical indexing, as Richard suggested:

vars[sapply(ds[vars], is.numeric)]
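
To make the index mismatch concrete, here is a minimal sketch. The toy data frame and the names toy and sel are made up for illustration (sel plays the role of vars); they are not part of the weather data.

```r
# Hypothetical toy data; column names are invented for illustration.
toy <- data.frame(
  id   = c("a", "b", "c"),    # character
  temp = c(1.5, 2.0, 3.1),    # numeric
  dir  = c("NW", "SE", "N"),  # character
  wind = c(10, 12, 9),        # numeric
  stringsAsFactors = FALSE
)
sel <- c("temp", "dir", "wind")  # plays the role of `vars`

# Positions are relative to toy[sel], so this yields 1 and 3.
idx <- which(sapply(toy[sel], is.numeric))

wrong <- names(toy)[idx]  # looks up positions 1 and 3 in the full data frame
right <- sel[idx]         # looks up positions 1 and 3 in the same subset

print(wrong)  # "id" "dir"
print(right)  # "temp" "wind"
```

Positions 1 and 3 mean different columns in toy and in toy[sel], which is the same effect that puts Date and WindDir9am into the question's output.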

Finally, I would consider whether vars is needed at all, and check whether working directly on ds:

names(ds)[sapply(ds, is.numeric)]

gets you what you want.
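
As a variation on the whole-data-frame approach, the sketch below uses vapply() instead of sapply(), which fixes the result type to one logical value per column. The toy data frame is hypothetical. The last lines show why text columns come back FALSE even when they are stored as factors, which is likely the case for columns such as WindGustDir if the file was read with read.csv() in an R version before 4.0.0 (an assumption about how ds was loaded).

```r
# Hypothetical toy data for illustration only.
toy <- data.frame(
  temp = c(1.5, 2.0, 3.1),
  dir  = c("NW", "SE", "N"),
  n    = 1:3,
  stringsAsFactors = FALSE
)

# vapply() errors out if any result is not a single logical value.
is_num <- vapply(toy, is.numeric, logical(1))
numeric_names <- names(toy)[is_num]
print(numeric_names)  # "temp" "n"

# Factors are stored as integer codes, but is.numeric() still reports FALSE.
dir_factor <- factor(c("NW", "SE", "N"))
print(is.numeric(dir_factor))  # FALSE
```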

Answer 1: (score: 1)

which(sapply(ds, is.numeric)) should give an index vector indicating the columns that contain numeric data. Assuming ds is a data.frame, you can use this vector to subset the original data:

ids <- which(sapply(ds, is.numeric))
foo <- ds[, ids]

Edit: on second thought, which() is not needed at all. Just subset with the result of sapply():

names(ds[, sapply(ds, is.numeric)])
#[1] "MinTemp"       "MaxTemp"       "Rainfall"      "Evaporation"   "Sunshine"     
#[6] "WindGustSpeed" "WindSpeed9am"  "WindSpeed3pm"  "Humidity9am"   "Humidity3pm"  
#[11] "Pressure9am"   "Pressure3pm"   "Cloud9am"      "Cloud3pm"      "Temp9am"      
#[16] "Temp3pm"       "RISK_MM"
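
One caveat with the ds[, ...] form, sketched here on a hypothetical toy data frame: if exactly one column is numeric, the two-index subset is dropped to a plain vector and names() returns NULL. Passing drop = FALSE, or using single-bracket list-style indexing, keeps the data frame.

```r
# Hypothetical toy data with exactly one numeric column.
toy <- data.frame(
  temp = c(1.5, 2.0, 3.1),
  dir  = c("NW", "SE", "N"),
  stringsAsFactors = FALSE
)
keep <- sapply(toy, is.numeric)

dropped   <- names(toy[, keep])                # NULL: result is a plain vector
preserved <- names(toy[, keep, drop = FALSE])  # "temp"
listlike  <- names(toy[keep])                  # "temp": this form never drops

print(dropped)
print(preserved)
print(listlike)
```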