R factor with two kinds of numbers

Posted: 2017-11-12 00:31:18

Tags: r numeric levels

I have a factor with 49 levels in R that I am trying to convert to numeric with as.numeric:

  • Latitude: Factor "0.80N", "0.40S", ...

I'd like to turn the North designation into "+" and South into "-" so that the data looks like

  • Latitude: num 0.80, -0.40, ...

I'm not sure how to get beyond

Mcity$lat <- as.numeric(Mcity$Latitude)



structure(c(40L, 40L, 40L, 40L), .Label = 
  c("0.80N", "0.80S", "10.45N", "12.05N", "12.05S", "13.66N", "13.66S",
    "15.27N", "15.27S", "16.87N", "18.48N", "18.48S", "2.41N", "20.09N", 
    "20.09S", "21.70N", "23.31N", "23.31S", "24.92N", "26.52N", "28.13N", 
    "29.74N", "29.74S", "31.35N", "32.95N", "32.95S", "34.56N", "34.56S", 
    "36.17N", "37.78N", "37.78S", "39.38N", "4.02N", "4.02S", "40.99N", "42.59N", 
    "44.20N", "45.81N", "49.03N", "5.63N", "5.63S", "50.63N", "52.24N", "55.45N", 
    "60.27N", "7.23N", "7.23S", "8.84N", "8.84S"), class = "factor") 
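For context on why `as.numeric(Mcity$Latitude)` alone does not get there: on a factor, `as.numeric()` returns the internal level codes, not the numbers written in the labels. A minimal sketch using the levels from the `dput()` output above (`lat_levels` and `lat_fct` are illustrative names, not from the question):

```r
# Level labels copied from the dput() output in the question
lat_levels <- c(
  "0.80N", "0.80S", "10.45N", "12.05N", "12.05S", "13.66N", "13.66S",
  "15.27N", "15.27S", "16.87N", "18.48N", "18.48S", "2.41N", "20.09N",
  "20.09S", "21.70N", "23.31N", "23.31S", "24.92N", "26.52N", "28.13N",
  "29.74N", "29.74S", "31.35N", "32.95N", "32.95S", "34.56N", "34.56S",
  "36.17N", "37.78N", "37.78S", "39.38N", "4.02N", "4.02S", "40.99N", "42.59N",
  "44.20N", "45.81N", "49.03N", "5.63N", "5.63S", "50.63N", "52.24N", "55.45N",
  "60.27N", "7.23N", "7.23S", "8.84N", "8.84S"
)

# Same structure as the dput() output: four values, all level code 40
lat_fct <- structure(c(40L, 40L, 40L, 40L), .Label = lat_levels, class = "factor")

as.numeric(lat_fct)    # level codes: 40 40 40 40
as.character(lat_fct)  # labels: "5.63N" repeated four times
# as.numeric(as.character(lat_fct)) would give NA with a warning,
# because the trailing N/S is not part of a valid number
```

So the labels have to be taken as text, the hemisphere letter stripped, and the sign applied separately, which is what the answers below do in different ways.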

3 answers:

Answer 0 (score: 2)

This should work:

Mcity$lat <- (1 - 2 * grepl("S", Mcity$Latitude)) * as.numeric(gsub("N|S", "", Mcity$Latitude))

It flips the sign of the numeric part whenever an S is found.
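To spell out the sign trick: `grepl()` returns `TRUE`/`FALSE`, which count as 1/0 in arithmetic, so `1 - 2 * grepl("S", x)` is -1 for southern labels and +1 otherwise. Both `grepl()` and `gsub()` accept the factor directly because they coerce it to its labels. A small sketch, assuming a hypothetical `Mcity` data frame with a few of the question's labels:

```r
# Hypothetical stand-in for the question's Mcity data frame
Mcity <- data.frame(Latitude = factor(c("0.80N", "0.80S", "12.05S", "60.27N")))

# +1 for northern labels, -1 for southern ones (TRUE counts as 1)
hemi_sign <- 1 - 2 * grepl("S", Mcity$Latitude)   # 1 -1 -1 1

# Drop the hemisphere letter, then convert the remaining text to numbers
magnitude <- as.numeric(gsub("N|S", "", Mcity$Latitude))

Mcity$lat <- hemi_sign * magnitude                 # 0.80 -0.80 -12.05 60.27
```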

Answer 1 (score: 1)

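You could use stringr to strip off the last character and then dplyr to put the result back together. I've used case_when to get some extra error handling (anything that ends in neither N nor S becomes NA), but ifelse would be enough.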

library(dplyr)
library(stringr)

fct_list <- factor(
  c(
    "0.80N", "0.80S", "10.45N", "12.05N", "12.05S", "13.66N", "13.66S",
    "15.27N", "15.27S", "16.87N", "18.48N", "18.48S", "2.41N", "20.09N",
    "20.09S", "21.70N", "23.31N", "23.31S", "24.92N", "26.52N", "28.13N",
    "29.74N", "29.74S", "31.35N", "32.95N", "32.95S", "34.56N", "34.56S",
    "36.17N", "37.78N", "37.78S", "39.38N", "4.02N", "4.02S", "40.99N",
    "42.59N", "44.20N", "45.81N", "49.03N", "5.63N", "5.63S", "50.63N",
    "52.24N", "55.45N", "60.27N", "7.23N", "7.23S"
  )
)

# note that factors are often no fun, so I've converted to character here
string <- as.character(fct_list)

case_when(
  str_sub(string, -1, -1) == "N" ~ as.numeric(str_sub(string, 1, nchar(string) - 1)),
  str_sub(string, -1, -1) == "S" ~ -as.numeric(str_sub(string, 1, nchar(string) - 1)),
  TRUE ~ NA_real_
)

#  [1]   0.80  -0.80  10.45  12.05 -12.05  13.66 -13.66  15.27
#  [9] -15.27  16.87  18.48 -18.48   2.41  20.09 -20.09  21.70
# [17]  23.31 -23.31  24.92  26.52  28.13  29.74 -29.74  31.35
# [25]  32.95 -32.95  34.56 -34.56  36.17  37.78 -37.78  39.38
# [33]   4.02  -4.02  40.99  42.59  44.20  45.81  49.03   5.63
# [41]  -5.63  50.63  52.24  55.45  60.27   7.23  -7.23

This is more verbose than the regex solution from BenoitLondon, but in exploratory work I tend to favour being explicit over being terse.
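If dplyr and stringr aren't available, the same three-branch logic (N, S, otherwise NA) can be written in base R. A sketch, where `parse_lat()` is a hypothetical helper name:

```r
# Hypothetical helper: base-R version of the case_when() logic above.
# Values whose last character is neither "N" nor "S" become NA.
parse_lat <- function(x) {
  x <- as.character(x)                       # accept factors as well
  suffix <- substr(x, nchar(x), nchar(x))    # hemisphere letter
  value <- as.numeric(substr(x, 1, nchar(x) - 1))
  out <- rep(NA_real_, length(x))
  is_n <- which(suffix == "N")               # which() drops NA comparisons
  is_s <- which(suffix == "S")
  out[is_n] <- value[is_n]
  out[is_s] <- -value[is_s]
  out
}

parse_lat(factor(c("0.80N", "12.05S", "12.05E")))  # 0.80 -12.05 NA
```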

Answer 2 (score: 0)

Another option, using ifelse, could be:

lat <- c("0.80N", "0.80S", "10.45N", "12.05S", "12.05S")
## as.character() matters when lat is a factor: nchar() refuses factors
lat <- as.character(lat)
## use substr() inside ifelse(): the last character is the hemisphere,
## everything before it is the number
lat2 <- ifelse(substr(lat, nchar(lat), nchar(lat)) == "N",
               as.numeric(substr(lat, 1, nchar(lat) - 1)),
               -as.numeric(substr(lat, 1, nchar(lat) - 1)))
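One difference from the case_when answer: this ifelse only tests for "N", so any value that doesn't end in "N" is treated as southern, including malformed ones. A quick check with a deliberately bad value ("12.05E" is made up for illustration):

```r
lat <- c("12.05N", "12.05S", "12.05E")   # last value is intentionally malformed
lat2 <- ifelse(substr(lat, nchar(lat), nchar(lat)) == "N",
               as.numeric(substr(lat, 1, nchar(lat) - 1)),
               -as.numeric(substr(lat, 1, nchar(lat) - 1)))
lat2   # 12.05 -12.05 -12.05: the bad value is silently made negative
```

For clean N/S data like the question's this makes no difference; it only matters if unexpected suffixes can appear.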