How to create region indicator variables from a list of states in R

Time: 2017-12-07 15:30:22

Tags: r list vector

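I have a variable that lists the US states in which each company is active. I would like to create some kind of region variable (West, South, Midwest, etc.). Each company can be located in more than one region.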

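My approach so far is to define a vector of states for each region and then try to define indicator variables (dat$westYN, dat$southYN, etc.) that are 1 if the company operates in a state in that region and 0 otherwise.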

I use str_split to break up the state string, but I am struggling with how to use the resulting list.

As written, the code works when dat$state holds a single state, but not when it holds several.

Any help is greatly appreciated!

id <- 1:20
state <- c("NJ, NY", 
       "ID, OR", 
       "CA", 
       "FL, MO, NC, RI", 
       "TX DE, KY, MD, MA, NH, NJ, PA, RI, WV",
       "FL, KY, TN", 
       "DC, MD, VA", 
       "NY",
       "AZ, NM",
       "FL, NJ, NY",
       "IN, MI",
       "GA, SC", 
       "NV", 
       "AR, CO, KY, MO, TN, TX",
       "OH", 
       "NC", 
       "FL", 
       "IL", 
       "AZ", 
       "CA, CT, IL, MA, OH, PA, UT, WV"
       )

dat <- data.frame(id, state)

west <- c("WA", "OR", "CA", "NV", "AZ", "ID", "MT", "WY",
      "CO", "NM", "UT")
south <- c("TX", "OK", "AR", "LA", "MS", "AL", "TN", "KY",
       "GA", "FL", "SC", "NC", "VA", "WV")
midwest <- c("KS", "NE", "SD", "ND", "MN", "MO", "IA", "IL",
         "IN", "MI", "WI", "OH")
northeast <- c("ME", "NH", "NY", "MA", "RI", "VT", "PA", 
           "NJ", "CT", "DE", "MD", "DC")

stateList <- stringr::str_split(dat$state, ",")

dat$westYN <- ifelse(is.element(stateList, west), 1, 0)
dat$southYN <- ifelse(is.element(stateList, south), 1, 0)
dat$midwestYN <- ifelse(is.element(stateList, midwest), 1, 0)
dat$northeastYN <- ifelse(is.element(stateList, northeast), 1, 0)

1 answer:

Answer 0 (score: 2)

First, I think it is better to store the related information in a list rather than in separate variables:

regions <- list(
  west = c("WA", "OR", "CA", "NV", "AZ", "ID", "MT", "WY",
          "CO", "NM", "UT"),
  south = c("TX", "OK", "AR", "LA", "MS", "AL", "TN", "KY",
           "GA", "FL", "SC", "NC", "VA", "WV"),
  midwest = c("KS", "NE", "SD", "ND", "MN", "MO", "IA", "IL",
             "IN", "MI", "WI", "OH"),
  northeast = c("ME", "NH", "NY", "MA", "RI", "VT", "PA", 
              "NJ", "CT", "DE", "MD", "DC")
)

Then you can more easily write a loop to create your variables:

for (region in names(regions)) {
  dat[[paste0(region, "YN")]] <- sapply(
    stateList,
    function(x) any(trimws(x) %in% regions[[region]])
  )
}
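
Since the question asks for 1/0 rather than TRUE/FALSE, the loop above can be adapted as in the following minimal sketch. It assumes a shortened, made-up data set and region list (the three companies and the trimmed `regions` are illustrative only), trims the split pieces once up front, and uses `vapply` with `as.integer` so every new column is guaranteed to be an integer vector.

```r
# Illustrative data only: three companies and a shortened region list.
state <- c("NJ, NY", "ID, OR", "CA")
dat <- data.frame(id = seq_along(state), state = state,
                  stringsAsFactors = FALSE)

regions <- list(
  west      = c("OR", "CA", "ID"),
  northeast = c("NY", "NJ")
)

# Split on commas and strip the leftover spaces once, up front.
stateList <- lapply(strsplit(dat$state, ","), trimws)

# One 0/1 column per region, named westYN, northeastYN, ...
for (region in names(regions)) {
  dat[[paste0(region, "YN")]] <- vapply(
    stateList,
    function(x) as.integer(any(x %in% regions[[region]])),
    integer(1)
  )
}

print(dat)
```

Keeping the regions in a named list means that adding a region only requires a new list entry; the loop itself stays unchanged.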

Or use the separate region vectors you already defined:

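# trimws() removes the space that splitting on "," leaves in front of
# every state after the first (e.g. " NY"), which would otherwise never match.
dat$westYN <- sapply(stateList, function(x) any(trimws(x) %in% west))
dat$southYN <- sapply(stateList, function(x) any(trimws(x) %in% south))
dat$midwestYN <- sapply(stateList, function(x) any(trimws(x) %in% midwest))
dat$northeastYN <- sapply(stateList, function(x) any(trimws(x) %in% northeast))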

The trick is to use any() to check whether any of a company's states matches a state in each region.
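
To make the difference concrete, here is a small sketch that contrasts the question's `is.element()` call with the `any()` approach. The four-company `state` vector is a made-up subset used only for illustration, and base R's `strsplit()` stands in for `stringr::str_split()` so the snippet runs without extra packages. As far as I understand it, `is.element()` is built on `match()`, which converts each list element to a single string, so a multi-state entry is compared as one item and cannot match a state code.

```r
# Illustrative subset of the question's data.
state <- c("NJ, NY", "ID, OR", "CA", "FL, MO, NC, RI")
west  <- c("WA", "OR", "CA", "NV", "AZ", "ID", "MT", "WY",
           "CO", "NM", "UT")

# One character vector of states per company.
stateList <- strsplit(state, ",")

# Treats each list element as a single item: only the single-state
# entry "CA" can match, which is the behaviour the question describes.
naive <- is.element(stateList, west)

# Checks each company separately: trim the space left after the comma,
# then ask whether any of that company's states is in the region.
westYN <- vapply(stateList,
                 function(x) any(trimws(x) %in% west),
                 logical(1))

print(data.frame(state, naive, westYN))
```

Here the second company ("ID, OR") is flagged by the `any()` version but missed by the `is.element()` version.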