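I have a variable that lists the states in which each company is active. I would like to create some kind of region variable (west, south, midwest, etc.). Note that a company may be located in more than one region.
My approach so far has been to define the region vectors and then try to define indicator variables (dat$westYN, dat$southYN, etc.) that are 1 if the company operates in a state in that region and 0 otherwise.
I used str_split to break up the state strings, but I am struggling to work out how to use the resulting list.
As written, the code works when dat$state holds a single state, but not when it holds several.
Any help is much appreciated!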
id <- 1:20
state <- c("NJ, NY",
           "ID, OR",
           "CA",
           "FL, MO, NC, RI",
           "TX DE, KY, MD, MA, NH, NJ, PA, RI, WV",
           "FL, KY, TN",
           "DC, MD, VA",
           "NY",
           "AZ, NM",
           "FL, NJ, NY",
           "IN, MI",
           "GA, SC",
           "NV",
           "AR, CO, KY, MO, TN, TX",
           "OH",
           "NC",
           "FL",
           "IL",
           "AZ",
           "CA, CT, IL, MA, OH, PA, UT, WV"
)
dat <- data.frame(id, state)
west <- c("WA", "OR", "CA", "NV", "AZ", "ID", "MT", "WY",
          "CO", "NM", "UT")
south <- c("TX", "OK", "AR", "LA", "MS", "AL", "TN", "KY",
           "GA", "FL", "SC", "NC", "VA", "WV")
midwest <- c("KS", "NE", "SD", "ND", "MN", "MO", "IA", "IL",
             "IN", "MI", "WI", "OH")
northeast <- c("ME", "NH", "NY", "MA", "RI", "VT", "PA",
               "NJ", "CT", "DE", "MD", "DC")
stateList <- stringr::str_split(dat$state, ",")
dat$westYN <- ifelse(is.element(stateList, west), 1, 0)
dat$southYN <- ifelse(is.element(stateList, south), 1, 0)
dat$midwestYN <- ifelse(is.element(stateList, midwest), 1, 0)
dat$northeastYN <- ifelse(is.element(stateList, northeast), 1, 0)
Answer 0 (score: 2)
First, I think it is better to store the related information in a list rather than in separate variables:
regions <- list(
  west = c("WA", "OR", "CA", "NV", "AZ", "ID", "MT", "WY",
           "CO", "NM", "UT"),
  south = c("TX", "OK", "AR", "LA", "MS", "AL", "TN", "KY",
            "GA", "FL", "SC", "NC", "VA", "WV"),
  midwest = c("KS", "NE", "SD", "ND", "MN", "MO", "IA", "IL",
              "IN", "MI", "WI", "OH"),
  northeast = c("ME", "NH", "NY", "MA", "RI", "VT", "PA",
                "NJ", "CT", "DE", "MD", "DC")
)
Then it becomes easier to create your variables in a loop:
for (region in names(regions)) {
  dat[[paste0(region, "YN")]] <- sapply(stateList, function(x) any(trimws(x) %in% regions[[region]]))
}
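The loop above returns TRUE/FALSE, while the question asked for 1/0. A minimal sketch of one way to get integer indicators follows. It assumes a small made-up data set and shortened region vectors (not the asker's full data), and it swaps in base R's strsplit() with the pattern ",\\s*" so that the space after each comma is consumed by the split and trimws() is not needed.

```r
# Small illustrative data set (hypothetical, not the full data from the question).
dat <- data.frame(id = 1:3,
                  state = c("NJ, NY", "ID, OR", "FL, MO, NC, RI"),
                  stringsAsFactors = FALSE)

# Shortened region vectors, for illustration only.
regions <- list(
  west      = c("OR", "ID", "CA"),
  south     = c("FL", "NC"),
  midwest   = c("MO"),
  northeast = c("NJ", "NY", "RI")
)

# Split on a comma followed by optional whitespace.
stateList <- strsplit(dat$state, ",\\s*")

for (region in names(regions)) {
  # One TRUE/FALSE per company: does any of its states fall in this region?
  in_region <- vapply(stateList,
                      function(x) any(x %in% regions[[region]]),
                      logical(1))
  # Convert the logical vector to 1/0.
  dat[[paste0(region, "YN")]] <- as.integer(in_region)
}
```

vapply() is used here instead of sapply() only because it checks that each result is a single logical value; sapply() would give the same result on this data.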
Or, using the separate vectors you already defined:
# trimws() removes the space that str_split(dat$state, ",") leaves after each comma
dat$westYN <- sapply(stateList, function(x) any(trimws(x) %in% west))
dat$southYN <- sapply(stateList, function(x) any(trimws(x) %in% south))
dat$midwestYN <- sapply(stateList, function(x) any(trimws(x) %in% midwest))
dat$northeastYN <- sapply(stateList, function(x) any(trimws(x) %in% northeast))
The trick is to use any() to check whether any of a company's states matches a state in the given region.
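To make the role of any() concrete, here is a small sketch for a single company. The vectors are made up for illustration and are assumed to be already split and trimmed. %in% yields one TRUE/FALSE per state, and any() collapses that into the single value needed for the company's indicator.

```r
# One company's states, already split and trimmed (hypothetical values).
states <- c("FL", "MO")
# Shortened region vector, for illustration only.
south <- c("TX", "FL", "TN", "KY")

hits <- states %in% south   # one result per state: TRUE FALSE
any_hit <- any(hits)        # one result for the company: TRUE
```

Applying this function to every element of stateList with sapply() gives one value per row of dat, which is why it works for companies with several states where a direct is.element(stateList, west) call does not.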