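Generate a new variable in a list based on the values of two other columns

Time: 2018-04-30 00:35:45

Tags: r list variables

I have a list of data frames, and I want to create a new variable "County" based on the values of the "State" and "Zip" columns. I know this is a case where lapply(df, transform) is needed.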

State   Zip
OH  44141
OH  44056
OH  44131
NY  13035
NY  13035
NY  13056

The following works for a single data frame, but I'm not sure how to translate it into something that applies over a list:

df$County[df$State == "OH" & df$Zip >= 44056 & df$Zip <= 44356]<- "Summit"
df$County[df$State == "NY" & df$Zip >= 1300 & df$Zip <= 13035]<- "Madison"
df$County[df$State == "NY" & df$Zip < 1300 | df$Zip > 13036] <- "Miscoded"

3 answers:

Answer 0: (score: 2)

Suppose you have the following list.

df_list <- structure(list(
    NY = structure(list(
        State = structure(c(1L, 1L, 1L), .Label = c("NY", "OH"), class = "factor"),
        Zip = c(13035L, 13035L, 13056L)),
        .Names = c("State", "Zip"), row.names = 4:6, class = "data.frame"),
    OH = structure(list(
        State = structure(c(2L, 2L, 2L), .Label = c("NY", "OH"), class = "factor"),
        Zip = c(44141L, 44056L, 44131L)),
        .Names = c("State", "Zip"), row.names = c(NA, 3L), class = "data.frame")),
    .Names = c("NY", "OH"))

Using purrr::map and dplyr::mutate, you can then apply the recoding to every data frame in the list.
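A minimal sketch of the purrr::map and dplyr::mutate approach, offered as an illustration rather than the answer's own code. The helper name `add_county` is hypothetical, the small `df_list` is a stand-in with the same values as the sample data, and the parentheses in the third rule and the `NA` fallback are assumptions about what is intended.

```r
library(dplyr)
library(purrr)

# Stand-in list: one data frame per state, same values as the sample data
df_list <- list(
  NY = data.frame(State = "NY", Zip = c(13035L, 13035L, 13056L)),
  OH = data.frame(State = "OH", Zip = c(44141L, 44056L, 44131L))
)

# Hypothetical helper: adds County to a single data frame.
# The zip ranges are copied from the question.
add_county <- function(df) {
  mutate(df, County = case_when(
    State == "OH" & Zip >= 44056 & Zip <= 44356 ~ "Summit",
    State == "NY" & Zip >= 1300 & Zip <= 13035 ~ "Madison",
    State == "NY" & (Zip < 1300 | Zip > 13036) ~ "Miscoded",
    TRUE ~ NA_character_  # assumed fallback for rows matching no rule
  ))
}

# map() applies the helper to each element and keeps the list names
result <- map(df_list, add_county)
```

Keeping the rules in one function means the same code works on a single data frame (`add_county(df)`) and on the list (`map(df_list, add_county)`).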

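Answer 1: (score: 1)

It looks like you have a simple data.frame, so you can operate on it directly with transform; there is no need for lapply here.

For the sake of code readability, I would recommend the tidyverse.

A case_when solution: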
library(tidyverse)
df %>%
    mutate(County = case_when(
        State == "OH" & (Zip >= 44056 & Zip <= 44356) ~ "Summit",
        State == "NY" & (Zip >= 1300 & Zip <= 13035) ~ "Madison",
        State == "NY" & (Zip < 1300 | Zip > 13036) ~ "Miscoded",
        TRUE ~ "Undefined"))
#  State   Zip   County
#1    OH 44141   Summit
#2    OH 44056   Summit
#3    OH 44131   Summit
#4    NY 13035  Madison
#5    NY 13035  Madison
#6    NY 13056 Miscoded

In base R you can do

transform(df, County = ifelse(...))

with nested ifelse conditions, which is not very tidy (in my opinion).
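For illustration, here is a sketch of what the nested ifelse version might look like. The answer leaves the conditions elided, so the rules below are an assumption: they reuse the ranges from the question, with a parenthesised OR and an "Undefined" fallback.

```r
# Sample data with the same values as in the question
df <- data.frame(
  State = c("OH", "OH", "OH", "NY", "NY", "NY"),
  Zip   = c(44141, 44056, 44131, 13035, 13035, 13056)
)

# Each ifelse() handles one rule and passes the remaining rows to the next
df <- transform(df, County = ifelse(
  State == "OH" & Zip >= 44056 & Zip <= 44356, "Summit",
  ifelse(State == "NY" & Zip >= 1300 & Zip <= 13035, "Madison",
  ifelse(State == "NY" & (Zip < 1300 | Zip > 13036), "Miscoded",
         "Undefined"))))  # assumed fallback for rows matching no rule
```

The nesting grows by one level per rule, which is the readability cost mentioned above.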

Note that the "Miscoded" condition in your code is not correct; you need a logical OR: Zip < 1300 | Zip > 13036
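The grouping matters as well as the operator. In R, `&` binds more tightly than `|`, so a condition written without parentheses is not restricted to one state. A small sketch with made-up vectors (the variable names are illustrative only):

```r
State <- c("OH", "NY", "NY")
Zip   <- c(44141, 13035, 13056)

# Without parentheses this is read as
# (State == "NY" & Zip < 1300) | (Zip > 13036),
# so any row with a zip above 13036 matches, whatever its state
unparenthesised <- State == "NY" & Zip < 1300 | Zip > 13036

# With parentheses the state test applies to both zip ranges
parenthesised <- State == "NY" & (Zip < 1300 | Zip > 13036)

print(unparenthesised)
print(parenthesised)
```

With the first form, the OH row is flagged as well, which would overwrite a county that an earlier assignment had already set.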

Sample data

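# header = TRUE is required: the first row has as many fields as the others,
# so read.table would not detect it as a header on its own
df <- read.table(text =
    "State   Zip
OH  44141
OH  44056
OH  44131
NY  13035
NY  13035
NY  13056", header = TRUE)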

Answer 2: (score: 0)

You can use base R:

Looking at your data, it seems that a zip code such as 44056 cannot belong to NY. Under that assumption, you can do:

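a <- c(1300, 13036, 44056, 44357)   # interval lower bounds; each one is inclusive
b <- c("Miscoded", "Madison", "Miscoded", "Summit", "Miscoded")   # b[1] labels zips below a[1]
transform(df, county = b[findInterval(Zip, a) + 1])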


  State   Zip   county
1    OH 44141   Summit
2    OH 44056   Summit
3    OH 44131   Summit
4    NY 13035  Madison
5    NY 13035  Madison
6    NY 13056 Miscoded
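To make the lookup easier to follow, here is a small sketch of how findInterval turns zip codes into label positions. The names `breaks` and `labels` are illustrative, and the break values are an assumption chosen so that the intervals match the ranges in the question (1300 to 13035 for Madison, 44056 to 44356 for Summit).

```r
breaks <- c(1300, 13036, 44056, 44357)
labels <- c("Miscoded", "Madison", "Miscoded", "Summit", "Miscoded")

# Zips on and around each boundary
zips <- c(1299, 1300, 13035, 13036, 44056, 44356, 44357)

# findInterval() returns how many breaks are <= each zip (0 if below all),
# so intervals are closed on the left: [1300, 13036), [13036, 44056), ...
idx <- findInterval(zips, breaks)

# Shift by one so that index 0 (below the first break) maps to labels[1]
county <- labels[idx + 1]
print(data.frame(zips, idx, county))
```

Because the intervals include their lower bound, each break has to be the first zip of the range it opens, and `labels` needs one more element than `breaks` so that zips above the last break do not come back as NA.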

If that assumption does not hold, you can do this instead:

df1
  State   Zip
1    OH 44141
2    OH 44056
3    OH 44131
4    NY 13035
5    NY 13035
6    NY 13056
7    NY 44141

df1$county<-b[findInterval(df1$Zip,a)+1]
transform(df1,
    county=ifelse(paste(State,county)%in%c("OH Summit","NY Madison"),county,"Miscoded"))

  State   Zip   county
1    OH 44141   Summit
2    OH 44056   Summit
3    OH 44131   Summit
4    NY 13035  Madison
5    NY 13035  Madison
6    NY 13056 Miscoded
7    NY 44141 Miscoded

If you have a list of data frames, do the following:

m=function(df1){
df1$county<-b[findInterval(df1$Zip,a)+1]
transform(df1,
    county=ifelse(paste(State,county)%in%c("OH Summit","NY Madison")
           ,county,"Miscoded"))
}

lapply(df_list,m)
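Since `a`, `b` and `df_list` are defined in different places in this thread, here is a self-contained sketch of the list version. The break values, the stand-in `df_list` (with one deliberately inconsistent NY row) and the helper name `add_county` are assumptions made for illustration.

```r
# Interval lower bounds and the label for each interval (assumed values)
a <- c(1300, 13036, 44056, 44357)
b <- c("Miscoded", "Madison", "Miscoded", "Summit", "Miscoded")

# Stand-in list; the last NY row carries an Ohio zip on purpose
df_list <- list(
  NY = data.frame(State = c("NY", "NY", "NY"), Zip = c(13035, 13035, 44141)),
  OH = data.frame(State = c("OH", "OH", "OH"), Zip = c(44141, 44056, 44131))
)

add_county <- function(d) {
  # First guess from the zip alone
  guess <- b[findInterval(d$Zip, a) + 1]
  # Keep the guess only when the state agrees with the county
  ok <- paste(d$State, guess) %in% c("OH Summit", "NY Madison")
  d$county <- ifelse(ok, guess, "Miscoded")
  d
}

out <- lapply(df_list, add_county)

# Optionally stack the pieces back into one data frame
combined <- do.call(rbind, out)
print(combined)
```

lapply keeps the list names, so `out$NY` and `out$OH` remain available separately if the stacked version is not needed.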