Creating new columns in data frames from a function in R

Time: 2013-08-09 20:03:52

Tags: r function dataframe lapply

I have a set of data frames that look like this (they have the same columns, but not the same number of rows):

df1 <- data.frame(v = c("banana", "apple", "orange", "grape", "kiwi fruit", "pear"), x = rnorm(6, 0.06, 0.01))
df2 <- data.frame(v = c("table", "chair", "couch", "dresser", "night stand"), x = rnorm(5, 0.06, 0.01))
df3 <- data.frame(v = c("white", "blue", "pink", "bright red", "orange", "dark green", "black"), x = rnorm(7, 0.06, 0.01))

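I want to perform a series of operations on these data frames (counting the words in df1$v, df2$v, and df3$v). One solution I found is to put the data frames in a list and then use lapply to apply a function to every data frame in the list: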

ls <- list(df1, df2, df3)

func1 <- function(dat){
    dat$complex <- sapply(strsplit(as.character(dat$v), " "), length)
}

ls_func1 <- lapply(ls, FUN = func1)

ls_func1
[[1]]
[1] 1 1 1 1 2 1
[[2]]
[1] 1 1 1 1 2
[[3]]
[1] 1 1 1 2 1 2 1

This at least gives me the word counts for v, which I could then combine back into a data frame or anything else.

The problem is that this does not seem to work for every function. For example, this works fine on a single data frame:

for(i in 1:length(df1$v)){
    string <- strsplit(as.character(df1$v[i]), "")
    counter <- 0
    for(j in 1:length(string[[1]])){
        if(grepl("a|b|c|d|e", string[[1]][j])){
            counter <- counter + 1
        }
    }
    df1$length[i] <- counter
}

df1
       v          x     length
1     banana 0.05233752      4
2      apple 0.08564292      2
3     orange 0.04679124      2
4      grape 0.06655950      2
5 kiwi fruit 0.05684803      0
6       pear 0.07654617      2

But when I turn it into a function, it does not work:

func2 <- function(dat){
    for(i in 1:length(dat$v)){
        string <- strsplit(as.character(dat$v[i]), "")
        counter <- 0
        for(j in 1:length(string[[1]])){
            if(grepl("a|b|c|d|e", string[[1]][j])){
                counter <- counter + 1
            }
        }
        dat$length[i] <- counter
    }
}

ls_func2 <- lapply(ls, FUN = func2)

ls_func2
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL

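What am I doing wrong here? Is there a way to use these functions with lapply to create new columns in my existing data frames? In other words, to get the following by applying the first function and then the second: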

ls
[[1]]
           v          x complex length
1     banana 0.05233752       1      4
2      apple 0.08564292       1      2
3     orange 0.04679124       1      2
4      grape 0.06655950       1      2
5 kiwi fruit 0.05684803       2      0
6       pear 0.07654617       1      2

[[2]]
           v          x complex length
1      table 0.65790811       1      2
....
[[3]]
....

and so on?

2 answers:

Answer 0: (score: 1)

I added show(dat) as the last statement of the function, which prints each modified data frame:

func2 <- function(dat){
    for(i in 1:length(dat$v)){
        string <- strsplit(as.character(dat$v[i]), "")
        counter <- 0
        for(j in 1:length(string[[1]])){
            if(grepl("a|b|c|d|e", string[[1]][j])){
                counter <- counter + 1
            }
        }
        dat$length[i] <- counter
    }
    show(dat)
}


    > ls_func2 <- lapply(ls, FUN = func2)
           v          x length
1     banana 0.05708859      4
2      apple 0.06938091      2
3     orange 0.04796599      2
4      grape 0.05912616      2
5 kiwi fruit 0.06250885      0
6       pear 0.05291484      2
            v          x length
1       table 0.06554054      3
2       chair 0.07783138      2
3       couch 0.06127833      2
4     dresser 0.05443105      3
5 night stand 0.06257048      2
           v          x length
1      white 0.06287645      1
2       blue 0.07196960      2
3       pink 0.05659455      0
4 bright red 0.05996639      3
5     orange 0.05826371      2
6 dark green 0.04892694      4
7      black 0.06830055      3
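
As background for why the original versions behaved the way they did: an R function returns the value of its last evaluated expression. The sketch below illustrates this on a toy data frame. The names last_is_assignment, last_is_loop, returns_dat, and toy are hypothetical and are not part of the question; nchar is used only to keep the example short.

```r
# A function returns the value of its last evaluated expression.
last_is_assignment <- function(dat) {
  # the value of an assignment is the assigned vector, not the data frame
  dat$n <- nchar(as.character(dat$v))
}

last_is_loop <- function(dat) {
  for (i in seq_along(dat$v)) {
    dat$n[i] <- nchar(as.character(dat$v[i]))
  }
  # a for loop evaluates to NULL, so the modified local copy of dat is lost
}

returns_dat <- function(dat) {
  dat$n <- nchar(as.character(dat$v))
  dat  # the data frame is the last expression, so it is what gets returned
}

toy <- data.frame(v = c("pear", "kiwi fruit"))
res_assignment <- last_is_assignment(toy)  # an integer vector
res_loop <- last_is_loop(toy)              # NULL
res_dat <- returns_dat(toy)                # a data frame with columns v and n
```

This matches what the question observed: func1 ended in an assignment, so lapply collected the count vectors, while func2 ended in a for loop, so lapply collected NULL.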

Answer 1: (score: 1)

Is this what you are after? Add return(dat) just before the closing brace of each function:

df1 <- data.frame(v = c("banana", "apple", "orange", "grape", "kiwi fruit", "pear"), x = rnorm(6, 0.06, 0.01))
df2 <- data.frame(v = c("table", "chair", "couch", "dresser", "night stand"), x = rnorm(5, 0.06, 0.01))
df3 <- data.frame(v = c("white", "blue", "pink", "bright red", "orange", "dark green", "black"), x = rnorm(7, 0.06, 0.01))
ls <- list(df1, df2, df3)


func1 <- function(dat){
    # number of space-separated words in each entry of v
    dat$complex <- sapply(strsplit(as.character(dat$v), " "), length)
    return(dat)
}

ls_func1 <- lapply(ls, FUN = func1)
ls_func1



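func2 <- function(dat){
    for(i in 1:length(dat$v)){
        # split the i-th entry into single characters
        string <- strsplit(as.character(dat$v[i]), "")
        counter <- 0
        for(j in 1:length(string[[1]])){
            # count the characters that are one of a, b, c, d, e
            if(grepl("a|b|c|d|e", string[[1]][j])){
                counter <- counter + 1
            }
        }
        dat$length[i] <- counter
    }
    # return the modified data frame instead of the for loop's NULL
    return(dat)
}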

ls_func2 <- lapply(ls_func1, FUN = func2)
ls_func2
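
The same two columns can probably also be computed without explicit loops. The following is only a sketch of a vectorized alternative, not the approach from either answer: add_counts, dfs, and dfs_counted are hypothetical names, the x column is left out to keep the example deterministic, and lengths() requires R 3.2.0 or later.

```r
add_counts <- function(dat) {
  v <- as.character(dat$v)
  # number of space-separated words in each entry
  dat$complex <- lengths(strsplit(v, " ", fixed = TRUE))
  # number of characters in a-e: delete every other character, count the rest
  dat$length <- nchar(gsub("[^a-e]", "", v))
  dat  # return the modified data frame
}

dfs <- list(
  data.frame(v = c("banana", "apple", "orange", "grape", "kiwi fruit", "pear")),
  data.frame(v = c("table", "chair", "couch", "dresser", "night stand"))
)
dfs_counted <- lapply(dfs, add_counts)
```

Because add_counts returns the data frame, a single lapply call yields a list of data frames with both new columns, so no second pass over the list is needed.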