Error when creating a new column in a data frame in R

Asked: 2015-11-30 07:04:17

Tags: r dataframe apply

I have a df that looks like this:

     id type start end features
1     5 word     1   2       NN
2     6 word     3   3        .
3     7 word     5  12       NN
4     8 word    14  19      VBZ
5     9 word    21  30       NN
6    10 word    32  32      WDT
7    11 word    34  37      VBP
8    12 word    39  41       IN
9    13 word    43  44       IN
10   14 word    46  46       DT

I want to create a new column "sum" that holds, for each row, the sum of the values in 'start' and 'end'.

I wrote the following function:

    mySum <- function(row) {
      row["start"]+row["end"]
    }
    df$sum <- apply(df,1, mySum );

But when I run this, I get the following error:

Error in row["start"] + row["end"] : 
  non-numeric argument to binary operator

However, if I keep only row["start"] or only row["end"] in the function, the column is created.

I also tried forcing every value in the columns to be numeric:

df$start = as.integer(as.vector(df$start));
df$end = as.integer(as.vector(df$end)); 

But I still get the same error, and only when I add the two values.

My data frame has the following structure, as printed by dput(droplevels(head(df,10))):

structure(list(id = 5:14, type = c("word", "word", "word", "word", 
"word", "word", "word", "word", "word", "word"), start = c(1L, 
3L, 5L, 14L, 21L, 32L, 34L, 39L, 43L, 46L), end = c(2L, 3L, 12L, 
19L, 30L, 32L, 37L, 41L, 44L, 46L), features = list(structure(list(
    POS = "NN"), .Names = "POS"), structure(list(POS = "."), .Names = "POS"), 
    structure(list(POS = "NN"), .Names = "POS"), structure(list(
        POS = "VBZ"), .Names = "POS"), structure(list(POS = "NN"), .Names = "POS"), 
    structure(list(POS = "WDT"), .Names = "POS"), structure(list(
        POS = "VBP"), .Names = "POS"), structure(list(POS = "IN"), .Names = "POS"), 
    structure(list(POS = "IN"), .Names = "POS"), structure(list(
        POS = "DT"), .Names = "POS"))), .Names = c("id", "type", 
"start", "end", "features"), row.names = c(NA, 10L), class = "data.frame")

1 answer:

Answer 0 (score: 1)

Just do

df1$Sum <- df1[,'start']+ df1[,'end']
df1$Sum
#[1]  3  6 17 33 51 64 71 80 87 92

Or

rowSums(df1[c('start', 'end')], na.rm=TRUE)
#1  2  3  4  5  6  7  8  9 10 
#3  6 17 33 51 64 71 80 87 92 
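The two forms above agree when there are no missing values, but they differ once an NA appears. The following is a minimal sketch of that difference, using a small made-up data frame (the NA is an assumption for illustration and is not in the question's data):

```r
# Hypothetical data with one missing value in 'start' (not from the question).
df1 <- data.frame(start = c(1L, 3L, NA), end = c(2L, 3L, 12L))

# Vectorized addition: an NA in either column makes that row's sum NA.
plus <- df1$start + df1$end

# rowSums with na.rm = TRUE drops the NA, so row 3 sums only 'end'.
rs <- rowSums(df1[c("start", "end")], na.rm = TRUE)

print(plus)
print(unname(rs))
```

Which behaviour is preferable depends on whether a missing 'start' or 'end' should make the whole sum missing.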

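The error indicates that you have non-numeric columns. Check str(df1). If the class is factor or character, change it to numeric and apply the code above. For example, if the columns are factor, we convert them to numeric with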

 df1[c('start', 'end')] <- lapply(df1[c('start', 'end')],
               function(x) as.numeric(as.character(x)))
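The as.character() step matters because as.numeric() applied directly to a factor returns the internal integer level codes rather than the labels. Here is a minimal sketch with a made-up factor (the values are illustrative only):

```r
# Hypothetical factor whose labels look like numbers.
f <- factor(c("10", "3", "25"))

# Direct conversion yields the level codes, which follow the sorted
# order of the labels as strings ("10", "25", "3"), not their values.
codes <- as.numeric(f)

# Going through character first recovers the numbers actually written.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```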

If it is a character column, just use as.numeric.
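Note that in the dput output in the question, start and end are already integer, so converting them may not change anything. A likely explanation for the error is that apply() first converts the data frame to a matrix with as.matrix(), and a matrix holds a single type: with a non-numeric column such as type present, each row reaches the function as character strings. The sketch below assumes a cut-down copy of the question's data (three rows, and the features list column left out for simplicity):

```r
# Cut-down, hypothetical copy of the question's data frame.
df1 <- data.frame(
  id    = 5:7,
  type  = c("word", "word", "word"),
  start = c(1L, 3L, 5L),
  end   = c(2L, 3L, 12L),
  stringsAsFactors = FALSE
)

# The columns themselves are fine: start and end are integer.
print(sapply(df1, class))

# apply() works on as.matrix(df1), which is a character matrix here.
m <- as.matrix(df1)
print(is.character(m))

# So the row-wise addition fails; capture the error message instead of stopping.
msg <- tryCatch(
  apply(df1, 1, function(row) row["start"] + row["end"]),
  error = function(e) conditionMessage(e)
)
print(msg)

# Restricting apply() to the numeric columns avoids the coercion problem.
sums <- apply(df1[c("start", "end")], 1, sum)
print(unname(sums))
```

Under that reading, the vectorized df1[,'start'] + df1[,'end'] works because it never goes through a matrix and operates on the integer columns directly.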