Question

I have a series of data frames similar to these:

df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21))  
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60))

To clean them, I wrote a user-defined function that bundles a set of cleaning steps:

clean <- function(df){
  colnames(df) <- df[2,]
  df <- df[grep('^[0-9]{4}', df$year),]
  return(df)
}

I now want to put my data frames in a list:

df_list <- list(df,df2)

and clean them all at once. I tried

lapply(df_list, clean)

and

for(df in df_list){
  clean(df)
}

but with both methods I get this error:

Error in df[2, ] : incorrect number of dimensions

What causes this error, and how can I fix it? Is my approach to this problem wrong?

Answer 1

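You are close, but there is one problem in the code. Because the columns of the data frames contain text, they are created as factors rather than character vectors (this was the default behaviour of data.frame() before R 4.0.0). As a result, your column naming does not give the expected result.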
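To see the problem in isolation, here is a minimal sketch. It passes stringsAsFactors = TRUE explicitly to mimic the old default (from R 4.0.0 onward the default is FALSE), and df_fac is just an illustrative name, not something from your code:

```r
# Build the data frame the way older R versions did by default
df_fac <- data.frame(x = c('notes','year',1995:2005),
                     y = c(NA,'value',11:21),
                     stringsAsFactors = TRUE)

# Both columns are factors, not character vectors
sapply(df_fac, class)

# A factor stores integer codes plus a set of labels
as.character(df_fac$x[2])  # the label
as.integer(df_fac$x[2])    # the underlying integer code
```

If the codes rather than the labels end up being used as column names, there is no column called year, so df$year is NULL and the row filter has nothing to match against.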

# set stringsAsFactors = FALSE so text columns stay character instead of factor
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21), stringsAsFactors = FALSE)  
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60), stringsAsFactors = FALSE)

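clean <- function(df){
  # use the second row as column names; unlist() turns the one-row
  # data frame into a plain character vector
  colnames(df) <- unlist(df[2,])
  # need to specify the column used to select the rows:
  # keep only rows whose year starts with four digits
  df <- df[grep('^[0-9]{4}', df$year),]

  # convert every column to numeric values, keeping the data frame structure
  df[] <- lapply(df, as.numeric)

  return(df)
}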

df_list <- list(df,df2)
lapply(df_list, clean)
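
On the for loop from the question: lapply() and clean() both return new objects and leave the originals untouched, so the result has to be assigned somewhere or it is lost. A rough sketch, reusing the same data and a compact version of clean() (cleaned and by_index are names I made up for illustration):

```r
df  <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60),
                  stringsAsFactors = FALSE)

clean <- function(df){
  colnames(df) <- unlist(df[2,])          # second row holds the headers
  df <- df[grep('^[0-9]{4}', df$year),]   # keep only the year rows
  df[] <- lapply(df, as.numeric)          # convert all columns to numeric
  df
}

df_list <- list(df, df2)

# lapply() returns a new list, so keep the result
cleaned <- lapply(df_list, clean)

# loop equivalent: write each cleaned data frame back by index
by_index <- df_list
for (i in seq_along(by_index)) {
  by_index[[i]] <- clean(by_index[[i]])
}

# both approaches should give the same list
identical(cleaned, by_index)
```

I would stick with lapply() here since it is shorter and cannot accidentally overwrite a variable named df in your workspace, which the original for(df in df_list) loop does.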

Applying a user-defined function to a list of data frames
