Question

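I have 25 datasets, all with the same structure. Each contains many rows and 7 columns. Column 6 contains data that should be numeric but is not, because the numbers contain commas, e.g. 100000 appears as 100,000.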

I can fix the problem manually in each dataset by removing the commas and then converting the data to numeric with the following code:

df$column_6 <- gsub("[,]" , "", df$column_6)
df$column_6 <- as.numerical(df$column_6)

Since there are 25 datasets, though, I would like to loop over them, and I have not been able to do that.

Also, because column 6 has a different name in each dataset, I would like to refer to it by position rather than by name:

df[6] <- gsub("[,]" , "", df[6])

But this does not seem to work.

My code is as follows:

list_of_dfs = c(df1, df2, ..... , df25)

for (i in list_of_dfs) {
  i[6] <- gsub("[,]" , "", i[6])
  i[6] <- as.numerical(i[6])
}

Does anyone have any suggestions?

Answer 1

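Give this a try. Put all the data frames in a list, then make the column numeric. I use readr::parse_number instead of gsub. I also provide a small practice set for illustration.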

library(tidyverse)

# tibble() replaces the deprecated data_frame()
df1 <- tibble(id = rep(1,3), num = c("10,000", "11,000", "12,000"))
df2 <- tibble(id = rep(2,3), num = c("13,000", "14,000", "15,000"))
df3 <- tibble(id = rep(3,3), num = c("16,000", "17,000", "18,000"))

list(df1, df2, df3) %>% map(~mutate(.x, num = parse_number(num)))
#> [[1]]
#> # A tibble: 3 x 2
#>      id   num
#>   <dbl> <dbl>
#> 1     1 10000
#> 2     1 11000
#> 3     1 12000
#> 
#> [[2]]
#> # A tibble: 3 x 2
#>      id   num
#>   <dbl> <dbl>
#> 1     2 13000
#> 2     2 14000
#> 3     2 15000
#> 
#> [[3]]
#> # A tibble: 3 x 2
#>      id   num
#>   <dbl> <dbl>
#> 1     3 16000
#> 2     3 17000
#> 3     3 18000

Created on 2018-09-20 by the reprex package (v0.2.0).
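The question asks for the column to be chosen by position, because its name differs between datasets. A minimal sketch of that variation, assuming dplyr 1.0 or later (for across()) and readr are installed; the toy tibbles and the position 2 are illustrative stand-ins for the real data and column 6:

```r
library(dplyr)
library(readr)

# Toy tibbles whose second column has a different name in each (hypothetical data)
df1 <- tibble(id = c(1, 1), num = c("10,000", "11,000"))
df2 <- tibble(id = c(2, 2), amount = c("13,000", "14,000"))

# across(2, ...) selects the column by position, so the name does not matter;
# for the datasets in the question this would be across(6, parse_number)
cleaned <- lapply(list(df1, df2), function(d) mutate(d, across(2, parse_number)))

cleaned[[2]]$amount
```

Selecting by position keeps the original column names intact, so each data frame still carries its own label for the cleaned column.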

Answer 2

Your code is close, but there are a few problems:

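The result is never assigned back to the list.
as.numerical is a typo; it must be as.numeric.
i[6] does not work because you need to specify that it is the sixth column you want: i[, 6]. See here for details on [ vs [[.
c(df1, df2) does not actually create a list of data frames.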
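Two of these points are easy to check interactively. The sketch below uses two small made-up data frames (not the asker's data) to show what c() and list() produce, and how [ and [[ differ on a base data.frame:

```r
# Toy data frames (hypothetical stand-ins for df1 ... df25)
df1 <- data.frame(id = 1:2, amount = c("1,000", "2,500"), stringsAsFactors = FALSE)
df2 <- data.frame(id = 3:4, amount = c("10,000", "20,000"), stringsAsFactors = FALSE)

# c() flattens the data frames into a single list of columns
flat <- c(df1, df2)
length(flat)       # 4: one element per column
class(flat[[1]])   # "integer": each element is a column vector

# list() keeps each data frame intact
dfs <- list(df1, df2)
length(dfs)        # 2: one element per data frame
class(dfs[[1]])    # "data.frame"

# [ keeps the data frame wrapper, [[ extracts the column itself
class(df1[2])      # "data.frame"
class(df1[[2]])    # "character"
class(df1[, 2])    # "character" for a base data.frame
```

One caveat worth knowing: for a tibble, x[, 2] stays a tibble rather than dropping to a vector, so x[[2]] is the form that behaves the same for both kinds of object.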

Try the following:

## this is bad, it will make a single list of columns, not of data frames
# list_of_dfs = c(df1, df2, ..... , df25)

# use this instead
list_of_dfs = list(df1, df2, ..... , df25)
# or this
list_of_dfs = mget(ls(pattern = "df"))

for (i in seq_along(list_of_dfs)) {
  list_of_dfs[[i]][, 6] <- as.numeric(gsub("[,]" , "", list_of_dfs[[i]][, 6]))
}
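To see the loop work end to end, here is a self-contained sketch on made-up data. The helper make_df and the column names sales and revenue are hypothetical; they only mimic the question's setup of 7 columns with a differently named sixth column. It uses [[6]] rather than [, 6], which is an assumption on my part that the data may be tibbles as well as base data frames:

```r
# Hypothetical helper: builds a 7-column data frame whose 6th column
# holds comma-formatted strings under a caller-chosen name
make_df <- function(col6_name, values) {
  df <- data.frame(a = 1:2, b = 1:2, c = 1:2, d = 1:2, e = 1:2,
                   f = values, g = 1:2, stringsAsFactors = FALSE)
  names(df)[6] <- col6_name
  df
}

list_of_dfs <- list(make_df("sales",   c("100,000", "2,500")),
                    make_df("revenue", c("1,234,567", "89")))

# Assign back into the list by index so the change is kept
for (i in seq_along(list_of_dfs)) {
  list_of_dfs[[i]][[6]] <- as.numeric(gsub(",", "", list_of_dfs[[i]][[6]]))
}

sapply(list_of_dfs, function(x) class(x[[6]]))
list_of_dfs[[2]][[6]]
```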

We can do a little better: by default gsub treats the pattern as a regular expression, and using the fixed = TRUE argument will be a bit faster:

for (i in seq_along(list_of_dfs)) {
  list_of_dfs[[i]][, 6] <- as.numeric(gsub(",", "", list_of_dfs[[i]][, 6], fixed = TRUE))
}
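Besides speed, fixed = TRUE also changes how the pattern is interpreted. For a comma the result is the same either way, but for a character with a special meaning in regular expressions it is not. A small illustration, where the dot-separated value is my own example and not something from the question:

```r
x <- c("1.234.567", "100,000")

# A comma has no special meaning, so both calls agree
regex_comma <- gsub(",", "", x)
fixed_comma <- gsub(",", "", x, fixed = TRUE)
identical(regex_comma, fixed_comma)   # TRUE

# A dot matches ANY character as a regex, so everything is deleted
regex_dot <- gsub(".", "", x)                 # "" ""
# With fixed = TRUE only literal dots are removed
fixed_dot <- gsub(".", "", x, fixed = TRUE)   # "1234567" "100,000"
```

So fixed = TRUE is a sensible default whenever the thing to remove is a literal string rather than a pattern.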

For shorter code, we can use lapply instead of a for loop:

# lapply returns a new list, so assign it to the whole list (not to list_of_dfs[[i]])
list_of_dfs <- lapply(list_of_dfs, function(x) {
    x[, 6] = as.numeric(gsub("," , "", x[, 6], fixed = TRUE))
    return(x)
})
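The same idea can be packaged as a small reusable function. This is only a sketch: the name clean_column, its argument j, and the toy data are hypothetical. It also counts values that still fail to convert after the commas are removed, since as.numeric turns those into NA with only a warning:

```r
# Hypothetical helper: strip commas from column j and convert it to numeric
clean_column <- function(x, j) {
  x[[j]] <- suppressWarnings(as.numeric(gsub(",", "", x[[j]], fixed = TRUE)))
  x
}

# Toy named list; the column position would be 6 for the question's datasets
list_of_dfs <- list(
  first  = data.frame(id = 1:2, amount = c("1,000", "2,500"), stringsAsFactors = FALSE),
  second = data.frame(id = 3:4, total  = c("10,000", "n/a"),  stringsAsFactors = FALSE)
)

list_of_dfs <- lapply(list_of_dfs, clean_column, j = 2)

# lapply keeps the list names, so results can still be looked up by name
names(list_of_dfs)

# Number of values per data frame that could not be converted
na_counts <- sapply(list_of_dfs, function(x) sum(is.na(x[[2]])))
na_counts
```

Checking the NA counts afterwards is a cheap way to catch entries such as "n/a" that the comma removal does not fix.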

Answer 3

Part of this answer comes from here: Looping through list of data frames in R

In your case, you can do the following:

list_of_dfs = list(df1, df2, ..... , df25)
# return x from the function and assign the result, otherwise the change is lost
list_of_dfs <- lapply(list_of_dfs, function(x) { x[, 6] <- as.integer(gsub("," , "", x[, 6])); x })
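One thing to keep in mind when choosing as.integer over as.numeric: R integers are 32-bit, so values above .Machine$integer.max become NA. The figures below are made up purely to show the effect:

```r
x <- c("100,000", "3,000,000,000")
cleaned <- gsub(",", "", x, fixed = TRUE)

as_num <- as.numeric(cleaned)                    # both values survive
as_int <- suppressWarnings(as.integer(cleaned))  # second value is out of range

.Machine$integer.max   # 2147483647
as_num
as_int
```

If the column might hold values in the billions, or decimals, as.numeric is the safer conversion.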

Answer 4

The data.table way

library(data.table)
test<-data.table(col1=c('100,00','100','100,000'),col2=c('90','80,00','60'))
    col1  col2
 100,00    90
 100      80,00
 100,000  60

Your list of data frames:

testList<-list(test,test)

Suppose you want to correct col2 in this case, but refer to it by index:

removeNonnumeric<-function(x){return(as.numeric(gsub(',','',x)))}
data<-function(x){return(x[,lapply(.SD,removeNonnumeric),.SDcols=names(x)[2],by=col1])}

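removeNonnumeric removes the "," from a column, and data takes a single data table and calls removeNonnumeric on the chosen column. Combining the two in one lapply applies this to every data table in testList, which is the list of data tables: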

 lapply(testList,data)
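An alternative that avoids grouping by col1 is to update the column in place with data.table's set(), again addressing it by position. This is a sketch under my own assumptions: the tables dt1 and dt2 are made up, and the list holds two distinct tables so that updating one by reference does not affect the other:

```r
library(data.table)

# Two distinct toy tables (hypothetical data)
dt1 <- data.table(col1 = c("a", "b", "c"), col2 = c("90", "80,00", "60"))
dt2 <- data.table(col1 = c("d", "e"),      col2 = c("1,000", "2,500"))
dt_list <- list(dt1, dt2)

# set() replaces column 2 by reference, so nothing needs to be reassigned
for (dt in dt_list) {
  set(dt, j = 2L, value = as.numeric(gsub(",", "", dt[[2L]], fixed = TRUE)))
}

dt_list[[1]]$col2
dt_list[[2]]$col2
```

Because set() works by reference, row order and all other columns are left untouched.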

How to loop over multiple datasets and remove a specific character from a specified column in R
