Question

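I have a very large data frame (think 300,000 to 500,000 records), so I am using data.table to handle it. I am more familiar with dplyr than with data.table.

Consider the following small example. Note that my actual dataset has many more columns.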

library(data.table)
library(magrittr)
library(stringi)

set.seed(42)

format_pct <- function(x){
  paste0(formatC(x * 100, digits = 1, format = 'f'), "%")
}

df <- data.frame(x = c(1, NA, 2, 4, NA),
                 y = c(0, 1, NA, 2, 5),
                 # sample.int() expects a single n; the values printed below fall in 1..500000
                 huge_numsf = sample.int(500000, size = 5),
                 huge_numsg = sample.int(500000, size = 5),
                 percent_a = format_pct(runif(5)),
                 percent_b = format_pct(runif(5)))

> df
   x  y huge_numsf huge_numsg percent_a percent_b
1  1  0     457404     259548     45.8%     94.0%
2 NA  1     468537     368294     71.9%     97.8%
3  2 NA     143070      67334     93.5%     11.7%
4  4  2     415222     328495     25.5%     47.5%
5 NA  5     320871     352530     46.2%     56.0%

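I want to apply prettyNum() to every column except x, y, and any column whose name contains the string 'percent'.

If the data frame were not so large, I would do: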

df[,colnames(df)[
  !(colnames(df) %in% c("x", "y", colnames(df)[stri_detect_fixed(colnames(df), "percent")]))
  ]] <-  
  apply(X = df[,colnames(df)[
    !(colnames(df) %in% c("x", "y", colnames(df)[stri_detect_fixed(colnames(df), "percent")]))
    ]],
    MARGIN = 2,
    FUN = prettyNum,
    big.mark = ",")

> df
   x  y huge_numsf huge_numsg percent_a percent_b
1  1  0    457,404    259,548     45.8%     94.0%
2 NA  1    468,537    368,294     71.9%     97.8%
3  2 NA    143,070     67,334     93.5%     11.7%
4  4  2    415,222    328,495     25.5%     47.5%
5 NA  5    320,871    352,530     46.2%     56.0%
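
For reference, the same data.frame update can be written with the column selection computed once. This is only a sketch on toy data: `df_small`, `skip` and `cols` are illustrative names, and `grep(..., fixed = TRUE)` stands in for `stri_detect_fixed()` so that no extra package is needed.

```r
# Toy stand-in for df (values are illustrative)
df_small <- data.frame(x = c(1, NA, 2),
                       y = c(0, 1, NA),
                       huge_numsf = c(457404L, 468537L, 143070L),
                       percent_a = c("45.8%", "71.9%", "93.5%"))

# Columns to leave untouched: x, y, and any name containing "percent"
skip <- c("x", "y", grep("percent", names(df_small), fixed = TRUE, value = TRUE))
cols <- setdiff(names(df_small), skip)

# lapply() works column by column, so there is no matrix conversion as with apply()
df_small[cols] <- lapply(df_small[cols], prettyNum, big.mark = ",")
```

The formatted columns become character vectors, so this step is best done last, after any arithmetic.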

Now suppose df is a data.table, i.e.:

df <- data.frame(x = c(1, NA, 2, 4, NA),
                 y = c(0, 1, NA, 2, 5),
                 # sample.int() expects a single n, so pass the upper bound only
                 huge_numsf = sample.int(500000, size = 5),
                 huge_numsg = sample.int(500000, size = 5),
                 percent_a = format_pct(runif(5)),
                 percent_b = format_pct(runif(5))) %>% 
  data.table(.)

Is there a way to do the above using data.table syntax?

Answer 1

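Here is a data.table solution. Full disclosure: credit should go to an earlier post that I followed closely, "How to apply same function to every specified column in a data.table".

# Columns to leave untouched: x, y, and any name containing "percent"
cols <- c("x", "y", colnames(df)[stri_detect_fixed(colnames(df), "percent")])
# Everything else gets formatted
cols <- setdiff(colnames(df), cols)
# Update the selected columns by reference; .SD contains only the .SDcols columns
df[ , (cols) := lapply(.SD, prettyNum, big.mark = ","), .SDcols = cols]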
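
A self-contained version of the same idea, as a minimal sketch on toy data. The table `dt` and the helper name `skip` are illustrative and not part of the original post, and base `grep()` replaces `stri_detect_fixed()` so that only data.table is required.

```r
library(data.table)

# Toy stand-in for the question's table (values are illustrative)
dt <- data.table(x = c(1, NA, 2),
                 y = c(0, 1, NA),
                 huge_numsf = c(457404L, 468537L, 143070L),
                 huge_numsg = c(259548L, 368294L, 673340L),
                 percent_a = c("45.8%", "71.9%", "93.5%"))

# Columns to leave untouched: x, y, and any name containing "percent"
skip <- c("x", "y", grep("percent", names(dt), fixed = TRUE, value = TRUE))
cols <- setdiff(names(dt), skip)

# Replace the selected columns by reference with their formatted versions
dt[, (cols) := lapply(.SD, prettyNum, big.mark = ","), .SDcols = cols]

# Inspect the resulting column classes
sapply(dt, class)
```

Because `:=` modifies `dt` in place, no copy of the full table is made. The formatted columns are character afterwards, so any numeric work should happen before this step.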

data.table: prettyNum() on all but some columns
