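I have a very large data frame (think 300,000 to 500,000 records), so I am using data.table to handle it. I am more familiar with dplyr than with data.table.
Consider the following small example. Note that my actual data set has many more columns.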
library(data.table)
library(magrittr)
library(stringi)
set.seed(42)
format_pct <- function(x){
paste0(formatC(x * 100, digits = 1, format = 'f'), "%")
}
df <- data.frame(x = c(1, NA, 2, 4, NA),
                 y = c(0, 1, NA, 2, 5),
                 # sample.int() expects a single upper bound n, so this draws from 1..500000
                 huge_numsf = sample.int(500000, size = 5),
                 huge_numsg = sample.int(500000, size = 5),
                 percent_a = format_pct(runif(5)),
                 percent_b = format_pct(runif(5)))
> df
   x  y huge_numsf huge_numsg percent_a percent_b
1  1  0     457404     259548     45.8%     94.0%
2 NA  1     468537     368294     71.9%     97.8%
3  2 NA     143070      67334     93.5%     11.7%
4  4  2     415222     328495     25.5%     47.5%
5 NA  5     320871     352530     46.2%     56.0%
I want to apply prettyNum() to every column except x, y, and any column whose name contains the string 'percent'.
If this data frame were not so large, I would do:
df[,colnames(df)[
!(colnames(df) %in% c("x", "y", colnames(df)[stri_detect_fixed(colnames(df), "percent")]))
]] <-
apply(X = df[,colnames(df)[
!(colnames(df) %in% c("x", "y", colnames(df)[stri_detect_fixed(colnames(df), "percent")]))
]],
MARGIN = 2,
FUN = prettyNum,
big.mark = ",")
> df
   x  y huge_numsf huge_numsg percent_a percent_b
1  1  0    457,404    259,548     45.8%     94.0%
2 NA  1    468,537    368,294     71.9%     97.8%
3  2 NA    143,070     67,334     93.5%     11.7%
4  4  2    415,222    328,495     25.5%     47.5%
5 NA  5    320,871    352,530     46.2%     56.0%
Now suppose df is a data.table instead, i.e.:
df <- data.frame(x = c(1, NA, 2, 4, NA),
                 y = c(0, 1, NA, 2, 5),
                 # sample.int() expects a single upper bound n, so this draws from 1..500000
                 huge_numsf = sample.int(500000, size = 5),
                 huge_numsg = sample.int(500000, size = 5),
                 percent_a = format_pct(runif(5)),
                 percent_b = format_pct(runif(5))) %>%
  data.table(.)
Is there a way to do the above using data.table syntax?
Answer 0 (score: 1)
Here is a data.table solution. Full disclosure: credit should go to an earlier post that I followed closely: How to apply same function to every specified column in a data.table
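# Columns to leave untouched: x, y, and any column whose name contains "percent"
cols <- c("x", "y", colnames(df)[stri_detect_fixed(colnames(df), "percent")])
# Everything else is a column to format
cols <- setdiff(colnames(df), cols)
# Update those columns by reference; .SD holds only the columns named in .SDcols
df[ , (cols) := lapply(.SD, prettyNum, big.mark = ","), .SDcols = cols]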
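For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the same idea. The table `dt` and its values are made up for illustration, and base R's grep() is swapped in for stri_detect_fixed() purely to avoid the stringi dependency; either should select the same column names here.

```r
library(data.table)

# Hypothetical toy data, hard-coded so the result does not depend on a random seed.
dt <- data.table(
  x          = c(1, NA, 2),
  y          = c(0, 1, NA),
  huge_numsf = c(457404L, 468537L, 143070L),
  huge_numsg = c(259548L, 368294L, 67334L),
  percent_a  = c("45.8%", "71.9%", "93.5%")
)

# Columns to skip: x, y, and any name containing the literal string "percent".
skip <- c("x", "y", grep("percent", names(dt), fixed = TRUE, value = TRUE))
cols <- setdiff(names(dt), skip)

# Rewrite only the selected columns, by reference (no copy of the table is made).
dt[, (cols) := lapply(.SD, prettyNum, big.mark = ","), .SDcols = cols]

print(dt)
```

Note that the formatted columns become character vectors, so this step is probably best done last, once no further arithmetic is needed on those columns.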