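I have a data frame with an ID column and several numeric columns containing density measurements. To make the densities approximately normally distributed I need to take the log, but because some of the density values are 0, I need to add 0.5 to every density measurement so that the log transform does not produce -Inf data points. How can I do this with dplyr?
Sample data: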
ID `Image Tag` `CD3 Global Den… `CD8 Global Den… `CD20 Global De… `CD3 Tumour Den… `CD8 Tumour Den…
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 IM_10 NA 608. 755. 51.0 868. 1066.
2 IM_1… NA 27.5 69.3 0.550 30.4 75.2
3 IM_1… NA 19.6 17.0 1.03 53.2 42.0
4 IM_1… NA 109. 89.0 47.7 725. 594.
5 IM_1… NA 219. 171. 0.501 531. 416.
6 IM_1… NA 4.00 0 0 5.94 0
I tried using
df1 <- df %>% group_by(ID) %>%
summarise_all(funs(mean(., na.rm=TRUE))) %>%
mutate_at(which(sapply(., is.numeric)), funs(sum(0.5)))
But this replaces all of my numeric columns with 0.5 instead of adding 0.5 to the original densities.
ID `Image Tag` `CD3 Global Den… `CD8 Global Den… `CD20 Global De… `CD3 Tumour Den… `CD8 Tumour Den…
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 IM_10 0.5 0.5 0.5 0.5 0.5 0.5
2 IM_1… 0.5 0.5 0.5 0.5 0.5 0.5
3 IM_1… 0.5 0.5 0.5 0.5 0.5 0.5
4 IM_1… 0.5 0.5 0.5 0.5 0.5 0.5
5 IM_1… 0.5 0.5 0.5 0.5 0.5 0.5
6 IM_1… 0.5 0.5 0.5 0.5 0.5 0.5
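Any ideas on how to do this?
Answer 0: (score: 0)
I assume you want to summarise by ID and then add 0.5 to every value (that is not NA). If so, this is how I would do it: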
# Sample data
df <- structure(list(ID = c("IM_10", "IM_11", "IM_12", "IM_13", "IM_14",
"IM_15"), Image_Tag = c(NA, NA, NA, NA, NA, NA), CD3_Global_Den = c(608,
27.5, 19.6, 109, 219, 4), CD8_Global_Den = c(755, 69.3, 17, 89,
171, 0), CD20_Global_De = c(51, 0.55, 1.03, 47.7, 0.501, 0),
CD3_Tumour_Den = c(868, 30.4, 53.2, 725, 531, 5.94), CD8_Tumour_Den = c(1066,
75.2, 42, 594, 416, 0)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"), .Names = c("ID", "Image_Tag", "CD3_Global_Den",
"CD8_Global_Den", "CD20_Global_De", "CD3_Tumour_Den", "CD8_Tumour_Den"
))
# Suggested code
library(hablar)
library(dplyr)
options(pillar.sigfig = 6)
df %>% group_by(ID) %>%
summarise_all(~mean_(.)) %>%
mutate_at(vars(-ID), ~. + 0.5)
This gives:
# A tibble: 6 x 7
ID Image_Tag CD3_Global_Den CD8_Global_Den CD20_Global_De CD3_Tumour_Den CD8_Tumour_Den
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 IM_10 NA 608.5 755.5 51.5 868.5 1066.5
2 IM_11 NA 28 69.8 1.05 30.9 75.7
3 IM_12 NA 20.1 17.5 1.53 53.7 42.5
4 IM_13 NA 109.5 89.5 48.2 725.5 594.5
5 IM_14 NA 219.5 171.5 1.00100 531.5 416.5
6 IM_15 NA 4.5 0.5 0.5 6.44 0.5
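The attempt in the question returns 0.5 everywhere because `funs(sum(0.5))` never refers to `.`, so each column is replaced by the constant `sum(0.5)`. As a minimal sketch of the same summarise-then-shift idea with `across()`, assuming dplyr 1.0.0 or later, and using a small made-up table `toy` (hypothetical values, not the asker's data) with the log transform applied in the same step:

```r
library(dplyr)

# Toy data standing in for the density table (values are illustrative only).
toy <- tibble(
  ID = c("IM_10", "IM_10", "IM_15"),
  CD3_Global_Den = c(608, 610, 4),
  CD8_Global_Den = c(755, NA, 0)
)

result <- toy %>%
  group_by(ID) %>%
  # Average each density column per ID, ignoring NA values.
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = "drop") %>%
  # `.x` is the column itself, so `.x + 0.5` shifts the values before the log.
  mutate(across(where(is.numeric), ~ log(.x + 0.5)))

print(result)
```

Because the shift happens inside the lambda, a density of 0 becomes `log(0.5)` rather than -Inf.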
Answer 1: (score: 0)
If you just want to add 1:
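library(purrr)  # map_if() comes from purrr and returns a list, not a data frame
df %>% map_if(is.numeric, ~ . + 1) %>% as_tibble()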
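Note that `purrr::map_if()` returns a list rather than a data frame. A dplyr-only sketch of the same idea, assuming dplyr 1.0.0 or later and a hypothetical two-row table `toy`, keeps the result as a tibble:

```r
library(dplyr)

# Hypothetical example table: one character ID column, one numeric column.
toy <- tibble(ID = c("a", "b"), dens = c(0, 2.5))

# where(is.numeric) selects only the numeric columns; ID is left untouched.
shifted <- toy %>% mutate(across(where(is.numeric), ~ .x + 1))

print(shifted)
```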