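I have a dataset in which percentages are given as decimals, with a second value in parentheses in the same cell. How can I split the two values into two separate cells? Can I use split() from the tidyr package?
Example: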
Frog Dog
.12 (0.05) .14 (0.10)
.12 (0.04) .44 (0.11)
dput(mydata)
structure(list(X = structure(c(2L, 4L, 1L, 3L), .Label = c("Blue", "Green ", "Purple", "Red"), class = "factor"), Green = structure(1:4, .Label = c("", "0.12 (0.05)", "0.14 (0.09)", "0.34 (0.05)"), class = "factor"), Red = structure(c(3L, 1L, 4L, 2L), .Label = c("", "0.12 (0.08)", "0.19 (0.05)", "0.42 (0.04)"), class = "factor"), Blue = structure(c(4L, 3L, 1L, 2L), .Label = c("", "0.1 (0.04)", "0.14 (0.04)", "0.17 (0.01)"), class = "factor"), Purple = structure(4:1, .Label = c("", "0.15 (0.08)", "0.18 (0.02)", "0.34 (0.05)"), class = "factor")), class = "data.frame", row.names = c(NA, -4L))
Answer 0 (score: 2)
You can use separate if you supply the right sep argument. It is easiest if you reshape the data to long format first:
library(tidyverse)
res1 <- mydata %>%
  gather(color, values, Green:Purple) %>%
  separate(values, c("heritability", "p-value"), sep = ' ') %>%
  mutate_at(vars(heritability, 'p-value'), parse_number)
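Note that the parse_number function from readr makes it easy to strip the parentheses, whitespace and other junk. Because some of your cells are empty and cannot be split, the separate call emits a warning.
This gives: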
        X  color heritability p-value
1   Green  Green           NA      NA
2     Red  Green         0.12    0.05
3    Blue  Green         0.14    0.09
4  Purple  Green         0.34    0.05
5   Green    Red         0.19    0.05
6     Red    Red           NA      NA
7    Blue    Red         0.42    0.04
8  Purple    Red         0.12    0.08
9   Green   Blue         0.17    0.01
10    Red   Blue         0.14    0.04
11   Blue   Blue           NA      NA
12 Purple   Blue         0.10    0.04
13  Green Purple         0.34    0.05
14    Red Purple         0.18    0.02
15   Blue Purple         0.15    0.08
16 Purple Purple           NA      NA
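I recommend keeping the data in this format for further analysis and plotting. It is "tidy".
If you want it in wide format, you can reshape it again: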
res1 %>%
  gather(type, value, -X, -color) %>%
  unite(key, color, type) %>%
  spread(key, value)
This gives:
X Blue_heritability Blue_p-value Green_heritability Green_p-value Purple_heritability Purple_p-value Red_heritability Red_p-value
1 Blue NA NA 0.14 0.09 0.15 0.08 0.42 0.04
2 Green 0.17 0.01 NA NA 0.34 0.05 0.19 0.05
3 Purple 0.10 0.04 0.34 0.05 NA NA 0.12 0.08
4 Red 0.14 0.04 0.12 0.05 0.18 0.02 NA NA
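The same long-then-wide approach can be sketched with the newer tidyr verbs, since gather and spread have been superseded by pivot_longer and pivot_wider. This is only an illustration under assumptions: it needs tidyr >= 1.0.0 and dplyr >= 1.0.0, it uses a small made-up stand-in for mydata with character columns, and it names the second column p_value rather than p-value to avoid quoting. Passing fill = "right" to separate pads the empty cells with NA instead of warning.

```r
library(dplyr)
library(tidyr)
library(readr)

# Hypothetical stand-in for the question's data: character columns, empty cells
mydata_small <- data.frame(
  X     = c("Green", "Red"),
  Green = c("", "0.12 (0.05)"),
  Red   = c("0.19 (0.05)", ""),
  stringsAsFactors = FALSE
)

long <- mydata_small %>%
  # one row per (X, color) pair
  pivot_longer(-X, names_to = "color", values_to = "values") %>%
  # fill = "right" fills missing pieces with NA without a warning
  separate(values, c("heritability", "p_value"), sep = " ", fill = "right") %>%
  # parse_number drops the parentheses; empty strings become NA
  mutate(across(c(heritability, p_value), parse_number))

# back to wide format: one heritability_* and one p_value_* column per color
wide <- long %>%
  pivot_wider(names_from = color, values_from = c(heritability, p_value))
```

Here long holds one row per X and color, and wide has the columns X, heritability_Green, heritability_Red, p_value_Green and p_value_Red.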
Answer 1 (score: 1)
Option 1:
library(splitstackshape)
library(tidyverse)
df %>%
  cSplit(names(df)[-1], ' ') %>%
  mutate_at(-1, parse_number)
# X Green_1 Green_2 Red_1 Red_2 Blue_1 Blue_2 Purple_1 Purple_2
# 1 Green NA NA 0.19 0.05 0.17 0.01 0.34 0.05
# 2 Red 0.12 0.05 NA NA 0.14 0.04 0.18 0.02
# 3 Blue 0.14 0.09 0.42 0.04 NA NA 0.15 0.08
# 4 Purple 0.34 0.05 0.12 0.08 0.10 0.04 NA NA
Option 2 (worse, but needs one fewer package):
library(tidyverse)
for (col in names(df)[-1]) {
  df <- df %>%
    separate(!!col, into = paste0(col, 1:2), sep = ' ')
}
df %>%
  mutate_at(-1, parse_number)
# X Green1 Green2 Red1 Red2 Blue1 Blue2 Purple1 Purple2
# 1 Green NA NA 0.19 0.05 0.17 0.01 0.34 0.05
# 2 Red 0.12 0.05 NA NA 0.14 0.04 0.18 0.02
# 3 Blue 0.14 0.09 0.42 0.04 NA NA 0.15 0.08
# 4 Purple 0.34 0.05 0.12 0.08 0.10 0.04 NA NA
Data used:
df <- structure(list(X = structure(c(2L, 4L, 1L, 3L), .Label = c("Blue", "Green ", "Purple", "Red"), class = "factor"), Green = structure(1:4, .Label = c("", "0.12 (0.05)", "0.14 (0.09)", "0.34 (0.05)"), class = "factor"), Red = structure(c(3L, 1L, 4L, 2L), .Label = c("", "0.12 (0.08)", "0.19 (0.05)", "0.42 (0.04)"), class = "factor"), Blue = structure(c(4L, 3L, 1L, 2L), .Label = c("", "0.1 (0.04)", "0.14 (0.04)", "0.17 (0.01)"), class = "factor"), Purple = structure(4:1, .Label = c("", "0.15 (0.08)", "0.18 (0.02)", "0.34 (0.05)"), class = "factor")), class = "data.frame", row.names = c(NA, -4L))
df
#        X       Green         Red        Blue      Purple
# 1  Green             0.19 (0.05) 0.17 (0.01) 0.34 (0.05)
# 2    Red 0.12 (0.05)             0.14 (0.04) 0.18 (0.02)
# 3   Blue 0.14 (0.09) 0.42 (0.04)             0.15 (0.08)
# 4 Purple 0.34 (0.05) 0.12 (0.08)  0.1 (0.04)
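For splitting several columns at once without a loop, a possible alternative is separate_wider_delim. This is a sketch under assumptions, not part of the answers above: it requires tidyr >= 1.3.0 and dplyr >= 1.0.0, and it uses a small made-up data frame (df_chr) with character rather than factor columns. The too_few = "align_start" setting pads cells that have fewer than two pieces with NA.

```r
library(dplyr)
library(tidyr)
library(readr)

# Hypothetical stand-in for df, with character columns and empty cells
df_chr <- data.frame(
  X     = c("Green", "Red"),
  Green = c("", "0.12 (0.05)"),
  Red   = c("0.19 (0.05)", ""),
  stringsAsFactors = FALSE
)

res <- df_chr %>%
  separate_wider_delim(
    cols = -X,                 # every column except X
    delim = " ",
    names = c("1", "2"),       # suffixes for the two pieces
    names_sep = "_",           # yields Green_1, Green_2, Red_1, Red_2
    too_few = "align_start"    # pad cells with a missing piece with NA
  ) %>%
  # parse_number drops the parentheses; empty strings become NA
  mutate(across(-X, parse_number))
```

The resulting column names follow the same pattern as the cSplit output in Option 1.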