Splitting a number from its parenthesized value across multiple column pairs

Time: 2018-12-03 18:01:05

Tags: r tidyr

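I have a dataset in which percentages are given as decimals, with a second value in parentheses in the same cell. How can I split the two values into two separate cells? Can I use split() from the tidyr package?

Example: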

Frog          Dog
.12 (0.05)   .14 (0.10)
.12 (0.04)   .44 (0.11)

dput(mydata) 

structure(list(X = structure(c(2L, 4L, 1L, 3L), .Label =  c("Blue", "Green ", "Purple", "Red"), class = "factor"), Green = structure(1:4, .Label = c("", "0.12 (0.05)", "0.14 (0.09)", "0.34 (0.05)"), class = "factor"), Red = structure(c(3L, 1L, 4L, 2L), .Label = c("", "0.12 (0.08)", "0.19 (0.05)", "0.42 (0.04)"), class = "factor"), Blue = structure(c(4L, 3L, 1L, 2L), .Label = c("", "0.1 (0.04)", "0.14 (0.04)", "0.17 (0.01)"), class = "factor"), Purple = structure(4:1, .Label = c("", "0.15 (0.08)", "0.18 (0.02)", "0.34 (0.05)"), class = "factor")), class = "data.frame", row.names = c(NA, -4L))

2 answers:

Answer 0: (score: 2)

You can use separate() if you supply the right sep argument. This is easiest if you reshape the data first:

library(tidyverse)

res1 <- mydata %>% 
  gather(color, values, Green:Purple) %>% 
  separate(values, c("heritability", "p-value"), sep = ' ') %>% 
  mutate_at(vars(heritability, 'p-value'), parse_number)

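Note that the parse_number function from readr makes it easy to strip the parentheses, spaces, and other junk. Because some cells are empty and cannot be split, the separate call emits a warning.

This gives: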

        X  color heritability p-value
1  Green   Green           NA      NA
2     Red  Green         0.12    0.05
3    Blue  Green         0.14    0.09
4  Purple  Green         0.34    0.05
5  Green     Red         0.19    0.05
6     Red    Red           NA      NA
7    Blue    Red         0.42    0.04
8  Purple    Red         0.12    0.08
9  Green    Blue         0.17    0.01
10    Red   Blue         0.14    0.04
11   Blue   Blue           NA      NA
12 Purple   Blue         0.10    0.04
13 Green  Purple         0.34    0.05
14    Red Purple         0.18    0.02
15   Blue Purple         0.15    0.08
16 Purple Purple           NA      NA
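The note about parse_number and the separate warning can be checked on a small toy column. This is only a sketch: the `cells` data frame and the column names `estimate` and `bracketed` are made up for illustration, and `fill = "right"` is one way (not used in the answer) to pad the empty cells with NA without triggering the warning.

```r
library(tidyr)
library(readr)

# Toy column mimicking the cells in the question; "" stands for a blank cell.
# `cells`, `estimate` and `bracketed` are illustrative names only.
cells <- data.frame(values = c("0.12 (0.05)", "", "0.1 (0.04)"),
                    stringsAsFactors = FALSE)

# fill = "right" pads rows with too few pieces with NA instead of warning.
parts <- separate(cells, values, c("estimate", "bracketed"),
                  sep = " ", fill = "right")

# parse_number() ignores the surrounding parentheses and returns doubles;
# empty strings and NA both become NA.
parts$estimate  <- parse_number(parts$estimate)
parts$bracketed <- parse_number(parts$bracketed)
parts
```

Whether silencing the warning is desirable depends on the data: if a malformed cell would be a real problem, leaving the default in place makes it visible.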

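I recommend keeping the data in this format for further analysis and plotting; it is "tidy".

If you want a wide layout, you can reshape it again: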

res1 %>% 
  gather(type, value, -X, -color) %>% 
  unite(key, color, type) %>% 
  spread(key, value)

This gives:

       X Blue_heritability Blue_p-value Green_heritability Green_p-value Purple_heritability Purple_p-value Red_heritability Red_p-value
1   Blue                NA           NA               0.14          0.09                0.15           0.08             0.42        0.04
2 Green               0.17         0.01                 NA            NA                0.34           0.05             0.19        0.05
3 Purple              0.10         0.04               0.34          0.05                  NA             NA             0.12        0.08
4    Red              0.14         0.04               0.12          0.05                0.18           0.02               NA          NA
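In recent tidyr versions, the gather/unite/spread chain can probably be replaced by a single pivot_wider() call. The sketch below is an alternative, not the answerer's method; it assumes a tidyr release that supports the `names_glue` argument, and the small `long` data frame is a hand-typed excerpt of the long result above.

```r
library(tidyr)

# Hand-typed excerpt of the long-format result (illustrative data only).
long <- data.frame(
  X = c("Red", "Blue", "Red", "Blue"),
  color = c("Green", "Green", "Purple", "Purple"),
  heritability = c(0.12, 0.14, 0.18, 0.15),
  `p-value` = c(0.05, 0.09, 0.02, 0.08),
  check.names = FALSE, stringsAsFactors = FALSE
)

# One column per color/measure pair, named like "Green_heritability".
wide <- pivot_wider(long,
                    names_from = color,
                    values_from = c(heritability, `p-value`),
                    names_glue = "{color}_{.value}")
wide
```

The column order may differ from the spread() output shown above, since pivot_wider() groups columns by measure rather than sorting them alphabetically.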

Answer 1: (score: 1)

Option 1:

library(splitstackshape)
library(tidyverse)

df %>% 
  cSplit(names(df)[-1], ' ') %>% 
  mutate_at(-1, parse_number)

#        X Green_1 Green_2 Red_1 Red_2 Blue_1 Blue_2 Purple_1 Purple_2
# 1 Green       NA      NA  0.19  0.05   0.17   0.01     0.34     0.05
# 2    Red    0.12    0.05    NA    NA   0.14   0.04     0.18     0.02
# 3   Blue    0.14    0.09  0.42  0.04     NA     NA     0.15     0.08
# 4 Purple    0.34    0.05  0.12  0.08   0.10   0.04       NA       NA

Option 2: (worse, but requires one fewer package)

library(tidyverse)

for(col in names(df)[-1])
  df <- df %>% 
          separate(!!col, into = paste0(col, 1:2), sep = ' ')

df %>% 
  mutate_at(-1, parse_number)

#        X Green1 Green2 Red1 Red2 Blue1 Blue2 Purple1 Purple2
# 1 Green      NA     NA 0.19 0.05  0.17  0.01    0.34    0.05
# 2    Red   0.12   0.05   NA   NA  0.14  0.04    0.18    0.02
# 3   Blue   0.14   0.09 0.42 0.04    NA    NA    0.15    0.08
# 4 Purple   0.34   0.05 0.12 0.08  0.10  0.04      NA      NA
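If avoiding packages is the goal, the same split can be sketched in base R with a regular expression. The helper `split_cell` and its output column names are hypothetical, not part of either answer, and the pattern assumes every non-empty cell looks like "number (number)".

```r
# Hypothetical helper: pull both numbers out of strings such as
# "0.19 (0.05)" using base R only.
split_cell <- function(x) {
  x <- as.character(x)  # the question's columns are factors
  pattern <- "^\\s*([0-9.]+)\\s*\\(([0-9.]+)\\)\\s*$"
  hits <- regmatches(x, regexec(pattern, x))
  # Each hit is c(full match, first number, second number);
  # cells that do not match give character(0) and become NA.
  pick <- function(i) {
    vapply(hits,
           function(h) if (length(h) == 3) as.numeric(h[i]) else NA_real_,
           numeric(1))
  }
  data.frame(value = pick(2), bracket = pick(3))
}

res <- split_cell(c("0.19 (0.05)", "", ".12 (0.04)"))
res
```

Applied per column, for example with lapply(df[-1], split_cell), this should give one two-column data frame for each original column; combining and renaming them is left out of the sketch.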

Data used:

df <- structure(list(X = structure(c(2L, 4L, 1L, 3L), .Label = c("Blue", "Green ", "Purple", "Red"), class = "factor"), Green = structure(1:4, .Label = c("", "0.12 (0.05)", "0.14 (0.09)", "0.34 (0.05)"), class = "factor"), Red = structure(c(3L, 1L, 4L, 2L), .Label = c("", "0.12 (0.08)", "0.19 (0.05)", "0.42 (0.04)"), class = "factor"), Blue = structure(c(4L, 3L, 1L, 2L), .Label = c("", "0.1 (0.04)", "0.14 (0.04)", "0.17 (0.01)"), class = "factor"), Purple = structure(4:1, .Label = c("", "0.15 (0.08)", "0.18 (0.02)", "0.34 (0.05)"), class = "factor")), class = "data.frame", row.names = c(NA, -4L))
df
#        X       Green         Red        Blue      Purple
# 1 Green              0.19 (0.05) 0.17 (0.01) 0.34 (0.05)
# 2    Red 0.12 (0.05)             0.14 (0.04) 0.18 (0.02)
# 3   Blue 0.14 (0.09) 0.42 (0.04)             0.15 (0.08)
# 4 Purple 0.34 (0.05) 0.12 (0.08)  0.1 (0.04)