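I am trying to split the contents of each column into two rows while duplicating the row name. Each variable contains only two-digit values (11, 12, 13, 14, 21, 22, etc., or NA). The goal is to convert the data to STRUCTURE format, a common population-genetics format.
I have this: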
population X354045 X430045 X995019
Crater <NA> 11 22
Teton 11 31 11
I would like to have this:
population X354045 X430045 X995019
Crater <NA> 1 2
Crater <NA> 1 2
Teton 1 3 1
Teton 1 1 1
Answer 0: (score: 2)
This is a data.table question, so I will simply suggest the built-in tstrsplit function.
Read in your data:
library(data.table)
DT <- fread('population X354045 X430045 X995019
Crater NA 11 22
Teton 11 31 11')
Solution (if you have a data.frame, use setDT(DT) to convert it to a data.table):
DT[, lapply(.SD, function(x) unlist(tstrsplit(x, ""))), by = population]
# population X354045 X430045 X995019
# 1: Crater NA 1 2
# 2: Crater NA 1 2
# 3: Teton 1 3 1
# 4: Teton 1 1 1
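For readers unfamiliar with tstrsplit: the data.table documentation describes it as strsplit followed by a transpose. Splitting on the empty string breaks each value into single characters, and the transpose groups them by character position. A rough base-R sketch of that idea on a standalone vector (the vector x is made-up illustration data, not from the question):

```r
x <- c("11", "31", "22")          # illustrative genotype codes

pieces <- strsplit(x, "")         # list of 3, each holding c(first, second)
first  <- sapply(pieces, `[`, 1)  # first character of every value
second <- sapply(pieces, `[`, 2)  # second character of every value

first
second
```

Inside the by = population call, unlist() then stacks the first and second characters on top of each other, which is what produces two rows per population.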
Answer 1: (score: 1)
OK, here is how I would do it. Let's create some data:
vector <- c(10, 11, 12, NA, 13, 14, 15)
First, we need a function that splits each two-digit number into its two digits (and splits NA into two NAs):
as.numeric(sapply(vector, function(x) (x %% c(1e2,1e1)) %/% c(1e1,1e0)))
# 1 0 1 1 1 2 NA NA 1 3 1 4 1 5
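To see why the arithmetic works, take a single value such as 31: 31 %% c(100, 10) gives c(31, 1), and integer-dividing that by c(10, 1) gives c(3, 1). A minimal sketch that wraps the same expression in a helper; the name split_digits is hypothetical and not part of the answer:

```r
# split_digits is an illustrative name, not from the original answer.
# It returns the tens digit and the ones digit of a two-digit number.
split_digits <- function(x) (x %% c(1e2, 1e1)) %/% c(1e1, 1e0)

split_digits(31)  # tens digit 3, ones digit 1
split_digits(NA)  # NA propagates to both positions
```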
Now all we have to do is apply it to each relevant column:
DF <- data.frame(population = c("Crater", "Teton"), X354045 = c(NA, 11), X430045 = c(11, 31), X995019 = c(22, 11))
DF2 <- apply(DF[-1], 2, function(y) as.numeric(sapply(y, function(x) (x %% c(1e2,1e1)) %/% c(1e1,1e0))))
Finally, we combine it with the new population column:
population <- as.character(rep(DF$population, each = 2))
DF3 <- cbind(population, data.frame(DF2))
Answer 2: (score: 1)
dd <- read.table(header = TRUE, text = 'population X354045 X430045 X995019
Crater NA 11 22
Teton 11 31 11')
nr <- nrow(dd)
# repeat every row twice; rep(1:2, each = nr) only works when nr is 2
dd <- dd[rep(seq_len(nr), each = 2), ]
# population X354045 X430045 X995019
# 1 Crater NA 11 22
# 1.1 Crater NA 11 22
# 2 Teton 11 31 11
# 2.1 Teton 11 31 11
dd[, -1] <- lapply(dd[, -1], function(x) {
idx <- (seq_along(x) %% 2 == 0) + 1L
substr(x, idx, idx)
})
# population X354045 X430045 X995019
# 1 Crater <NA> 1 2
# 1.1 Crater <NA> 1 2
# 2 Teton 1 3 1
# 2.1 Teton 1 1 1
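The two steps above (repeat every row twice, then take the first character on odd rows and the second on even rows) can be wrapped in a small function. This is only a sketch: the function name to_two_rows and the three-row example data are made up for illustration, and it assumes the first column is the population label and every other column holds two-digit codes or NA.

```r
# to_two_rows is a hypothetical helper, not part of the original answer.
to_two_rows <- function(d) {
  out <- d[rep(seq_len(nrow(d)), each = 2), ]  # every row twice
  pos <- rep(1:2, times = nrow(d))             # 1, 2, 1, 2, ...
  # substr() coerces numbers to character; NA stays NA
  out[-1] <- lapply(out[-1], function(x) substr(x, pos, pos))
  rownames(out) <- NULL
  out
}

# made-up data with three populations
d <- data.frame(population = c("A", "B", "C"),
                loc1 = c(12, NA, 31),
                loc2 = c(22, 13, 11))
res <- to_two_rows(d)
res
```

Indexing rows with seq_len(nrow(d)) rather than a fixed 1:2 keeps the sketch valid for any number of populations.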
Or simply (starting again from the original dd)
dd <- dd[rep(seq_len(nr), each = 2), ]
dd[, -1] <- lapply(dd[, -1], function(x)
Vectorize(substr)(x, rep(1:2, nr), rep(1:2, nr)))
would work.
The same idea in data.table, thanks to @DavidArenburg:
library('data.table')
dd <- read.table(header = TRUE, text = 'population X354045 X430045 X995019
Crater NA 11 22
Teton 11 31 11')
setDT(dd)[rep(seq_len(.N), each = 2), lapply(.SD, substr, 1:2, 1:2), by = population]
# population X354045 X430045 X995019
# 1: Crater NA 1 2
# 2: Crater NA 1 2
# 3: Teton 1 3 1
# 4: Teton 1 1 1
Or similarly, but avoiding the by part:
dd <- setDT(dd)[rep(seq_len(.N), each = 2)]
dd[, 2:4 := dd[, lapply(.SD, substr, 1:2, 1:2), .SDcols = -1]]
This should be very fast and efficient if you are working with a large data set.