Converting matrix values to a number of cells with the given value

Time: 2016-01-21 16:22:44

Tags: r matrix converter

I have a data.frame with 5 columns, each containing a proportion of a whole. This is what it looks like:

Sample    Type_A    Type_B    Type_C    Type_D    Type_E    Sum
00001      54        13         24        3          6      100
00002      5         2          15        54        24      100
00003      10        10         23        37        20      100

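I want to create a 100-column matrix and fill it with cells in proportion to the values in the data.frame. Row 00001 would have A in the first 54 cells, then 13 cells containing B, 24 cells containing C, and so on.

The desired matrix would look like this: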

00001  A  A  A  A  A  A  A  A  A  A  A  A  A  A .....
00002  A  A  A  A  A  B  B  C  C  C  C  C  C  C .....
00003  A  A  A  A  A  A  A  A  A  A  B  B  B  B .....

4 answers:

Answer 0 (score: 3)

Here is another option with data.table (assuming the values in the "Type" columns sum to 100 for every row).

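library(data.table)
# Strip the "Type_" prefix to get the labels "A" ... "E"
nm1 <- sub(".*_", "", grep("_", names(df1), value = TRUE))
# For each Sample, repeat every label by its count and spread the result across columns
setDT(df1)[, transpose(list(rep(nm1, unlist(.SD)))),
    by = Sample, .SDcols = Type_A:Type_E]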
# Sample V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48
#1:  00001  A  A  A  A  A  A  A  A  A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A
#2:  00002  A  A  A  A  A  B  B  C  C   C   C   C   C   C   C   C   C   C   C   C   C   C   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D
#3:  00003  A  A  A  A  A  A  A  A  A   A   B   B   B   B   B   B   B   B   B   B   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   D   D   D   D   D
#   V49 V50 V51 V52 V53 V54 V55 V56 V57 V58 V59 V60 V61 V62 V63 V64 V65 V66 V67 V68 V69 V70 V71 V72 V73 V74 V75 V76 V77 V78 V79 V80 V81 V82 V83 V84 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95
#1:   A   A   A   A   A   A   B   B   B   B   B   B   B   B   B   B   B   B   B   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   D   D   D   E
#2:   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   E   E   E   E   E   E   E   E   E   E   E   E   E   E   E   E   E   E   E
#3:   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   D   E   E   E   E   E   E   E   E   E   E   E   E   E   E   E
#   V96 V97 V98 V99 V100
#1:   E   E   E   E    E
#2:   E   E   E   E    E
#3:   E   E   E   E    E

Answer 1 (score: 2)

Note that your first sample does not add up to 100 but to 96. For the sake of the example, I will use 54.
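A quick way to check this before expanding anything is to compare the row sums against 100. The sketch below retypes the question's table by hand; the names `type_cols`, `row_totals`, and `bad_rows` are made up for illustration.

```r
# Data typed in from the question's table (the Sum column is left out on purpose)
df <- data.frame(
  Sample = c("00001", "00002", "00003"),
  Type_A = c(54, 5, 10),
  Type_B = c(13, 2, 10),
  Type_C = c(24, 15, 23),
  Type_D = c(3, 54, 37),
  Type_E = c(6, 24, 20),
  stringsAsFactors = FALSE
)

type_cols  <- grep("^Type_", names(df))       # positions of the Type_* columns
row_totals <- rowSums(df[, type_cols])        # one total per sample
bad_rows   <- df$Sample[row_totals != 100]    # samples whose counts do not sum to 100
```

With 54 in `Type_A`, `bad_rows` comes back empty; with 50 instead, the first row would total 96 and be flagged.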

Try rep:

rep(c("A","B","C","D","E"),c(54,13,24,3,6))

# "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
# "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "B" "B" "B" "B" "B" "B" "B" "B"
# "B" "B" "B" "B" "B" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "D" "D"
# "D" "E" "E" "E" "E" "E" "E"

For your data frame, I would do something like this (though it can probably be done with less code):

# Some preparation
df2 <- df[,2:(ncol(df)-1)] # selecting just the types
names(df2) <- gsub("Type_", "", names(df2)) # Removing "Type_" from the variable names

# Apply rep to all rows
lis <- apply(df2,1,function(x) rep(names(df2),x))
t(as.matrix(lis))
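The `apply` call above yields a matrix only when every row expands to the same length. If row totals can differ, one possible variant pads each row with NA up to a fixed width. This is a sketch on made-up data; `counts`, `n_cols`, and `expand_row` are hypothetical names, not part of the answer.

```r
# Hypothetical example data: the second row sums to 4, not 5
counts <- matrix(c(2, 1, 2,
                   1, 3, 0),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("A", "B", "C")))

n_cols <- 5  # target width of the output matrix

# Expand one row of counts into labels, padded with NA up to n_cols
expand_row <- function(x) {
  labels <- rep(colnames(counts), x)
  length(labels) <- n_cols  # pads with NA (or truncates) to n_cols
  labels
}

# apply() returns one column per input row, so transpose the result
out <- t(apply(counts, 1, expand_row))
```

Here `out` has one row per sample and `n_cols` columns, with NA marking the cells a short row could not fill.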

Answer 2 (score: 1)

I have a quick, hacky solution, if that works for you. First, I make some fake data that roughly matches what you provided.

library(plyr)
dat <-  matrix(c(50,14,24,12, 50,50,0,0), ncol=4, byrow=TRUE)
colnames(dat) <- paste('Type_', LETTERS[1:4], sep='')

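Then I use a rather clunky strsplit command to pull the letters out of the colnames, and an apply statement (adply) to rep the letters according to the values in the cells:

adply(dat, 1, function(x) {
    nms <- unlist(lapply(strsplit(colnames(dat), '_'), function(x) x[2]))
    rep(nms, x)
})[, -1]

Note that it will not work if your row sums are not 100.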
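The letter extraction can probably be done without `strsplit`: `sub` can strip the `Type_` prefix directly, and a `rowSums` guard makes the row-sum assumption explicit. A base R sketch on the same fake data follows; the `stopifnot` guard and the `res` name are additions for illustration, not part of the answer.

```r
dat <- matrix(c(50, 14, 24, 12, 50, 50, 0, 0), ncol = 4, byrow = TRUE)
colnames(dat) <- paste0("Type_", LETTERS[1:4])

nms <- sub("^Type_", "", colnames(dat))  # "A" "B" "C" "D"

# Fail early if any row does not sum to 100
stopifnot(all(rowSums(dat) == 100))

# One row of the result per row of dat; apply() yields columns, hence t()
res <- t(apply(dat, 1, function(x) rep(nms, x)))
```

This avoids the plyr dependency, at the cost of returning a character matrix rather than a data frame.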

Answer 3 (score: 1)

Here is a dplyr/tidyr solution. There may be a cleaner way to handle this:

library(dplyr)
library(tidyr)

### Vectorize "rep"
vec_rep <- function(x, y) {
    unlist(lapply(seq_along(x), function(z) { paste(rep(x[z], y[z]), collapse = '') }))
}

df2 <-
    df %>%
    select(-Sum)                                    %>% # Col not needed
    gather(Type, TypeVal, -Sample)                  %>% # Reshape data to long format
    mutate(tstr = vec_rep(gsub('^[^_]+_','', Type), TypeVal)) %>% # create strings of desired lengths
    arrange(Sample, Type)                           %>% # Sort
    group_by(Sample)                                %>% # Group rows by sample
    summarise(NewVal = paste(tstr, collapse=''))        # Create desired string based on grouping

df2 is a data frame that can be converted to a matrix.
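One way to do that conversion, sketched in base R, is to split each `NewVal` string into single characters and bind the pieces row-wise. The small `df2` below is a hand-built stand-in with shorter strings than the real 100-character ones; only the column names `Sample` and `NewVal` are taken from the code above, and `chars` and `mat` are made-up names.

```r
# Stand-in for the summarised result: one string per sample
df2 <- data.frame(
  Sample = c("00001", "00002"),
  NewVal = c("AAABC", "ABBCC"),
  stringsAsFactors = FALSE
)

# Split every string into single characters
chars <- strsplit(df2$NewVal, "")

# Stack the character vectors into a matrix, one row per sample
mat <- do.call(rbind, chars)
rownames(mat) <- df2$Sample
```

This assumes every string has the same length and every label is a single character, which holds when all rows sum to 100 and the types are A through E.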