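I have been looking at this for quite a while now but can't seem to solve it, although I feel it should be easy.
I have 54 factors containing different numbers of strings (pathway names, to be precise). For example, here are two of the factors with the elements they contain: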
> PWe1
[1] Gene_Expression
[2] miR-targeted_genes_in_muscle_cell_-_TarBase
[3] Generic_Transcription_Pathway
> PWe2
[1] miR-targeted_genes_in_epithelium_-_TarBase
[2] miR-targeted_genes_in_leukocytes_-_TarBase
[3] miR-targeted_genes_in_lymphocytes_-_TarBase
[4] miR-targeted_genes_in_muscle_cell_-_TarBase
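What I want to do is combine them into one big data frame with 54 columns, where each column is named after the corresponding factor. I have tried cbind, cbind.data.frame and several other options, but these return numeric values instead of strings.
Expected output: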
PWe1 PWe2
Gene_Expression miR-targeted_genes_in_epithelium_-_TarBase
miR-targeted_genes_in_muscle_cell_-_TarBase miR-targeted_genes_in_leukocytes_-_TarBase
Generic_Transcription_Pathway miR-targeted_genes_in_lymphocytes_-_TarBase
NA miR-targeted_genes_in_muscle_cell_-_TarBase
I'm fairly new to R; could anyone nudge me toward a possible solution?
Thanks in advance!
Answer 0 (score: 2)
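lst <- mget(ls(pattern = "PW"))  # collect every object whose name contains "PW" into a named list
ind <- lengths(lst)              # length of each vector; max(ind) is the longest
# Convert each factor to character first (cbind() on factors would return their
# integer codes), pad with NA up to the longest length, then build the data.frame.
as.data.frame(do.call(cbind,
    lapply(lst, function(x) `length<-`(as.character(x), max(ind)))))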
# PWe1 PWe2
# 1 Gene_Expression miR-targeted_genes_in_epithelium_-_TarBase
# 2 miR-targeted_genes_in_muscle_cell_-_TarBase miR-targeted_genes_in_leukocytes_-_TarBase
# 3 Generic_Transcription_Pathway miR-targeted_genes_in_lymphocytes_-_TarBase
# 4 <NA> miR-targeted_genes_in_muscle_cell_-_TarBase
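For anyone who wants to try this without the original 54 objects, here is a minimal self-contained sketch. The two factors are rebuilt from the values printed in the question, and the helper name `pad_to` is made up for illustration. It assumes the inputs really are factors, which is why each one goes through as.character() before padding.

```r
# Sample data rebuilt from the question (assumed to be factors).
PWe1 <- factor(c("Gene_Expression",
                 "miR-targeted_genes_in_muscle_cell_-_TarBase",
                 "Generic_Transcription_Pathway"))
PWe2 <- factor(c("miR-targeted_genes_in_epithelium_-_TarBase",
                 "miR-targeted_genes_in_leukocytes_-_TarBase",
                 "miR-targeted_genes_in_lymphocytes_-_TarBase",
                 "miR-targeted_genes_in_muscle_cell_-_TarBase"))

# Hypothetical helper: convert to character, then pad with NA up to length n.
pad_to <- function(x, n) {
  x <- as.character(x)
  length(x) <- n
  x
}

lst <- list(PWe1 = PWe1, PWe2 = PWe2)   # named list, so the names become column names
n <- max(lengths(lst))                  # length of the longest vector
padded <- lapply(lst, pad_to, n = n)    # every element now has length n
result <- as.data.frame(padded, stringsAsFactors = FALSE)
```

Building the data frame straight from the named list keeps the columns as character vectors and carries the object names over as column names.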
Answer 1 (score: 1)
If you convert the factors to character before using cbind, you won't get numeric values:
testFrame <- data.frame(cbind(as.character(PWe1), as.character(PWe3)))
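If the two vectors have different lengths, cbind issues a warning and recycles the elements of the shorter vector. If that is not acceptable in your case, perhaps a data.frame object is not the right choice?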
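Both effects are easy to see on toy data. The sketch below uses two made-up factors, `f1` and `f2` (not from the question), and suppresses the recycling warning only so the snippet runs quietly.

```r
f1 <- factor(c("b", "a", "c"))        # levels sort to a, b, c
f2 <- factor(c("x", "y", "z", "w"))   # one element longer than f1

# cbind() drops the factor class and keeps the internal integer codes.
codes <- suppressWarnings(cbind(f1, f2))

# Converting to character first keeps the labels, but the shorter
# vector is still recycled to fill the fourth row.
chars <- suppressWarnings(cbind(as.character(f1), as.character(f2)))
chars[4, 1]   # the first element of f1 again, not NA
```

Because recycling silently repeats values, padding the shorter vector with NA before binding is usually the safer route when the lengths differ.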
Answer 2 (score: 1)
l1 <- max(length(v1), length(v2))
length(v1) <- l1
length(v2) <- l1
cbind(as.character(v1), as.character(v2))
# [,1] [,2]
#[1,] "Gene_Expression" "miR-targeted_genes_in_epithelium_-_TarBase"
#[2,] "miR-targeted_genes_in_muscle_cell_-_TarBase" "miR-targeted_genes_in_leukocytes_-_TarBase"
#[3,] "Generic_Transcription_Pathway" "miR-targeted_genes_in_lymphocytes_-_TarBase"
#[4,] NA "miR-targeted_genes_in_muscle_cell_-_TarBase"
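The result above is a character matrix without column names. One possible variation, sketched here on small made-up factors `v1` and `v2`, pads by indexing past the end of each vector (which yields NA) instead of assigning to length(), so the original objects stay untouched. The column names PWe1 and PWe2 are supplied by hand as an assumption about what the asker wants.

```r
v1 <- factor(c("a", "b", "c"))
v2 <- factor(c("d", "e", "f", "g"))

n <- max(length(v1), length(v2))

# Indexing a character vector beyond its length returns NA,
# so seq_len(n) pads the shorter vector without modifying it.
m <- cbind(PWe1 = as.character(v1)[seq_len(n)],
           PWe2 = as.character(v2)[seq_len(n)])

# Convert the named character matrix to a data.frame of strings.
df <- as.data.frame(m, stringsAsFactors = FALSE)
```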