I'm new to R and not sure how to do this. Suppose I have a TSV (tab-separated) file and read it into a table as follows:
> table <- read.delim(file='test.tsv',sep='\t',header=TRUE,stringsAsFactors=FALSE)
id features
1. 131 FeatureA,FeatureB,FeatureC,
2. 132 FeatureA,FeatureD,FeatureE,FeatureF
3. 135 FeatureD,FeatureE,FeatureC
4. 139 FeatureF,FeatureB
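I would like to visualize clusters of the features, but to work with them in R I need to change the type of the column named features to a list.
What is the best way to do this?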
Answer 0 (score: 2)
My "splitstackshape" package was written to handle exactly this kind of task. Have a look at the concat.split family of functions.
Here are some examples:
As a list. (Note that the function sorts the output; until an option to not sort the output is added, you may be better off using strsplit.)
library(splitstackshape)
x1 <- concat.split.list(mydf, split.col="features", sep=",", drop = TRUE)
x1
# id features_list
# 1. 131 FeatureA, FeatureB, FeatureC
# 2. 132 FeatureA, FeatureD, FeatureE, FeatureF
# 3. 135 FeatureD, FeatureE, FeatureC
# 4. 139 FeatureF, FeatureB
str(x1)
# 'data.frame': 4 obs. of 2 variables:
# $ id : int 131 132 135 139
# $ features_list:List of 4
# ..$ : chr "FeatureA" "FeatureB" "FeatureC"
# ..$ : chr "FeatureA" "FeatureD" "FeatureE" "FeatureF"
# ..$ : chr "FeatureD" "FeatureE" "FeatureC"
# ..$ : chr "FeatureF" "FeatureB"
As a "wide" data.frame:
x2 <- concat.split.multiple(mydf, split.cols="features", seps=",")
x2
# id features_1 features_2 features_3 features_4
# 1. 131 FeatureA FeatureB FeatureC <NA>
# 2. 132 FeatureA FeatureD FeatureE FeatureF
# 3. 135 FeatureD FeatureE FeatureC <NA>
# 4. 139 FeatureF FeatureB <NA> <NA>
As a "long" data.frame:
x3 <- concat.split.multiple(mydf, split.cols="features", seps=",", direction="long")
x3
# id time features
# 1 131 1 FeatureA
# 2 132 1 FeatureA
# 3 135 1 FeatureD
# 4 139 1 FeatureF
# 5 131 2 FeatureB
# 6 132 2 FeatureD
# 7 135 2 FeatureE
# 8 139 2 FeatureB
# 9 131 3 FeatureC
# 10 132 3 FeatureE
# 11 135 3 FeatureC
# 12 139 3 <NA>
# 13 131 4 <NA>
# 14 132 4 FeatureF
# 15 135 4 <NA>
# 16 139 4 <NA>
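If installing a package is not an option, a similar long layout can be sketched in base R. This is a different technique from concat.split.multiple (it uses strsplit, rep, and unlist), it does not pad with NA rows, and it has no time column. The inline data is a stand-in for the question's file, and the names feature_list and long_df are only illustrative.

```r
# Stand-in for the data read from test.tsv in the question
mydf <- data.frame(
  id = c(131, 132, 135, 139),
  features = c("FeatureA,FeatureB,FeatureC,",
               "FeatureA,FeatureD,FeatureE,FeatureF",
               "FeatureD,FeatureE,FeatureC",
               "FeatureF,FeatureB"),
  stringsAsFactors = FALSE
)

# One character vector of features per row
feature_list <- strsplit(mydf$features, ",", fixed = TRUE)

# Repeat each id once per feature, then flatten the list alongside it
long_df <- data.frame(
  id = rep(mydf$id, lengths(feature_list)),
  features = unlist(feature_list),
  stringsAsFactors = FALSE
)
long_df
```

Unlike the output above, rows here are grouped by id rather than by position within the list.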
As I mentioned in the comments, here is the result of strsplit. Note how individual elements are extracted.
> mydf$featuresList <- strsplit(mydf$features, ",")
> mydf
id features featuresList
1. 131 FeatureA,FeatureB,FeatureC, FeatureA, FeatureB, FeatureC
2. 132 FeatureA,FeatureD,FeatureE,FeatureF FeatureA, FeatureD, FeatureE, FeatureF
3. 135 FeatureD,FeatureE,FeatureC FeatureD, FeatureE, FeatureC
4. 139 FeatureF,FeatureB FeatureF, FeatureB
> mydf[, "featuresList"][[2]]
[1] "FeatureA" "FeatureD" "FeatureE" "FeatureF"
> mydf[, "featuresList"][[2]][2]
[1] "FeatureD"
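To reproduce the strsplit approach without the original file, here is a minimal self-contained sketch. The inline text stands in for test.tsv, and the names mydf and featuresList simply follow the transcript above.

```r
# Inline stand-in for test.tsv; row 1 keeps its trailing comma
mydf <- read.delim(
  text = paste(
    "id\tfeatures",
    "131\tFeatureA,FeatureB,FeatureC,",
    "132\tFeatureA,FeatureD,FeatureE,FeatureF",
    "135\tFeatureD,FeatureE,FeatureC",
    "139\tFeatureF,FeatureB",
    sep = "\n"
  ),
  sep = "\t", header = TRUE, stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row;
# a trailing separator does not produce an empty final element
mydf$featuresList <- strsplit(mydf$features, ",", fixed = TRUE)

# The new column is a list column
is.list(mydf$featuresList)

# Number of features in each row
lengths(mydf$featuresList)

# Second feature of the second row
mydf$featuresList[[2]][2]
```

Using [[ ]] pulls out the character vector for a single row, after which ordinary [ ] indexing selects individual features.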
Answer 1 (score: 2)
You can use strsplit:
table$list.features = strsplit(table$features,",")
You may also want to create indicator variables for these features:
table[unique(unlist(table$list.features))]=0
for (i in 1:nrow(table)) table[i,table$list.features[[i]]]=1
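The same indicator variables can also be built without a loop, as a separate 0/1 matrix. The sketch below is one possible variant, not the answer's own method; the names tbl, feature_list, all_features, and indicators are illustrative, and the inline data stands in for the question's file.

```r
# Stand-in for the data read from test.tsv in the question
tbl <- data.frame(
  id = c(131, 132, 135, 139),
  features = c("FeatureA,FeatureB,FeatureC,",
               "FeatureA,FeatureD,FeatureE,FeatureF",
               "FeatureD,FeatureE,FeatureC",
               "FeatureF,FeatureB"),
  stringsAsFactors = FALSE
)

feature_list <- strsplit(tbl$features, ",", fixed = TRUE)
all_features <- sort(unique(unlist(feature_list)))

# For each row, test which of the known features it contains.
# vapply() returns features in rows, so transpose to get one row per id.
indicators <- t(vapply(
  feature_list,
  function(f) all_features %in% f,
  logical(length(all_features))
))
dimnames(indicators) <- list(as.character(tbl$id), all_features)

# Convert TRUE/FALSE to 1/0
indicators <- indicators + 0L
indicators
```

Keeping the indicators in a matrix, rather than as extra columns of the data frame, can be convenient when the values are passed to functions that expect a numeric matrix.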