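Converting a comma-separated column to a list in R

Time: 2013-12-11 06:14:17

Tags: r data-visualization tsv

I'm very new to R and can't quite figure out how to do this. Say I have a TSV (tab-separated file) and read it into a table like this: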

> table <- read.delim(file='test.tsv',sep='\t',header=TRUE,stringsAsFactors=FALSE)

    id              features
1. 131  FeatureA,FeatureB,FeatureC,
2. 132  FeatureA,FeatureD,FeatureE,FeatureF
3. 135  FeatureD,FeatureE,FeatureC
4. 139  FeatureF,FeatureB

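I'd like to be able to visualize clusters of features, but to make use of the data in R I need to change the type of the column named features to a list.

What is the best way to do this?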

2 Answers:

Answer 0: (score: 2)

My "splitstackshape" package was written to handle exactly these kinds of tasks. Have a look at the concat.split family of functions.

Here are some examples:

As a list. (But the function sorts the output; until I add an option to not sort the output, you might be better off using strsplit.)

library(splitstackshape)
x1 <- concat.split.list(mydf, split.col="features", sep=",", drop = TRUE)
x1
#     id                          features_list
# 1. 131           FeatureA, FeatureB, FeatureC
# 2. 132 FeatureA, FeatureD, FeatureE, FeatureF
# 3. 135           FeatureD, FeatureE, FeatureC
# 4. 139                     FeatureF, FeatureB
str(x1)
# 'data.frame':  4 obs. of  2 variables:
#  $ id           : int  131 132 135 139
#  $ features_list:List of 4
#   ..$ : chr  "FeatureA" "FeatureB" "FeatureC"
#   ..$ : chr  "FeatureA" "FeatureD" "FeatureE" "FeatureF"
#   ..$ : chr  "FeatureD" "FeatureE" "FeatureC"
#   ..$ : chr  "FeatureF" "FeatureB"

As a "wide" data.frame

x2 <- concat.split.multiple(mydf, split.cols="features", seps=",")
x2
#     id features_1 features_2 features_3 features_4
# 1. 131   FeatureA   FeatureB   FeatureC       <NA>
# 2. 132   FeatureA   FeatureD   FeatureE   FeatureF
# 3. 135   FeatureD   FeatureE   FeatureC       <NA>
# 4. 139   FeatureF   FeatureB       <NA>       <NA>
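For comparison, here is a rough base-R sketch of the same "wide" reshaping. It assumes the question's sample data is rebuilt by hand. The names `parts`, `n`, and `wide_df` are made up for illustration, and this is not meant to show how concat.split.multiple works internally.

```r
# Sample data rebuilt by hand (stand-in for the read.delim() result)
mydf <- data.frame(
  id = c(131L, 132L, 135L, 139L),
  features = c("FeatureA,FeatureB,FeatureC,",
               "FeatureA,FeatureD,FeatureE,FeatureF",
               "FeatureD,FeatureE,FeatureC",
               "FeatureF,FeatureB"),
  stringsAsFactors = FALSE
)

# One character vector per row
parts <- strsplit(mydf$features, ",", fixed = TRUE)

# Width of the widest row
n <- max(lengths(parts))

# Indexing past the end of a vector pads it with NA
wide <- t(vapply(parts, function(f) f[seq_len(n)], character(n)))
colnames(wide) <- paste0("features_", seq_len(n))

wide_df <- data.frame(id = mydf$id, wide, stringsAsFactors = FALSE)
wide_df
```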

As a "long" data.frame

x3 <- concat.split.multiple(mydf, split.cols="features", seps=",", direction="long")
x3
#     id time features
# 1  131    1 FeatureA
# 2  132    1 FeatureA
# 3  135    1 FeatureD
# 4  139    1 FeatureF
# 5  131    2 FeatureB
# 6  132    2 FeatureD
# 7  135    2 FeatureE
# 8  139    2 FeatureB
# 9  131    3 FeatureC
# 10 132    3 FeatureE
# 11 135    3 FeatureC
# 12 139    3     <NA>
# 13 131    4     <NA>
# 14 132    4 FeatureF
# 15 135    4     <NA>
# 16 139    4     <NA>
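If you would rather not load a package for the "long" shape, something along these lines should work in base R. This is only a sketch on hand-built sample data, and `parts` and `long` are illustrative names. Unlike the output above, it has no `time` column, adds no NA padding rows, and is ordered by id.

```r
# Sample data rebuilt by hand (stand-in for the read.delim() result)
mydf <- data.frame(
  id = c(131L, 132L, 135L, 139L),
  features = c("FeatureA,FeatureB,FeatureC,",
               "FeatureA,FeatureD,FeatureE,FeatureF",
               "FeatureD,FeatureE,FeatureC",
               "FeatureF,FeatureB"),
  stringsAsFactors = FALSE
)

# One character vector per row
parts <- strsplit(mydf$features, ",", fixed = TRUE)

# Repeat each id once per feature, then flatten the list
long <- data.frame(
  id = rep(mydf$id, lengths(parts)),
  features = unlist(parts),
  stringsAsFactors = FALSE
)
long
```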

Update based on your comment:

As I mentioned in the comments, this is the result of strsplit. Note the extraction method.

> mydf$featuresList <- strsplit(mydf$features, ",")
> mydf
    id                            features                           featuresList
1. 131         FeatureA,FeatureB,FeatureC,           FeatureA, FeatureB, FeatureC
2. 132 FeatureA,FeatureD,FeatureE,FeatureF FeatureA, FeatureD, FeatureE, FeatureF
3. 135          FeatureD,FeatureE,FeatureC           FeatureD, FeatureE, FeatureC
4. 139                   FeatureF,FeatureB                     FeatureF, FeatureB
> mydf[, "featuresList"][[2]]
[1] "FeatureA" "FeatureD" "FeatureE" "FeatureF"
> mydf[, "featuresList"][[2]][2]
[1] "FeatureD"
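A self-contained version of the same idea, for anyone without the file at hand. The data frame is typed in from the question's sample, so it is only a stand-in for the read.delim() result, and `has_b` is an illustrative name. It also shows two other ways of working with the list column.

```r
# Sample data rebuilt by hand (stand-in for the read.delim() result)
mydf <- data.frame(
  id = c(131L, 132L, 135L, 139L),
  features = c("FeatureA,FeatureB,FeatureC,",
               "FeatureA,FeatureD,FeatureE,FeatureF",
               "FeatureD,FeatureE,FeatureC",
               "FeatureF,FeatureB"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row
mydf$featuresList <- strsplit(mydf$features, ",", fixed = TRUE)

# [[ ]] pulls out one row's vector; the trailing comma in row 1
# does not produce an empty trailing element
mydf$featuresList[[1]]

# Number of features in each row
lengths(mydf$featuresList)

# ids of the rows whose list contains a given feature
has_b <- vapply(mydf$featuresList, function(f) "FeatureB" %in% f, logical(1))
mydf$id[has_b]
```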

Answer 1: (score: 2)

You can use strsplit:

table$list.features = strsplit(table$features,",")

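You might also want to create indicator variables for these features:

# Add one zero-filled column per distinct feature
table[unique(unlist(table$list.features))] <- 0
# For each row, set the columns of the features it contains to 1
for (i in seq_len(nrow(table))) table[i, table$list.features[[i]]] <- 1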
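The same indicators can also be built without a loop, as a 0/1 matrix, which is the shape clustering functions tend to expect. The sketch below runs on hand-built sample data. The names `tab`, `parts`, `lev`, `ind`, and `hc` are made up for illustration, and `tab` is used instead of `table` only to avoid masking base::table. Clustering on the binary (Jaccard-style) distance is my assumption about what "clusters of features" might mean here, not something the question specifies.

```r
# Sample data rebuilt by hand (stand-in for the read.delim() result)
tab <- data.frame(
  id = c(131L, 132L, 135L, 139L),
  features = c("FeatureA,FeatureB,FeatureC,",
               "FeatureA,FeatureD,FeatureE,FeatureF",
               "FeatureD,FeatureE,FeatureC",
               "FeatureF,FeatureB"),
  stringsAsFactors = FALSE
)

parts <- strsplit(tab$features, ",", fixed = TRUE)

# All distinct feature names, in a fixed order
lev <- sort(unique(unlist(parts)))

# One row per id, one 0/1 column per feature
ind <- t(vapply(parts, function(f) as.integer(lev %in% f), integer(length(lev))))
dimnames(ind) <- list(as.character(tab$id), lev)
ind

# Possible next step: hierarchical clustering of the ids
hc <- hclust(dist(ind, method = "binary"))
# plot(hc)  # uncomment to draw the dendrogram
```

Use `dist(t(ind), method = "binary")` instead if you want to cluster the features rather than the ids.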