Combining R data frame column names with column levels

Time: 2015-10-16 14:09:25

Tags: r vectorization

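I am trying to create a function that combines each column name with the levels of that column (the columns are factors) and returns a vector of the combined names.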

For the data frame df1 <- data.frame(v1 = c('a', 'b', 'b'), v2 = c('b', 'b', 'c'), stringsAsFactors = TRUE)

it should return

"v1:a" "v1:b" "v2:b" "v2:c"

I think I could do this in a loop, but is there a vectorized solution in case the data frame is very large?
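A minimal sketch of one vectorized approach for df1, assuming every column is a factor. The helper name name_levels is hypothetical. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE is passed explicitly; without it levels() returns NULL.

```r
# Sketch only: assumes the columns of interest are factors.
df1 <- data.frame(v1 = c('a', 'b', 'b'), v2 = c('b', 'b', 'c'),
                  stringsAsFactors = TRUE)

# Hypothetical helper: pair each column name with each of its levels.
name_levels <- function(df) {
  lv <- lapply(df, levels)                # one vector of levels per column
  paste(rep(names(lv), lengths(lv)),      # repeat each name once per level
        unlist(lv, use.names = FALSE),    # all levels, in column order
        sep = ":")
}

name_levels(df1)
#[1] "v1:a" "v1:b" "v2:b" "v2:c"
```

The cost grows with the number of columns and levels, not the number of rows, because levels() only reads each factor's level attribute. Columns that are not factors have no levels, contribute zero entries, and are silently skipped.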

2 answers:

Answer 0 (score: 0)

Following @Ananda Mahto's suggestion, you can use expand.grid and apply:

# assumes every column of df shares the levels of df$A
apply(expand.grid(colnames(df), levels(df$A)), 1, paste, collapse = ":")

#[1] "A:a" "B:a" "A:b" "B:b" "A:c" "B:c"
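apply() walks the expand.grid() result row by row. Because paste() is already vectorized, a sketch that hands the grid's columns straight to paste() through do.call() should give the same result without the per-row loop. The df below is hypothetical, since the answer does not show its data: two factor columns A and B that share the levels a, b, c.

```r
# Hypothetical data: columns A and B share the levels a, b, c.
df <- data.frame(A = factor(c("a", "b", "c")),
                 B = factor(c("c", "a", "b")))

# Every (column name, level) pair; the first argument varies fastest.
grid <- expand.grid(colnames(df), levels(df$A), stringsAsFactors = FALSE)

# c() turns the grid into a plain list of columns plus sep, so
# paste() receives both columns at once instead of one row at a time.
do.call(paste, c(grid, sep = ":"))
#[1] "A:a" "B:a" "A:b" "B:b" "A:c" "B:c"
```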

Answer 1 (score: 0)

We can use outer:


c(outer(names(df), levels(df$A), FUN = paste, sep = ":"))
#[1] "A:a" "B:a" "A:b" "B:b" "A:c" "B:c"

Update

If the columns have different levels, try
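The outer() call pairs every column name with the levels of df$A only, so it suits columns that share one level set. Applied to df1 from the question, where v1 and v2 have different levels, it produces "v2:a", a combination that never occurs. The sketch below shows that, then one possible alternative using stack(), which turns a named list of level vectors into (values, ind) pairs. This is an illustration under the assumption that all columns are factors, not necessarily the code this answer intended.

```r
df1 <- data.frame(v1 = c('a', 'b', 'b'), v2 = c('b', 'b', 'c'),
                  stringsAsFactors = TRUE)

# outer() crosses every name with ONE level set (here v1's),
# so it includes "v2:a" and misses "v2:c".
c(outer(names(df1), levels(df1$v1), FUN = paste, sep = ":"))
#[1] "v1:a" "v2:a" "v1:b" "v2:b"

# Per-column levels: stack() labels each level with its column name.
s <- stack(lapply(df1, levels))
paste(s$ind, s$values, sep = ":")
#[1] "v1:a" "v1:b" "v2:b" "v2:c"
```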