Split a column by group

Time: 2015-08-03 19:12:42

Tags: r split dataframe

My dataset looks like this:

tweet_created_at                              hashtag_text
2015-05-08 05:45:30                           farinaz,farkhunda,ozgecanaslan
2015-05-08 06:01:24                           ozgecanaslan,sendeanlat
2015-05-08 09:51:35                           ozgecanaslan,genclikyasaklanamaz

I need to convert my dataset to:

tweet_created_at                              hashtag_text
2015-05-08 05:45:30                           farinaz
2015-05-08 05:45:30                           farkhunda
2015-05-08 05:45:30                           ozgecanaslan
2015-05-08 06:01:24                           ozgecanaslan
2015-05-08 06:01:24                           sendeanlat
2015-05-08 09:51:35                           ozgecanaslan
2015-05-08 09:51:35                           genclikyasaklanamaz

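I think I could use some sapply, but I can't work out how to repeat the tweet_created_at column for each hashtag.

2 answers:

Answer 0: (score: 3)

You can try cSplit from library(splitstackshape). We specify sep as ',', direction as 'long', and splitCols as 'hashtag_text' to split the column and reshape the dataset into 'long' format.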

 library(splitstackshape)
 cSplit(df1, splitCols = 'hashtag_text', sep = ',', direction = 'long')
 #      tweet_created_at        hashtag_text
 #1: 2015-05-08 05:45:30             farinaz
 #2: 2015-05-08 05:45:30           farkhunda
 #3: 2015-05-08 05:45:30        ozgecanaslan
 #4: 2015-05-08 06:01:24        ozgecanaslan
 #5: 2015-05-08 06:01:24          sendeanlat
 #6: 2015-05-08 09:51:35        ozgecanaslan
 #7: 2015-05-08 09:51:35 genclikyasaklanamaz

Data

 df1 <- structure(list(tweet_created_at = c("2015-05-08 05:45:30", 
 "2015-05-08 06:01:24", 
 "2015-05-08 09:51:35"), hashtag_text =   
 c("farinaz,farkhunda,ozgecanaslan", 
 "ozgecanaslan,sendeanlat", "ozgecanaslan,genclikyasaklanamaz"
 )), .Names = c("tweet_created_at", "hashtag_text"),
 class = "data.frame", row.names = c(NA, -3L))
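Since the question mentions sapply: the same reshaping can also be sketched in base R without any extra package. This is only a minimal sketch, assuming both columns are character vectors (not factors); `tags` and `long_df` are names picked here for illustration. The idea is to split each string with strsplit and repeat each timestamp once per hashtag in its row.

```r
# Sample data, same values as df1 above.
df1 <- data.frame(
  tweet_created_at = c("2015-05-08 05:45:30",
                       "2015-05-08 06:01:24",
                       "2015-05-08 09:51:35"),
  hashtag_text = c("farinaz,farkhunda,ozgecanaslan",
                   "ozgecanaslan,sendeanlat",
                   "ozgecanaslan,genclikyasaklanamaz"),
  stringsAsFactors = FALSE
)

# One character vector of hashtags per original row.
tags <- strsplit(df1$hashtag_text, ",", fixed = TRUE)

# Repeat each timestamp once per hashtag in its row, then flatten the list.
long_df <- data.frame(
  tweet_created_at = rep(df1$tweet_created_at, times = lengths(tags)),
  hashtag_text = unlist(tags),
  stringsAsFactors = FALSE
)

print(long_df)
```

The `rep(..., times = lengths(tags))` call is what takes care of repeating tweet_created_at, which was the missing piece in the question.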

Answer 1: (score: 2)

Using data.table:

library(data.table)
setDT(Womens.Rights)[,c(hashtag_text=strsplit(hashtag_text,split=",")),
                     by=tweet_created_at]
      tweet_created_at        hashtag_text
1: 2015-05-08_05:45:30             farinaz
2: 2015-05-08_05:45:30           farkhunda
3: 2015-05-08_05:45:30        ozgecanaslan
4: 2015-05-08_06:01:24        ozgecanaslan
5: 2015-05-08_06:01:24          sendeanlat
6: 2015-05-08_09:51:35        ozgecanaslan
7: 2015-05-08_09:51:35 genclikyasaklanamaz

(Note: I manually added the underscores so that read.table could read your data)
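A variant sketch that sidesteps the underscore workaround by building the data in code instead of going through read.table. It assumes the data.table package is installed and uses the same values as df1 from the other answer; `dt` and `long_dt` are illustrative names. Wrapping strsplit in unlist should also keep it working if the same timestamp appears on more than one row.

```r
library(data.table)

# Sample data with the original timestamps (spaces, no underscores).
df1 <- data.frame(
  tweet_created_at = c("2015-05-08 05:45:30",
                       "2015-05-08 06:01:24",
                       "2015-05-08 09:51:35"),
  hashtag_text = c("farinaz,farkhunda,ozgecanaslan",
                   "ozgecanaslan,sendeanlat",
                   "ozgecanaslan,genclikyasaklanamaz"),
  stringsAsFactors = FALSE
)

# Copy into a data.table so df1 itself is left untouched.
dt <- as.data.table(df1)

# For each timestamp, split the hashtag strings and flatten them into one
# column; data.table repeats the grouping value for every resulting row.
long_dt <- dt[, .(hashtag_text = unlist(strsplit(hashtag_text, ",", fixed = TRUE))),
              by = tweet_created_at]

print(long_dt)
```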