My dataset looks like this:
tweet_created_at hashtag_text
2015-05-08 05:45:30 farinaz,farkhunda,ozgecanaslan
2015-05-08 06:01:24 ozgecanaslan,sendeanlat
2015-05-08 09:51:35 ozgecanaslan,genclikyasaklanamaz
I need to transform my dataset into:
tweet_created_at hashtag_text
2015-05-08 05:45:30 farinaz
2015-05-08 05:45:30 farkhunda
2015-05-08 05:45:30 ozgecanaslan
2015-05-08 06:01:24 ozgecanaslan
2015-05-08 06:01:24 sendeanlat
2015-05-08 09:51:35 ozgecanaslan
2015-05-08 09:51:35 genclikyasaklanamaz
I think I could use something like sapply, but I can't figure out how to repeat the tweet_created_at column for each hashtag.
Answer 0 (score: 3)
You can try cSplit from library(splitstackshape). We specify sep as ",", direction as 'long', and splitCols as 'hashtag_text' to split the column and reshape the dataset into 'long' format.
library(splitstackshape)
cSplit(df1, 'hashtag_text', ',', 'long')
# tweet_created_at hashtag_text
#1: 2015-05-08 05:45:30 farinaz
#2: 2015-05-08 05:45:30 farkhunda
#3: 2015-05-08 05:45:30 ozgecanaslan
#4: 2015-05-08 06:01:24 ozgecanaslan
#5: 2015-05-08 06:01:24 sendeanlat
#6: 2015-05-08 09:51:35 ozgecanaslan
#7: 2015-05-08 09:51:35 genclikyasaklanamaz
df1 <- structure(list(tweet_created_at = c("2015-05-08 05:45:30",
"2015-05-08 06:01:24",
"2015-05-08 09:51:35"), hashtag_text =
c("farinaz,farkhunda,ozgecanaslan",
"ozgecanaslan,sendeanlat", "ozgecanaslan,genclikyasaklanamaz"
)), .Names = c("tweet_created_at", "hashtag_text"),
class = "data.frame", row.names = c(NA, -3L))
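If you would rather avoid the extra package, here is a minimal base-R sketch of the same reshape, which also shows the "repeat the timestamp" step the question asks about. It assumes the data is rebuilt as a plain data frame with character columns; the names tags and long are illustrative. strsplit() turns each row's string into a vector of hashtags, lengths() counts them, and rep() repeats each timestamp that many times.

```r
# Sample data from the question, rebuilt as a plain data frame
df1 <- data.frame(
  tweet_created_at = c("2015-05-08 05:45:30", "2015-05-08 06:01:24",
                       "2015-05-08 09:51:35"),
  hashtag_text = c("farinaz,farkhunda,ozgecanaslan",
                   "ozgecanaslan,sendeanlat",
                   "ozgecanaslan,genclikyasaklanamaz"),
  stringsAsFactors = FALSE
)

# One list element per original row, each holding that row's hashtags
tags <- strsplit(df1$hashtag_text, ",", fixed = TRUE)

# Repeat each timestamp once per hashtag in its row, then flatten the hashtags
long <- data.frame(
  tweet_created_at = rep(df1$tweet_created_at, lengths(tags)),
  hashtag_text = unlist(tags),
  stringsAsFactors = FALSE
)
print(long)
```

Because rep() and unlist() both walk the rows in the same order, each hashtag stays paired with its own timestamp, even when two tweets share the same time.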
Answer 1 (score: 2)
Using data.table:
library(data.table)
setDT(Womens.Rights)[,c(hashtag_text=strsplit(hashtag_text,split=",")),
by=tweet_created_at]
tweet_created_at hashtag_text
1: 2015-05-08_05:45:30 farinaz
2: 2015-05-08_05:45:30 farkhunda
3: 2015-05-08_05:45:30 ozgecanaslan
4: 2015-05-08_06:01:24 ozgecanaslan
5: 2015-05-08_06:01:24 sendeanlat
6: 2015-05-08_09:51:35 ozgecanaslan
7: 2015-05-08_09:51:35 genclikyasaklanamaz
(Note: I manually added underscores so that read.table could read your data.)
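The data.table code above groups by tweet_created_at, so two tweets that happen to share a timestamp would land in the same group. A sketch that groups by a row index instead keeps every original row separate. The helper column row_id and the small sample table (which deliberately repeats one timestamp) are hypothetical and used only for illustration.

```r
library(data.table)

# Hypothetical sample in which two tweets share the same timestamp
dt <- data.table(
  tweet_created_at = c("2015-05-08 05:45:30", "2015-05-08 05:45:30",
                       "2015-05-08 06:01:24"),
  hashtag_text = c("farinaz,farkhunda", "ozgecanaslan", "sendeanlat")
)

# Add a row id (.I is the row number) so each original row forms its own group
dt[, row_id := .I]

# Within each one-row group, split the string and return one row per hashtag
long <- dt[, .(hashtag_text = strsplit(hashtag_text, ",", fixed = TRUE)[[1]]),
           by = .(row_id, tweet_created_at)]

# Drop the helper column once the reshape is done
long[, row_id := NULL]
print(long)
```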