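My dataset has the format shown below. I tried to reshape it with the reshape2 package in R, but the result was not what I wanted (a binary variable for every page). Is there a way to restructure the dataset into the desired format shown below?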
Input format:
User Pages
1 index.html
1 search.html
1 help.html
1 contact.html
2 help.html
2 contact.html
3 index.html
3 search.html
3 feedback.html
Output format:
User page1 page2 page3 page4 page5
1 index.html search.html help.html contact.html NA
2 help.html contact.html NA NA NA
3 index.html search.html feedback.html NA NA
Answer 0 (score: 9)
Use the dcast function from the reshape2 package:
library(reshape2)
txt <- "User Pages
1 index.html
1 search.html
1 help.html
1 contact.html
2 help.html
2 contact.html
3 index.html
3 search.html
3 feedback.html"
mydf <- read.table(text=txt, header=TRUE)
# Create a new column numbering each user's pages (this assumes the rows are sorted by User):
mydf$page <- paste("Page", unlist(sapply(table(mydf$User), seq)))
# Cast to wide format: one row per User, one column per page number.
new.df <- dcast(mydf, User ~ page, value.var = "Pages")
> print(new.df)
User Page 1 Page 2 Page 3 Page 4
1 1 index.html search.html help.html contact.html
2 2 help.html contact.html <NA> <NA>
3 3 index.html search.html feedback.html <NA>
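The numbering step above relies on the rows already being grouped by User, because table() returns counts in sorted order. As a minimal sketch, assuming the rows may arrive in any order, the per-user index can instead be computed with base R's ave(). The data frame `visits` and the columns `idx` and `page` are hypothetical names used only for this illustration.

```r
# Hypothetical unsorted version of the example data (rows for each user are interleaved).
visits <- data.frame(
  User  = c(1, 2, 1, 3, 2, 1, 3, 3, 1),
  Pages = c("index.html", "help.html", "search.html", "index.html",
            "contact.html", "help.html", "search.html", "feedback.html",
            "contact.html"),
  stringsAsFactors = FALSE
)

# Number the visits within each user, in order of appearance.
# ave() applies FUN to each group and puts the results back in the original row order.
visits$idx <- ave(seq_along(visits$User), visits$User, FUN = seq_along)

# Build the column label that the wide format will use.
visits$page <- paste0("page", visits$idx)
```

The resulting `page` column can then be used in the same dcast call as above, for example dcast(visits, User ~ page, value.var = "Pages").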
Answer 1 (score: 2)
Incorporating @zelite's neat trick for numbering the pages:
x <- read.table( text = "User Pages
1 index.html
1 search.html
1 help.html
1 contact.html
2 help.html
2 contact.html
3 index.html
3 search.html
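3 feedback.html", header = TRUE)
# reshape() is part of base R (package stats), so reshape2 does not need to be loaded here.
# Number each user's pages (this assumes the rows are sorted by User):
x$tv <- unlist(sapply(table(x$User), seq))
# Spread Pages into one column per value of tv, one row per User.
reshape(x, idvar = 'User', timevar = 'tv', direction = 'wide')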
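reshape() names the new columns by joining the value column and the time value with a dot (Pages.1, Pages.2, and so on), while the question asks for page1, page2, and so on. The following is a minimal sketch of renaming the columns afterwards, assuming the same example data. The names `pages_long` and `wide` are hypothetical, and ave() is used here in place of the table() numbering so that the sketch does not depend on row order.

```r
pages_long <- read.table(text = "User Pages
1 index.html
1 search.html
1 help.html
1 contact.html
2 help.html
2 contact.html
3 index.html
3 search.html
3 feedback.html", header = TRUE, stringsAsFactors = FALSE)

# Per-user page counter, in order of appearance.
pages_long$tv <- ave(seq_along(pages_long$User), pages_long$User, FUN = seq_along)

# Long to wide with base R: one row per User, one Pages.<tv> column per counter value.
wide <- reshape(pages_long, idvar = "User", timevar = "tv", direction = "wide")

# Rename Pages.1, Pages.2, ... to page1, page2, ... and reset the row names.
names(wide) <- sub("^Pages\\.", "page", names(wide))
rownames(wide) <- NULL

print(wide)
```

Users with fewer pages than the maximum get NA in the remaining columns. This sketch only creates as many page columns as the longest user needs, so the all-NA page5 column from the question is not produced.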