Reshaping data in R

Time: 2013-04-26 11:51:19

Tags: r reshape

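My dataset is in the format shown below. I tried to reshape it with the reshape2 package in R, but the result was not in a suitable format (it produced a binary variable for every page). Is there a way to restructure the dataset into the desired format shown below?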

Input format:
User    Pages
1   index.html
1   search.html
1   help.html
1   contact.html
2   help.html
2   contact.html
3   index.html
3   search.html
3   feedback.html

Output format:
User    page1       page2         page3         page4         page5
1       index.html  search.html   help.html     contact.html  NA
2       help.html   contact.html  NA            NA            NA
3       index.html  search.html   feedback.html NA            NA

2 answers:

Answer 0 (score: 9)

Use the dcast function from the reshape2 package:

library(reshape2)

txt <- "User    Pages
1   index.html
1   search.html
1   help.html
1   contact.html
2   help.html
2   contact.html
3   index.html
3   search.html
3   feedback.html"

mydf <- read.table(text=txt, header=TRUE)

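# Create a helper column that numbers each user's pages
# (this assumes the rows are already grouped by User):
mydf$page <- paste("Page", unlist(sapply(table(mydf$User), seq)))

# Cast to wide format: one row per User, one column per page number.
new.df <- dcast(mydf, User ~ page, value.var = "Pages")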

> print(new.df)
   User     Page 1       Page 2        Page 3       Page 4
1    1 index.html  search.html     help.html contact.html
2    2  help.html contact.html          <NA>         <NA>
3    3 index.html  search.html feedback.html         <NA>
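
The helper column above is only correct when the rows are already grouped by User, because the counts from table() are expanded in sorted user order. As a rough sketch of an alternative (not part of the original answer), base R's ave() can number the pages within each user even when the rows are interleaved. The column names idx and page, and the shuffled row order, are my own choices for illustration.

```r
# Same data as in the question, but with the rows deliberately NOT grouped by User.
mydf <- data.frame(
  User  = c(1, 2, 1, 3, 1, 2, 3, 1, 3),
  Pages = c("index.html", "help.html", "search.html", "index.html",
            "help.html", "contact.html", "search.html", "contact.html",
            "feedback.html"),
  stringsAsFactors = FALSE
)

# ave() runs seq_along() within each User group and writes the results
# back to the original row positions, so row order does not matter.
mydf$idx  <- ave(seq_along(mydf$User), mydf$User, FUN = seq_along)

# Column labels without a space ("page1", "page2", ...), as in the desired output.
# Note: with 10 or more pages per user, "page10" sorts before "page2".
mydf$page <- paste0("page", mydf$idx)

print(mydf)
```

The dcast() call from the answer should then work unchanged with this page column.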

Answer 1 (score: 2)

Building on @zelite's neat trick of using table() to number each user's pages:

x <- read.table( text = "User    Pages
1   index.html
1   search.html
1   help.html
1   contact.html
2   help.html
2   contact.html
3   index.html
3   search.html
3   feedback.html", header = TRUE)

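# reshape() is part of base R (stats package), so reshape2 does not need to be loaded.

# Number each user's pages (this assumes the rows are already grouped by User):
x$tv <- unlist(sapply(table(x$User), seq))

reshape(x, idvar = "User", timevar = "tv", direction = "wide")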
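
For reference, reshape() names the wide columns by joining the value column and the time value with a dot (Pages.1, Pages.2, ...). Here is a minimal sketch of renaming them to match the page1, page2, ... headers from the question. The sub() pattern and the variable name wide are my own additions, not part of the original answer.

```r
# Data from the question, built directly as a data frame.
x <- data.frame(
  User  = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  Pages = c("index.html", "search.html", "help.html", "contact.html",
            "help.html", "contact.html",
            "index.html", "search.html", "feedback.html"),
  stringsAsFactors = FALSE
)

# Per-user page counter, as in the answer (assumes rows are grouped by User).
x$tv <- unlist(sapply(table(x$User), seq))

# Base R reshape to wide format: one row per User, one Pages.<n> column per counter value.
wide <- reshape(x, idvar = "User", timevar = "tv", direction = "wide")

# Rename Pages.1, Pages.2, ... to page1, page2, ... and reset the row names.
names(wide) <- sub("^Pages\\.", "page", names(wide))
rownames(wide) <- NULL

print(wide)
```

Users with fewer pages than the maximum get NA in the remaining columns.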