I have a data frame that holds message threads posted on a forum. After joining tables in the database, I get a structure like this:
threadStarterName1 threadStarter1 comment1 commenterName1
threadStarterName1 threadStarter1 comment2 commenterName2
threadStarterName1 threadStarter1 comment3 commenterName3
threadStarterName1 threadStarter1 comment4 commenterName4
threadStarterName1 threadStarter1 comment5 commenterName5
Code to create this data frame:
df=data.frame("threadStarterName"=c("threadStarterName1","threadStarterName1","threadStarterName1","threadStarterName1","threadStarterName1"),
"threadStarter"=c("threadStarter1","threadStarter1","threadStarter1","threadStarter1","threadStarter1"),
"comment"=c("comment1","comment2","comment3","comment4","comment5"),
"commenterName"=c("commenterName1","commenterName2","commenterName3","commenterName4","commenterName5"))
I want to reshape this data frame into the layout below, so that I can print it as a report in R Markdown:
threadStarter1 threadStarterName1
comment1 commenterName1
comment2 commenterName2
comment3 commenterName3
comment4 commenterName4
comment5 commenterName5
Thanks in advance!
Answer 0: (score: 0)
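If I understand correctly, the original post (and its author) is repeated on every row, and you want it to appear only once, in the same columns as the comment text and the comment author.
If so, this should do it: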
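onlyOnce <- data.frame(
  # as.character() guards against factor columns, which c() would
  # combine as integer codes in older R versions
  user = c(as.character(df$threadStarterName[1]),
           as.character(df$commenterName)),
  commentPosted = c(as.character(df$threadStarter[1]),
                    as.character(df$comment)),
  stringsAsFactors = FALSE
)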
It takes the first thread-starter entry (the author's name and their post) and places it on top of the commenters' names and their comments.
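If the joined table contains more than one thread, the same idea can be applied per thread. The following is only a sketch in base R under a few assumptions: `flatten_thread` is a hypothetical helper name, the two-thread sample data is made up for illustration, and the text in `threadStarter` is assumed to identify a thread uniquely (a real thread ID column would be a safer key if one exists).

```r
# Hypothetical helper: put the thread starter's post on top of its comments.
flatten_thread <- function(thread) {
  data.frame(
    commentPosted = c(as.character(thread$threadStarter[1]),
                      as.character(thread$comment)),
    user = c(as.character(thread$threadStarterName[1]),
             as.character(thread$commenterName)),
    stringsAsFactors = FALSE
  )
}

# Made-up sample data with two threads (an assumption for illustration).
df <- data.frame(
  threadStarterName = c("threadStarterName1", "threadStarterName1",
                        "threadStarterName2"),
  threadStarter = c("threadStarter1", "threadStarter1", "threadStarter2"),
  comment = c("comment1", "comment2", "comment3"),
  commenterName = c("commenterName1", "commenterName2", "commenterName3"),
  stringsAsFactors = FALSE
)

# Split the rows by thread, flatten each piece, then stack the pieces.
threads <- split(df, df$threadStarter)
report <- do.call(rbind, lapply(threads, flatten_thread))
rownames(report) <- NULL  # drop the row names that rbind derives from the list

print(report)
```

Inside an R Markdown chunk, `knitr::kable(report)` is one way to render the result as a table instead of plain console output.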