Question

I have a data frame that I use to extract message threads posted on a forum. By joining tables in the database, I get a structure like this:

threadStarterName1    threadStarter1    comment1    commenterName1
threadStarterName1    threadStarter1    comment2    commenterName2
threadStarterName1    threadStarter1    comment3    commenterName3
threadStarterName1    threadStarter1    comment4    commenterName4
threadStarterName1    threadStarter1    comment5    commenterName5

Code to create this data frame:

df <- data.frame(
  threadStarterName = rep("threadStarterName1", 5),
  threadStarter     = rep("threadStarter1", 5),
  comment           = paste0("comment", 1:5),
  commenterName     = paste0("commenterName", 1:5),
  stringsAsFactors  = FALSE  # keep columns as character (the default only since R 4.0)
)

I want to reformat this data frame to extract the values as shown below, so that I can then print a report in R Markdown:

threadStarter1    threadStarterName1
   comment1       commenterName1
   comment2       commenterName2
   comment3       commenterName3
   comment4       commenterName4
   comment5       commenterName5

Thanks in advance!

Answer 1

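If I understand correctly, the original post (and its author) is repeated on every row, and you want them to appear only once, in the same columns as the comment text and the comment authors.

If so, this should do it: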

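# as.character() guards against factor columns (the data.frame() default
# before R 4.0), which older versions of R combine as integer codes in c().
onlyOnce <- data.frame(
  user = c(as.character(df$threadStarterName[1]),
           as.character(df$commenterName)),
  commentPosted = c(as.character(df$threadStarter[1]),
                    as.character(df$comment)),
  stringsAsFactors = FALSE
)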

It takes the first thread-starter entry (the author and their post) and places it on top of the commenters (and their comments).
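
If the joined table ever contains more than one thread, the same idea can be applied per group. The following is only a minimal sketch in base R, not part of the original answer: `format_thread` is a hypothetical helper name, and it assumes the `threadStarter` text identifies a thread uniquely. It builds the indented lines from the question's desired layout, with the post shown before the author's name.

# Sample data rebuilt from the question (character columns, not factors).
df <- data.frame(
  threadStarterName = rep("threadStarterName1", 5),
  threadStarter     = rep("threadStarter1", 5),
  comment           = paste0("comment", 1:5),
  commenterName     = paste0("commenterName", 1:5),
  stringsAsFactors  = FALSE
)

# Hypothetical helper: one header line for the thread, then indented comments.
format_thread <- function(thread) {
  header   <- paste(thread$threadStarter[1], thread$threadStarterName[1], sep = "    ")
  comments <- paste0("   ", thread$comment, "    ", thread$commenterName)
  c(header, comments)
}

# split() groups the rows by thread (assumption: threadStarter is unique per
# thread); groups come back ordered by the sorted threadStarter values.
threads <- split(df, df$threadStarter)
report_lines <- unlist(lapply(threads, format_thread), use.names = FALSE)

# Print one line per entry.
cat(report_lines, sep = "\n")

Inside an R Markdown chunk, the cat() call should write these lines into the chunk's output as plain text, which keeps the indentation.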
