Question

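I'm new to R and am working with a large dataset. I've put together an example of my problem below (the dataset is tab-delimited). Basically, I want to collapse all of the data by ID number so that all of an ID's attributes sit in one cell instead of many.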

The actual dataset I'm working with is genomic, with "ID" being the "gene name" and "Attributes" being the "pathway" that gene is associated with. My dataset has roughly 5,000,000 rows.

I've tried messing around with cbind and rbind, but they don't seem specific enough for what I need.

My dataset currently looks like this:

ID  Attributes
1   apple
1   banana
1   orange
1   pineapple
2   apple
2   banana
2   orange
3   apple
3   banana
3   pineapple

I'd like it to look like this:

ID  Attributes
1   apple,banana,orange,pineapple
2   apple,banana,orange
3   apple,banana,pineapple

If you have a way to do this other than with R, that's fine too. Thanks for your help.

Answer 1

A base R solution: split df by ID with by(), paste the Attributes of each group together, then rbind the resulting list.

do.call(rbind, by(df, df$ID, 
    function(x) data.frame(ID=x$ID[1], Attributes=paste(x$Attributes, collapse=","))
))

Data:

df <- read.table(text="ID  Attributes
1   apple
1   banana
1   orange
1   pineapple
2   apple
2   banana
2   orange
3   apple
3   banana
3   pineapple", header=TRUE)
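
The same split-and-paste idea can also be written with aggregate(), which may be easier to read than by() plus rbind. This is only a sketch on the toy data above; the name collapsed is made up for illustration, and no claim is made here about how it performs on 5,000,000 rows.

```r
# Toy data from the question
df <- read.table(text = "ID  Attributes
1   apple
1   banana
1   orange
1   pineapple
2   apple
2   banana
2   orange
3   apple
3   banana
3   pineapple", header = TRUE, stringsAsFactors = FALSE)

# For each ID, paste that group's Attributes into one comma-separated string
collapsed <- aggregate(Attributes ~ ID, data = df,
                       FUN = function(x) paste(x, collapse = ","))

print(collapsed)
```

The result is an ordinary data frame with one row per ID, so it can be passed straight to write.table() if a file is needed.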

Answer 2

The tidyverse approach is to group_by ID and summarise with paste.

library(dplyr)

df <- read.table(text = "
  ID  Attributes
  1   apple
  1   banana
  1   orange
  1   pineapple
  2   apple
  2   banana
  2   orange
  3   apple
  3   banana
  3   pineapple", header = TRUE, stringsAsFactors = FALSE)

df %>%
  group_by(ID) %>%
  summarise(
    Attributes = paste(Attributes, collapse = ", ")
  )

# # A tibble: 3 x 2
#      ID Attributes                      
#   <int> <chr>                           
# 1     1 apple, banana, orange, pineapple
# 2     2 apple, banana, orange           
# 3     3 apple, banana, pineapple
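
Since the question describes the real data as tab-delimited and asks for a bare comma between attributes, here is a hedged sketch of the full round trip: read a tab-delimited file, collapse with collapse = ",", and write the result back out. The file paths are temp files standing in for the real input and output, and the three-row input is invented purely for illustration.

```r
library(dplyr)

# Hypothetical input file: a temp file stands in for the real tab-delimited data
infile <- tempfile(fileext = ".tsv")
writeLines(c("ID\tAttributes",
             "1\tapple",
             "1\tbanana",
             "2\tapple"), infile)

# read.delim() assumes a header row and tab separators by default
df <- read.delim(infile, stringsAsFactors = FALSE)

# One row per ID, attributes joined by a comma with no space
collapsed <- df %>%
  group_by(ID) %>%
  summarise(Attributes = paste(Attributes, collapse = ","))

# Write the collapsed table back out as tab-delimited text
outfile <- tempfile(fileext = ".tsv")
write.table(collapsed, outfile, sep = "\t", quote = FALSE, row.names = FALSE)
```

Using "," rather than ", " as the collapse string keeps the output identical to the layout requested in the question.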
