Using R - Condensing multiple columns into a new column without repeating content

Date: 2016-08-12 02:09:12

Tags: r csv dataframe concatenation

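I am a botanist and a beginner R user, and I wonder if you could help me find a solution for a script. I have been using R to streamline the process of generating text from spreadsheets. For this I use the monographaR package, which works well for me. The problem itself is handling the data.frame. My spreadsheet (a CSV file) basically has species as columns and characters as rows, and the cell at each intersection holds the state of that character. I would like a final script that lets me combine 2 or more columns of the original spreadsheet into a new column. When the cells have different contents, the new cell must list the individual contents separated by a comma plus a space (", "). When the cells have the same content, the new cell must contain that content only once, without repeating it. I tried writing scripts with concatenation, cbind, etc., but they repeated the cell contents, and I am not happy with them.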

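My initial CSV looks like this:

        cattleya.minor cattleya.maxima cattleya.pumila
colour  red            red             red
surface sharp          smooth          sharp
leaves  1              3               4

and I would like a final result in which these columns are combined into one new column following the rules above; for example, surface would become "sharp, smooth", while colour would stay simply "red".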
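A minimal base-R sketch of the combining rule described in the question, assuming the CSV has been read with the character names as row names (for example with read.csv(..., row.names = 1)). combine_columns is a hypothetical helper name, not part of monographaR or any other package.

```r
# Example data matching the table above (rows = characters, columns = species)
dat <- data.frame(
  cattleya.minor  = c("red", "sharp", "1"),
  cattleya.maxima = c("red", "smooth", "3"),
  cattleya.pumila = c("red", "sharp", "4"),
  row.names = c("colour", "surface", "leaves"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, keep every distinct state once
# and join the states with ", "
combine_columns <- function(df, cols) {
  apply(df[, cols, drop = FALSE], 1, function(x) paste(unique(x), collapse = ", "))
}

# Add the combined column to the original data frame
dat$cattleya <- combine_columns(dat, c("cattleya.minor", "cattleya.maxima", "cattleya.pumila"))
dat
```

Because apply() turns the selected columns into a character matrix, states such as leaf counts are compared as text, which is what the pasted output needs anyway.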

Thank you very much.

2 answers:

Answer 0 (score: 1)

As @alistaire commented, starting from "tidy" data makes things much easier.

# Starting data (which I've called "dat")
dat
        cattleya.minor cattleya.maxima cattleya.pumila
colour             red             red             red
surface          sharp          smooth           sharp
leaves               1               3               4
library(reshape2)
library(tibble)
library(dplyr)

# Make data tidy
dat.tidy = dat %>% 
  rownames_to_column(var="Characteristic") %>%                # Turn rownames into a data column
  melt(id.var="Characteristic", variable.name="Species") %>%  # Reshape to "long" format
  dcast(Species ~ Characteristic)                             # Cast back to wide so that each characteristic gets its own column

dat.tidy    
          Species colour leaves surface
1  cattleya.minor    red      1   sharp
2 cattleya.maxima    red      3  smooth
3 cattleya.pumila    red      4   sharp
# Summarize by genus
dat.tidy %>%
  group_by(Genus=gsub("(.*)\\..*","\\1",Species)) %>%       # Collapse to genus (remove species designation)
  summarise_all(funs(paste(unique(.), collapse=", "))) %>%  # For each characteristic, paste together each unique value for a given genus
  select(-Species)
     Genus colour  leaves       surface
1 cattleya    red 1, 3, 4 sharp, smooth
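funs() was deprecated in dplyr 0.8.0, and reshape2 is retired in favour of tidyr. Assuming dplyr >= 1.0 and tidyr >= 1.0 are installed, the same tidy-then-summarise idea can be sketched with pivot_longer()/pivot_wider() and across(). This is an alternative rewrite, not the answerer's original code.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Starting data, as in the answer
dat <- data.frame(
  cattleya.minor  = c("red", "sharp", "1"),
  cattleya.maxima = c("red", "smooth", "3"),
  cattleya.pumila = c("red", "sharp", "4"),
  row.names = c("colour", "surface", "leaves"),
  stringsAsFactors = FALSE
)

# Make data tidy: one row per species, one column per characteristic
dat.tidy <- dat %>%
  rownames_to_column(var = "Characteristic") %>%
  pivot_longer(-Characteristic, names_to = "Species", values_to = "Value") %>%
  pivot_wider(names_from = Characteristic, values_from = Value)

# Summarize by genus: paste the unique values of every characteristic
genus <- dat.tidy %>%
  group_by(Genus = gsub("(.*)\\..*", "\\1", Species)) %>%
  summarise(across(-Species, ~ paste(unique(.x), collapse = ", ")))
genus
```

across(-Species, ...) skips the Species column directly, so the final select(-Species) step is no longer needed.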

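Answer 1 (score: 0)

Thanks @alistaire & @eipi10!

Eipi10, I'm happy to be this close to my goal. I ran your script exactly as you suggested, with the same dataset. It works very well, but there seems to be a problem in the last command block, on the line select(-Species). Could you check it? R returned the following: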

> dat <- read.csv("dat.csv")
> dat
        cattleya.minor cattleya.maxima cattleya.pumila
color              red             red             red
surface          sharp          smooth           sharp
leaves               1               3               4
> 
> # Make data tidy
> dat.tidy = dat %>% 
+   rownames_to_column(var="Characteristic") %>%                # Turn     rownames into a data column
+   melt(id.var="Characteristic", variable.name="Species") %>%  # Reshape to "long" format
+   dcast(Species ~ Characteristic)                             # Cast back to wide so that each characteristic gets its own column
Warning message:
attributes are not identical across measure variables; they will be dropped 
> 
> dat.tidy
          Species color leaves surface
1  cattleya.minor   red      1   sharp
2 cattleya.maxima   red      3  smooth
3 cattleya.pumila   red      4   sharp
> 
> # Summarize by genus
> dat.tidy %>%
+   group_by(Genus=gsub("(.*)\\..*","\\1",Species)) %>%   # Collapse to genus (remove species designation)
+   summarise_all(funs(paste(unique(.), collapse=", ")))  # For each charactreristic, paste together each unique value for a given genus
# A tibble: 1 x 5
     Genus                                          Species color  leaves           surface
     <chr>                                            <chr> <chr>   <chr>         <chr>
1 cattleya cattleya.minor, cattleya.maxima, cattleya.pumila   red 1, 3, 4 sharp, smooth
>   select(-Species)
Error in select_(.data, .dots = lazyeval::lazy_dots(...)) : 
  objeto 'Species' não encontrado (my free translation: object 'Species' not found)
>
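Comparing this transcript with the answer, the %>% at the end of the summarise_all() line was dropped, so select(-Species) was evaluated on its own, without any data frame to take Species from. A sketch of the pipeline with the pipe restored, rebuilding dat.tidy from the printed output above; replacing the deprecated funs() with a ~ formula is an assumption aimed at current dplyr versions.

```r
library(dplyr)

# dat.tidy rebuilt from the output printed in the transcript
dat.tidy <- data.frame(
  Species = c("cattleya.minor", "cattleya.maxima", "cattleya.pumila"),
  color   = c("red", "red", "red"),
  leaves  = c("1", "3", "4"),
  surface = c("sharp", "smooth", "sharp"),
  stringsAsFactors = FALSE
)

result <- dat.tidy %>%
  group_by(Genus = gsub("(.*)\\..*", "\\1", Species)) %>%
  summarise_all(~ paste(unique(.), collapse = ", ")) %>%  # trailing %>% passes the result on
  select(-Species)                                         # drop the pasted species names
result
```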