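I am a botanist and a beginner R user. I wonder if you could help me find a solution for a script I am writing. I have been using R to streamline the process of creating text from spreadsheets. For this I use the MonographaR package, and I am happy with it. The problem itself is in handling the data.frame. My spreadsheet (a CSV file) basically consists of columns for species and rows for characters, and the cells where they intersect hold the character states. I would like a final script that lets me combine 2 or more columns into a new column in the original spreadsheet. When the cells have different contents, the new cell must hold the individual contents separated by comma + space, ", ". When the cells have the same content, the new cell must contain that content only once, without repeating it. I have tried scripts written with concatenate, cbind, etc., but they repeat the cell contents and I am not satisfied with them.
My initial CSV looks like this:
        cattleya.minor cattleya.maxima cattleya.pumila
colour             red             red             red
surface          sharp          smooth           sharp
leaves               1               3               4
I would like the final result to be this table with the new combined column added.
Thank you very much.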
Answer 0 (score: 1)
As @alistaire commented, things will be much easier if you start with "tidy" data.
# Starting data (which I've called "dat")
dat
        cattleya.minor cattleya.maxima cattleya.pumila
colour             red             red             red
surface          sharp          smooth           sharp
leaves               1               3               4
library(reshape2)
library(tibble)
library(dplyr)
# Make data tidy
dat.tidy = dat %>%
  rownames_to_column(var="Characteristic") %>%   # Turn rownames into a data column
  melt(id.var="Characteristic", variable.name="Species") %>%   # Reshape to "long" format
  dcast(Species ~ Characteristic)   # Cast back to wide so that each characteristic gets its own column
dat.tidy
          Species colour leaves surface
1  cattleya.minor    red      1   sharp
2 cattleya.maxima    red      3  smooth
3 cattleya.pumila    red      4   sharp
# Summarize by genus
dat.tidy %>%
  group_by(Genus=gsub("(.*)\\..*","\\1",Species)) %>%   # Collapse to genus (remove species designation)
  summarise_all(funs(paste(unique(.), collapse=", "))) %>%   # For each characteristic, paste together each unique value for a given genus
  select(-Species)
     Genus colour  leaves       surface
1 cattleya    red 1, 3, 4 sharp, smooth
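If the goal is to keep the original layout (species as columns) and simply append a combined column, a base-R sketch might look like the following. The data frame is typed in by hand to mirror the table in the question, and combine_columns() is a hypothetical helper name, not a function from MonographaR or any other package. It assumes the selected columns are character columns, which is what read.csv would typically give for cells that mix words and numbers.

```r
# Example data, typed in by hand to mirror the table in the question
dat <- data.frame(
  cattleya.minor  = c("red", "sharp", "1"),
  cattleya.maxima = c("red", "smooth", "3"),
  cattleya.pumila = c("red", "sharp", "4"),
  row.names = c("colour", "surface", "leaves"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, paste the unique values found in
# the chosen columns, separated by ", ", and store them in a new column.
# Assumes the chosen columns hold character data.
combine_columns <- function(df, cols, new_name) {
  combined <- apply(
    df[, cols, drop = FALSE], 1,
    function(x) paste(unique(x), collapse = ", ")
  )
  df[[new_name]] <- unname(combined)
  df
}

dat2 <- combine_columns(
  dat,
  cols = c("cattleya.minor", "cattleya.maxima", "cattleya.pumila"),
  new_name = "cattleya"
)
print(dat2)
```

Because unique() is applied row by row before paste(), a row such as colour (red, red, red) collapses to a single "red", while differing states are joined with ", " in order of first appearance.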
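Answer 1 (score: 0)
Thanks @allistaire & @eipi10!
Eipi10, I am glad to be getting close to my goal. I ran your script exactly as you suggested, with the same dataset. It works very well, but there seems to be a problem in the last command block, on the line select(-Species). Could you check it? R returned the following: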
> dat <- read.csv("dat.csv")
> dat
cattleya.minor cattleya.maxima cattleya.pumila
color red red red
surface sharp smooth sharp
leaves 1 3 4
>
> # Make data tidy
> dat.tidy = dat %>%
+ rownames_to_column(var="Characteristic") %>% # Turn rownames into a data column
+ melt(id.var="Characteristic", variable.name="Species") %>% # Reshape to "long" format
+ dcast(Species ~ Characteristic) # Cast back to wide so that each characteristic gets its own column
Warning message:
attributes are not identical across measure variables; they will be dropped
>
> dat.tidy
Species color leaves surface
1 cattleya.minor red 1 sharp
2 cattleya.maxima red 3 smooth
3 cattleya.pumila red 4 sharp
>
> # Summarize by genus
> dat.tidy %>%
+ group_by(Genus=gsub("(.*)\\..*","\\1",Species)) %>% # Collapse to genus (remove species designation)
+ summarise_all(funs(paste(unique(.), collapse=", "))) # For each charactreristic, paste together each unique value for a given genus
# A tibble: 1 x 5
Genus Species color leaves surface
<chr> <chr> <chr> <chr> <chr>
1 cattleya cattleya.minor, cattleya.maxima, cattleya.pumila red 1, 3, 4 sharp, smooth
> select(-Species)
Error in select_(.data, .dots = lazyeval::lazy_dots(...)) :
objeto 'Species' não encontrado (my free translation: object 'Species' not found)
>
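Looking at the transcript above, the summarise_all(...) line appears to end without a trailing %>%, so R finished the pipeline there and then ran select(-Species) as a separate call with no data argument, which would explain the error. A minimal sketch of the final block with the pipe restored, assuming dplyr is installed and using a hand-typed copy of dat.tidy (the anonymous function stands in for funs() here, which is a substitution and not the original answer's code):

```r
library(dplyr)

# Hand-typed copy of dat.tidy as printed in the transcript
dat.tidy <- data.frame(
  Species = c("cattleya.minor", "cattleya.maxima", "cattleya.pumila"),
  color   = c("red", "red", "red"),
  leaves  = c("1", "3", "4"),
  surface = c("sharp", "smooth", "sharp"),
  stringsAsFactors = FALSE
)

# Summarize by genus; every line except the last ends in %>% so that
# select() receives the summarised table as its data argument
by.genus <- dat.tidy %>%
  group_by(Genus = sub("\\..*$", "", Species)) %>%
  summarise_all(function(x) paste(unique(x), collapse = ", ")) %>%
  select(-Species)

print(by.genus)
```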