Question

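The tidyr documentation indicates that gather and spread are transitive, but the following example with the "iris" data shows that they are not, and it is not clear why. Any clarification would be greatly appreciated.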

iris.df = as.data.frame(iris)
long.iris.df = iris.df %>% gather(key = feature.measure, value = size, -Species)
w.iris.df = long.iris.df %>% spread(key = feature.measure, value = size, -Species)

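I expected the data frame "w.iris.df" to be identical to "iris.df", but received the following error instead:

"Error: Duplicate identifiers for rows (1, 2, 3, 4, 5, 6, 7, 8, 9 ......"

My general question is how to reverse a "gather" on this kind of data set.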

Answer 1

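Hadley's intervention was, unsurprisingly, perfect... but I ended up messing up the syntax afterwards... so, for what it's worth, I am posting fully working code (sorry, my syntax is a little different from the above):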

library(tidyr)
library(dplyr)

wide <- 
  iris %>%
  mutate(row = row_number()) %>%
  gather(vars, val, -Species, -row) %>%
  spread(vars, val)

head(wide)
#   Species row Petal.Length Petal.Width Sepal.Length Sepal.Width
# 1  setosa   1          1.4         0.2          5.1         3.5
# 2  setosa   2          1.4         0.2          4.9         3.0
# 3  setosa   3          1.3         0.2          4.7         3.2
# 4  setosa   4          1.5         0.2          4.6         3.1
# 5  setosa   5          1.4         0.2          5.0         3.6
# 6  setosa   6          1.7         0.4          5.4         3.9

head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

They are the same... and if you feel the columns need reordering:

wide <- wide[, c(5, 6, 3, 4, 1)]  ## Reorder to match iris; leaving out column 2 drops "row"

And done.
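As a rough check that the round trip really gives back the original data, the columns can be picked by name instead of by position and the result compared with iris. This is only a sketch: `restored` is a name made up here, and it assumes tidyr and dplyr are installed.

```r
library(tidyr)
library(dplyr)

# Same round trip as above: add a row id, gather, then spread back
wide <- iris %>%
  mutate(row = row_number()) %>%
  gather(vars, val, -Species, -row) %>%
  spread(vars, val)

# Selecting by the original column names restores the column order
# and drops the helper "row" column in one step
restored <- wide[, names(iris)]

# Compare values with the original data set
all.equal(restored, iris, check.attributes = FALSE)
```

Selecting by name avoids having to count column positions by hand, which is easy to get wrong once spread has sorted the measure columns alphabetically.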

Answer 2

Since the previous answer may not have been entirely clear: the problem arises from the way you perform the gather.

The issue is that, during the gather, you do not keep track of which size belongs to which row of the original data frame, so spread does not know how to combine the individual values into the "wide" table.
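To see the duplication concretely, one can count how many rows of the long table share each Species / feature.measure pair. This is a minimal sketch using dplyr's count(); `dup.counts` is a name introduced here for illustration only.

```r
library(tidyr)
library(dplyr)

# Gather as in the question, without any row identifier
long.iris.df <- iris %>%
  gather(key = feature.measure, value = size, -Species)

# Count the rows that share each Species / feature.measure combination
dup.counts <- long.iris.df %>%
  count(Species, feature.measure)

dup.counts
```

Every combination appears 50 times (one per flower of that species), so Species alone cannot tell spread which values belong together on one row.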

iris.df = as.data.frame(iris)
long.iris.df = iris.df %>%
  tibble::rowid_to_column() %>%
  gather(key = feature.measure, value = size, -Species, -rowid)
head(long.iris.df)
#>   rowid Species feature.measure size
#> 1     1  setosa    Sepal.Length  5.1
#> 2     2  setosa    Sepal.Length  4.9
#> 3     3  setosa    Sepal.Length  4.7
#> 4     4  setosa    Sepal.Length  4.6
#> 5     5  setosa    Sepal.Length  5.0
#> 6     6  setosa    Sepal.Length  5.4

Now each value in long.iris.df retains its rowid, so you will always be able to recombine it into the wide dataset (removing the then unnecessary rowid) by spreading feature.measure and size again.
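A minimal sketch of that recombination, assuming tidyr, dplyr, and tibble are installed (`w.iris.df` simply reuses the name from the question):

```r
library(tidyr)
library(dplyr)

# Gather while keeping a row identifier for every original observation
long.iris.df <- iris %>%
  tibble::rowid_to_column() %>%
  gather(key = feature.measure, value = size, -Species, -rowid)

# rowid makes every rowid / Species / feature.measure combination unique,
# so spread can rebuild exactly one row per original observation
w.iris.df <- long.iris.df %>%
  spread(key = feature.measure, value = size) %>%
  select(-rowid)

head(w.iris.df)
```

The columns come back with Species first and the measures in alphabetical order, so reorder them (for example with `w.iris.df[, names(iris)]`) if the original layout matters.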

Spreading a data.frame / tibble with duplicate identifiers
