Question

I have a data frame in R as follows:

D = data.frame(countrycode = c(2, 2, 2, 3, 3, 3), 
           year = c(1980, 1991, 2013, 1980, 1991, 2013), 
           hello = c("A", "B", "C", "D", "E", "F"), 
           world = c("Z", "Y", "X", "NA", "Q", "NA"), 
           foo = c("Yes", "No", "NA", "NA", "Yes", "NA"))

I want to combine the hello, world and foo columns into a single column, indexed by countrycode and year, like this:

output<-data.frame(countrycode=c(2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
    year=c(1980,1980,1980,1991,1991,1991,2013,2013,2013,1980,1980,1980,1991,1991,1991,2013,2013,2013),
    Combined=c("A","Z","Yes","B","Y","No","C","X","NA","D","NA","NA","E","Q","Yes","F","NA","NA"))

I have tried cbind from base R and gather from the tidyr package, but neither seems to work.

Answer 1

I think you are looking for the reshape2 package. Try the following code:

library(reshape2)

output<-melt(D,id.vars=c("countrycode","year"))
output<-output[order(output$countrycode,output$year),]

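It reproduces your example, apart from the column names: melt calls the new columns variable and value. Two functions are especially useful here: melt and its inverse, dcast.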
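To illustrate that melt and dcast are inverses, here is a minimal sketch that reshapes D to long form and back again. The names long and wide are just illustrative, and stringsAsFactors = FALSE is an assumption added so the value column stays character on older R versions.

```r
library(reshape2)

# Same data as in the question; stringsAsFactors = FALSE is an added assumption
D <- data.frame(countrycode = c(2, 2, 2, 3, 3, 3),
                year = c(1980, 1991, 2013, 1980, 1991, 2013),
                hello = c("A", "B", "C", "D", "E", "F"),
                world = c("Z", "Y", "X", "NA", "Q", "NA"),
                foo = c("Yes", "No", "NA", "NA", "Yes", "NA"),
                stringsAsFactors = FALSE)

# Wide -> long: one row per (countrycode, year, variable)
long <- melt(D, id.vars = c("countrycode", "year"))

# Long -> wide: spread the variable column back into separate columns
wide <- dcast(long, countrycode + year ~ variable, value.var = "value")
```

Here wide should have the same columns as D, so the round trip loses nothing as long as the variable column is kept.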

Answer 2

A one-liner with reshape2 and dplyr:

library(reshape2)
library(dplyr)
converted = melt(D,
  measure.vars=c("hello","world","foo"),
  value.name="Combined") %>%
    arrange(countrycode, year) %>% select(-variable)

> converted
   countrycode year Combined
1            2 1980        A
2            2 1980        Z
3            2 1980      Yes
4            2 1991        B
5            2 1991        Y
6            2 1991       No

And so on. This also ends up with the same columns and column names as the sample output.

Answer 3

With tidyr and dplyr, this looks like

library(dplyr)
library(tidyr)

D %>% gather(var, Combined, hello:foo) %>% arrange(countrycode, year)
#    countrycode year   var Combined
# 1            2 1980 hello        A
# 2            2 1980 world        Z
# 3            2 1980   foo      Yes
# 4            2 1991 hello        B
# 5            2 1991 world        Y
# 6            2 1991   foo       No
# .            .  ...   ...      ...

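I left the key column in, because without it you lose information about which column each value came from, but if you really don't want it, tack on %>% select(-var).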
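In tidyr 1.0.0 and later, gather is superseded by pivot_longer, so the same reshape could be sketched as follows. This is an alternative to the call above rather than what the answer used; the name long is illustrative, and stringsAsFactors = FALSE is an assumption added so the three columns combine as plain character vectors.

```r
library(dplyr)
library(tidyr)

# Same data as in the question; stringsAsFactors = FALSE is an added assumption
D <- data.frame(countrycode = c(2, 2, 2, 3, 3, 3),
                year = c(1980, 1991, 2013, 1980, 1991, 2013),
                hello = c("A", "B", "C", "D", "E", "F"),
                world = c("Z", "Y", "X", "NA", "Q", "NA"),
                foo = c("Yes", "No", "NA", "NA", "Yes", "NA"),
                stringsAsFactors = FALSE)

long <- D %>%
  # Stack hello, world and foo; "var" records the source column
  pivot_longer(hello:foo, names_to = "var", values_to = "Combined") %>%
  arrange(countrycode, year)
```

As with gather, adding %>% select(-var) drops the key column if only countrycode, year and Combined are wanted. Note that the "NA" entries in D are the literal string "NA", not missing values, so they are carried through unchanged.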
