Question

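I'm trying to transform my data so that each keyword gets its own row (with its scores), instead of the keywords being grouped by score. My data is currently organized like this:

df: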

score1 score2 keyword1 keyword2 keyword3
.2     .4     brown    fox      jump
.7     .2     hello    bye
.1     .9     foo

I want my data to look like this:

keyword score1 score2
brown   .2     .4
fox     .2     .4
jump    .2     .4
hello   .7     .2
bye     .7     .2
foo     .1     .9

Data:

df <- structure(list(score1 = c(0.2, 0.7, 0.1, NA, NA), score2 = c(0.4, 0.2, 0.9, NA, NA), keyword1 = c("brown", "hello", "foo", NA, NA), keyword2 = c("fox", "bye", NA, NA, NA), keyword3 = c("jump", NA, NA, NA, NA)), .Names = c("score1", "score2", "keyword1", "keyword2", "keyword3"), row.names = c(NA, -5L), class = "data.frame")

Any suggestions?

Answer 1

Here is one method using melt() from the data.table package:

# load data.table
library(data.table)

# drop the trailing rows that are entirely missing
df <- df[1:3, ]
# create an ID variable
df$id <- seq_len(nrow(df))

# reshape long (melt); data.table's melt() expects a data.table, so convert first
newdf <- melt(as.data.table(df), id.vars = c("id", "score1", "score2"),
              measure.vars = c("keyword1", "keyword2", "keyword3"),
              value.name = "keyword")

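Although it isn't necessary for this example, I added an id variable so that the method generalizes to larger datasets, where score1 and score2 might be identical for several observations. This produces two more columns than your example output: the id column and a column holding "keyword1", "keyword2", and so on (named variable by default). Dropping these is easy. There are also some rows with NA in the keyword column, which come from the non-rectangular shape of the input data. These can be removed with is.na: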

# drop rows with missing values in keyword column
newdf <- newdf[!is.na(newdf$keyword),]
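For comparison, here is a minimal base-R sketch of the same melt-then-drop-NA idea, assuming the three-row question data with keyword2 and keyword3 padded with NA. It is a swapped-in technique (rep() and unlist() rather than data.table's melt()), shown only to make the reshaping logic explicit; kw_cols, n and k are illustrative names. One difference: variable here is a character column, whereas melt() returns a factor by default.

```r
# Base-R sketch (no data.table): same long reshape as melt() + is.na() above.
df <- data.frame(
  score1   = c(0.2, 0.7, 0.1),
  score2   = c(0.4, 0.2, 0.9),
  keyword1 = c("brown", "hello", "foo"),
  keyword2 = c("fox", "bye", NA),
  keyword3 = c("jump", NA, NA),
  stringsAsFactors = FALSE
)
df$id <- seq_len(nrow(df))

kw_cols <- c("keyword1", "keyword2", "keyword3")  # illustrative name
n <- nrow(df)
k <- length(kw_cols)

# Repeat the id/score columns once per keyword column and lay the keyword
# columns end to end (column by column), which is the order melt() uses.
newdf <- data.frame(
  id       = rep(df$id, times = k),
  score1   = rep(df$score1, times = k),
  score2   = rep(df$score2, times = k),
  variable = rep(kw_cols, each = n),
  keyword  = unlist(df[kw_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop rows whose keyword is missing (from the ragged input) and renumber.
newdf <- newdf[!is.na(newdf$keyword), ]
rownames(newdf) <- NULL
newdf
```

Because the id and score columns are simply repeated alongside the stacked keywords, rows with identical scores stay distinguishable through id, which is the same reason the data.table version adds it.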

Answer 2

Another alternative, using gather() from tidyr:

library(tidyr)

df %>%
  gather(label, keyword, -(score1:score2), na.rm = TRUE)

This gives:

#   score1 score2    label keyword
#1     0.2    0.4 keyword1   brown
#2     0.7    0.2 keyword1   hello
#3     0.1    0.9 keyword1     foo
#6     0.2    0.4 keyword2     fox
#7     0.7    0.2 keyword2     bye
#11    0.2    0.4 keyword3    jump

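Optionally, you can drop the label column by adding select(-label) to the chain (select() comes from dplyr, so load that package as well).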
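If you would rather avoid packages altogether, the following base-R sketch (a swapped-in technique, not part of the tidyr answer) uses stack() to lay the keyword columns end to end as gather() does, keeps only keyword, score1 and score2 (the effect of select(-label)), and then restores the row order shown in the question. It rebuilds the five-row, NA-padded data used in this answer; kw_cols, row_id, scores and keep are illustrative names.

```r
# Base-R sketch (no tidyr/dplyr): stack() plays the role of gather().
df <- data.frame(
  score1   = c(0.2, 0.7, 0.1, NA, NA),
  score2   = c(0.4, 0.2, 0.9, NA, NA),
  keyword1 = c("brown", "hello", "foo", NA, NA),
  keyword2 = c("fox", "bye", NA, NA, NA),
  keyword3 = c("jump", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

kw_cols <- grep("^keyword", names(df), value = TRUE)

# stack() returns `values` (the keywords) and `ind` (the source column),
# stacking keyword1, then keyword2, then keyword3.
long <- stack(df[kw_cols])

# Original row index for each stacked value, in the same column-major order.
row_id <- rep(seq_len(nrow(df)), times = length(kw_cols))
scores <- df[row_id, c("score1", "score2")]

# Keep only keyword, score1 and score2 (no label/ind column).
out <- data.frame(keyword = long$values, scores, stringsAsFactors = FALSE)

# Drop empty keywords, then restore the row-major order shown in the question.
keep <- !is.na(out$keyword)
out <- out[keep, ][order(row_id[keep]), ]
rownames(out) <- NULL
out
```

order() is stable, so keywords from the same original row keep their keyword1, keyword2, keyword3 sequence (brown, fox, jump, then hello, bye, then foo).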

Data

df <- structure(list(score1 = c(0.2, 0.7, 0.1, NA, NA), score2 = c(0.4, 0.2, 0.9, NA, NA), keyword1 = c("brown", "hello", "foo", NA, NA), keyword2 = c("fox", "bye", NA, NA, NA), keyword3 = c("jump", NA, NA, NA, NA)), .Names = c("score1", "score2", "keyword1", "keyword2", "keyword3"), row.names = c(NA, -5L), class = "data.frame")

R: transform data with multiple values
