Expand data.frame rows based on a single field

Date: 2014-06-30 13:48:52

Tags: r reshape

I have a simple dataset:

From,To,Date,Subject

I want to reshape rows that look like this:

e1,e2;e3;e4,d1,s1

into this expanded form:

e1,e2,d1,s1
e1,e3,d1,s1
e1,e4,d1,s1

Right now I loop over my data frame with a for loop and build a new one on the fly, but I wonder whether there is a more "R-like" way of doing this.

EDIT: Here is what I have now. It works, but it is a bit ugly (and shows my still somewhat limited R skills):

filteredEmailsExpanded <- NULL
toCol <- 2
for (row in seq_len(nrow(filteredEmails))) {
  # split the To field on "," and strip spaces (assumes To is a character column)
  receivers <- gsub(" ", "", strsplit(filteredEmails[row, toCol], ",")[[1]])
  for (receiver in receivers) {
    newRow <- filteredEmails[row, ]  # copy the current row
    newRow$To <- receiver            # keep a single receiver
    filteredEmailsExpanded <- rbind(filteredEmailsExpanded, newRow)  # store the result
  }
}
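Much of the ugliness (and slowness) of this loop comes from growing the result with rbind one row at a time, which copies the accumulated data frame on every iteration. A minimal sketch that keeps the loop but collects one piece per input row in a list and binds them once at the end, assuming To is a character column of ","-separated addresses (the `emails` data below is hypothetical, made up for illustration):

```r
# Hypothetical sample data; To holds comma-separated receivers
emails <- data.frame(
  From = c("e1", "e5"),
  To = c("e2, e3, e4", "e6, e7"),
  Date = c("d1", "d2"),
  Subject = c("s1", "s2"),
  stringsAsFactors = FALSE
)

pieces <- vector("list", nrow(emails))  # one slot per input row
for (row in seq_len(nrow(emails))) {
  # split this row's To field and strip spaces from each receiver
  receivers <- gsub(" ", "", strsplit(emails$To[row], ",")[[1]])
  # repeat the row once per receiver, then overwrite To
  piece <- emails[rep(row, length(receivers)), ]
  piece$To <- receivers
  pieces[[row]] <- piece
}
expanded <- do.call(rbind, pieces)  # bind all pieces in a single call
rownames(expanded) <- NULL          # reset the 1, 1.1, 1.2, ... row names
expanded
```

The inner loop disappears because repeating the row index handles all receivers of a row at once; the answers below push the same idea further and drop the outer loop as well.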

2 Answers:

Answer 0 (score: 1)

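How about first expanding your data frame (call it d) by repeating row i n(i) times, where n(i) is the number of ';'-separated recipients in d$To[i], and then replacing d$To with those recipients? I've added a row to your example data to illustrate this better: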

d <- data.frame(
        From = c("e1", "e5"), 
        To = c("e2;e3;e4", "e6;e7"),
        Date = c("d1", "d2"),
        Subject = c("s1", "s2"),
        stringsAsFactors = FALSE)

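v <- strsplit(d$To, ";")            # list of recipients per row
n <- sapply(v, length)              # number of recipients per row
d <- d[rep(seq_len(nrow(d)), n), ]  # repeat row i n[i] times
d$To <- unlist(v)                   # one recipient per repeated row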
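The key step is the rep() call: it builds a row index in which row i appears as many times as it has recipients, so subsetting with it duplicates each row the right number of times, and unlist(v) lines up with those duplicates because both follow the same row order. A small sketch of that step, plus a wrapper generalising it to any column and separator (the name `expand_column` is hypothetical, not from the answer) that also resets the 1, 1.1, 1.2, ... row names the subsetting produces:

```r
# The index vector: row 1 repeated 3 times, row 2 repeated 2 times
idx <- rep(1:2, c(3, 2))
print(idx)  # prints [1] 1 1 1 2 2

# Hypothetical helper generalising the approach to any column/separator
expand_column <- function(df, col, sep) {
  parts <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
  counts <- vapply(parts, length, integer(1))  # recipients per row
  out <- df[rep(seq_len(nrow(df)), counts), , drop = FALSE]
  out[[col]] <- unlist(parts)  # same row order as the repeated index
  rownames(out) <- NULL        # drop the 1, 1.1, 1.2, ... names
  out
}

# Same sample data as above, before expansion
d0 <- data.frame(
  From = c("e1", "e5"),
  To = c("e2;e3;e4", "e6;e7"),
  Date = c("d1", "d2"),
  Subject = c("s1", "s2"),
  stringsAsFactors = FALSE
)
res <- expand_column(d0, "To", ";")
res
```

fixed = TRUE treats the separator literally rather than as a regular expression, which matters for separators such as "." or "|".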

Answer 1 (score: 1)

You might want to look at my "splitstackshape" package, in particular the function concat.split.multiple, whose direction argument accepts "long".

Using @konvas's example data, try:

library(splitstackshape)
concat.split.multiple(d, "To", ";", "long")
#   From Date Subject time   To
# 1   e1   d1      s1    1   e2
# 2   e5   d2      s2    1   e6
# 3   e1   d1      s1    2   e3
# 4   e5   d2      s2    2   e7
# 5   e1   d1      s1    3   e4
# 6   e5   d2      s2    3 <NA>
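The <NA> in the last row comes from the long layout: each original row gets one entry per position up to the longest split (three here), and the second row, which has only two recipients, is padded with NA. A base-R sketch that mimics that layout (it is only an illustration, not how splitstackshape implements it) and then drops the padding rows:

```r
# Same sample data as above
d0 <- data.frame(
  From = c("e1", "e5"),
  To = c("e2;e3;e4", "e6;e7"),
  Date = c("d1", "d2"),
  Subject = c("s1", "s2"),
  stringsAsFactors = FALSE
)
parts <- strsplit(d0$To, ";", fixed = TRUE)
width <- max(lengths(parts))  # length of the longest split (3 here)

# One block per position k; rows with fewer than k entries get NA
long <- do.call(rbind, lapply(seq_len(width), function(k) {
  block <- d0[, c("From", "Date", "Subject")]
  block$time <- k
  block$To <- vapply(parts,
                     function(p) if (k <= length(p)) p[k] else NA_character_,
                     character(1))
  block
}))
rownames(long) <- NULL
long                               # 6 rows, the last with To = NA
kept <- long[!is.na(long$To), ]    # keep only real recipients
kept
```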

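Alternatively, look at its successor (which, at the time of writing, is not yet in the package). The successor is now called cSplit and is available as a Gist. It is much faster but just as easy to use: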

## cSplit(indt = d, splitCols = "To", sep = ";", direction = "long")
cSplit(d, "To", ";", "long")
#    From To Date Subject
# 1:   e1 e2   d1      s1
# 2:   e1 e3   d1      s1
# 3:   e1 e4   d1      s1
# 4:   e5 e6   d2      s2
# 5:   e5 e7   d2      s2