Question

I am currently working with R. My data frame has three columns, named year1, year2 and year3. Each column holds a set of numeric data.

I would like a resulting data frame that contains the values repeated in two different columns. That is, if num.4 is repeated in year1 and year2, the new data frame contains num.4; if num.5 is repeated in year2 and year3, the new data frame contains num.5, and so on.

I tried the following code:

newdf1 <- origdf[origdf$year1 == origdf$year2 | origdf$year1 == origdf$year3, c(1)]
newdf2 <- origdf[origdf$year2 == origdf$year3, c(2)]

Then I merged the two data frames, but the result does not include all the data and contains many NA values.

Then I tried the following code:

newdf <- origdf[origdf$year1 == origdf$year2 | origdf$year1 == origdf$year3 & origdf$year2 == origdf$year3, c(1, 2)]

But that did not work either: it gave me a data frame with many NA values and some correct values, but not all of the repeated numbers were included.

How can I efficiently get a data frame containing the values that are repeated in two of the three columns of the original data frame, without duplicates (I do not want a number that is repeated in all three columns of the original data frame)?

The expected result would be:

> newdf
1 num.4
2 num.5

Answer 1

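If I understand correctly, you are looking for the intersections between the columns of the data frame, but elements common to all three columns should be excluded. In that case the intersect() function could be a solution. The code might look like this: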

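n_years <- 3
# generate all possible combinations of two indices of considered years
indices_comb <- combn(x = 1:n_years, m = 2)
# apply intersect() along all possible combinations;
# lapply() is used so the result is always a list (sapply() may simplify
# equal-length intersections to a matrix, which breaks the Reduce() calls)
all_intersects <- lapply(X = seq_len(ncol(indices_comb)),
    FUN = function(i) intersect(origdf[, indices_comb[1, i]],
        origdf[, indices_comb[2, i]]))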
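To make the pairwise step concrete, here is a minimal sketch on made-up data. The values in `origdf` and the name `pair_intersects` are assumptions chosen for illustration; they do not come from the question.

```r
# Hypothetical sample data; the values are invented for illustration only
origdf <- data.frame(
  year1 = c(1, 4, 7, 9),
  year2 = c(4, 5, 9, 2),
  year3 = c(5, 9, 3, 8)
)

# Each column of the combn() result is one pair of column indices:
# (1, 2), (1, 3) and (2, 3)
indices_comb <- combn(x = 1:3, m = 2)
print(indices_comb)

# Intersect each pair of columns; lapply() always returns a list
pair_intersects <- lapply(seq_len(ncol(indices_comb)), function(i) {
  intersect(origdf[, indices_comb[1, i]], origdf[, indices_comb[2, i]])
})
print(pair_intersects)
```

In this toy data, 9 shows up in every pairwise intersection because it occurs in all three columns, which is why a further exclusion step is needed.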

Next, exclude the elements that are common to all of the original columns (year1, year2, year3):

# find elements which are common for all pairwise intersections
in_all <- Reduce(intersect, all_intersects)
# combine all pairwise intersections into one vector
in_pairw <- Reduce(all_intersects, f = c)
# exclude the elements which are common for all intersections
newdf <- data.frame(res = setdiff(in_pairw, in_all))

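The solution above scales easily to any number of original columns (years). Note, however, that only unique values are returned. That is, if num.4 appears twice in year1 and year2, only one num.4 is returned.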
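As a rough sketch of how the steps could be wrapped up for an arbitrary number of columns, consider the helper below. The function name `pairwise_only` and the `demo` data are hypothetical and not part of the original answer; the logic simply follows the same intersect/setdiff approach.

```r
# Hypothetical helper (not part of the original answer): returns the values
# that occur in at least two columns of df but not in every column
pairwise_only <- function(df) {
  stopifnot(ncol(df) >= 2)
  # all pairs of column indices, one pair per matrix column
  idx <- combn(seq_len(ncol(df)), 2)
  # values shared by each pair of columns, kept as a list
  pair_hits <- lapply(seq_len(ncol(idx)), function(i) {
    intersect(df[[idx[1, i]]], df[[idx[2, i]]])
  })
  # values present in every pairwise intersection, i.e. in all columns
  in_all <- Reduce(intersect, pair_hits)
  # setdiff() also drops duplicates, so each value is returned once
  data.frame(res = setdiff(unlist(pair_hits), in_all))
}

# Invented data: 4 occurs twice in year1 and twice in year2
demo <- data.frame(
  year1 = c(4, 4, 7, 9),
  year2 = c(4, 4, 9, 5),
  year3 = c(5, 9, 3, 8)
)
result <- pairwise_only(demo)
print(result)
```

With this data, `result` holds 4 and 5 once each: 4 is reported a single time even though it is repeated within year1 and year2, and 9 is dropped because it occurs in all three columns.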
