Question

I have the following dyadic dataset:

ID.x    Attribute1.x  Attribute2.x  ID.y   Attribute1.y  Attribute2.y  rowsum
2323    11            11            9923   22            11            1
3423    11            22            3422   11            44            1
5343    22            22            5555   11            0             0
54336   0             44            0234   11            44            1
4334    11            22            2345   44            11            1
34563   22            0             9429   0             22            2
34534   44            0             2345   44            11            1

I want to check whether each attribute column of actor x is the same as the corresponding column of actor y:

Attribute1.x == Attribute1.y
Attribute2.x == Attribute2.y
...

and sum the number of matches into a column "rowsum". My full data frame has 100 attribute columns for each actor (x and y).
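The pairwise comparisons listed above (Attribute1.x == Attribute1.y, and so on) can be done for all columns at once rather than one by one. The following is a minimal sketch, assuming the columns are named Attribute<k>.x / Attribute<k>.y as in the example; `x_cols`, `y_cols` and the output column `posMatch` are hypothetical names chosen for illustration, and only the attribute columns of the example are rebuilt.

```r
# Attribute columns from the example (ID columns omitted; they are not compared)
df1 <- data.frame(
  Attribute1.x = c(11, 11, 22, 0, 11, 22, 44),
  Attribute2.x = c(11, 22, 22, 44, 22, 0, 0),
  Attribute1.y = c(22, 11, 11, 11, 44, 0, 44),
  Attribute2.y = c(11, 44, 0, 44, 11, 22, 11)
)

# Pair every ".x" attribute with the ".y" column of the same name
# (assumes the Attribute<k>.x / Attribute<k>.y naming scheme)
x_cols <- grep("^Attribute.*\\.x$", names(df1), value = TRUE)
y_cols <- sub("\\.x$", ".y", x_cols)

X <- as.matrix(df1[x_cols])
Y <- as.matrix(df1[y_cols])

# X == Y is a logical matrix (one cell per attribute pair);
# rowSums() counts the positions where x and y agree
df1$posMatch <- rowSums(X == Y)
df1$posMatch
```

On the example this gives 1, 1, 0, 1, 0, 0, 1, which differs from the rowsum column in rows 5 and 6: there the shared values (11, and 22 and 0) sit in different attribute columns. The example rowsum therefore counts values shared regardless of position, which is what the intersect-based answer computes.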

I have tried the following, but it fails:

# always 0: no single value can equal 11, 22, 0 and 44 at the same time
dyadic_df$rowsome <- apply(dat_wp_dyadic_1, 1, function(x) length(which(x==11 & x==22 & x==0 & x==44)))

Answer 1

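Get the indices of the x and y attribute columns (both sets have the same length), then compare them row by row in apply: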

# indices of the x and y attribute columns
x_index <- grep("^A.*x$", colnames(df1))
y_index <- grep("^A.*y$", colnames(df1))

# for each row, count the values that appear in both the x and y attributes
df1$myRowSum <- 
  apply(df1, 1, function(i){
    length(intersect(i[x_index], i[y_index]))
  })

df1
#    ID.x Attribute1.x Attribute2.x ID.y Attribute1.y Attribute2.y rowsum myRowSum
# 1  2323           11           11 9923           22           11      1        1
# 2  3423           11           22 3422           11           44      1        1
# 3  5343           22           22 5555           11            0      0        0
# 4 54336            0           44  234           11           44      1        1
# 5  4334           11           22 2345           44           11      1        1
# 6 34563           22            0 9429            0           22      2        2
# 7 34534           44            0 2345           44           11      1        1
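apply() first converts the whole data frame to a matrix with as.matrix(). With only numeric columns that is harmless, but if any column is character (say, IDs stored as text to keep the leading zero of 0234), as.matrix() turns the numeric columns into strings with format(), which can pad values with spaces so that equal numbers no longer compare equal across columns. A minimal sketch that avoids this by extracting only the attribute blocks as numeric matrices; `count_shared` is a hypothetical helper name, and the data are the example's attribute columns.

```r
# Attribute columns from the question's example
df1 <- data.frame(
  Attribute1.x = c(11, 11, 22, 0, 11, 22, 44),
  Attribute2.x = c(11, 22, 22, 44, 22, 0, 0),
  Attribute1.y = c(22, 11, 11, 11, 44, 0, 44),
  Attribute2.y = c(11, 44, 0, 44, 11, 22, 11)
)

# Keep only the attribute blocks, as numeric matrices, so no other
# column (e.g. a character ID) can force a conversion to strings
X <- as.matrix(df1[grep("^A.*x$", names(df1))])
Y <- as.matrix(df1[grep("^A.*y$", names(df1))])

# For each row, count the distinct values present in both the x and y blocks
count_shared <- function(X, Y) {
  vapply(seq_len(nrow(X)),
         function(r) length(intersect(X[r, ], Y[r, ])),
         integer(1))
}

df1$myRowSum <- count_shared(X, Y)
df1$myRowSum
```

This reproduces the rowsum column of the example. Because intersect() drops duplicates, a row where x holds 11, 11 and y holds 11, 11 counts as 1, not 2; if repeated values should count separately, a positional comparison such as sum(i[x_index] == i[y_index]) is the closer fit.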

Edit

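OP: Following your suggestion, I used sum(i[x_index] == i[y_index]) instead of intersect to count the columns whose values match exactly. Now I want to count only the matches that meet a condition, something like sum(i[x_index] & i[x_index] == 11 | 22).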
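In R, `i[x_index] == 11 | 22` is parsed as `(i[x_index] == 11) | 22`, and the nonzero 22 is coerced to TRUE, so that condition holds for every element. Membership in a set of values is tested with `%in%`. The sketch below assumes (an interpretation of the edit, not confirmed by the OP) that the goal is to count positional matches whose shared value is 11 or 22; the example's attribute columns are reused, and `hit` and `posMatchInSet` are hypothetical names.

```r
x <- c(11, 0, 44)
x == 11 | 22        # TRUE TRUE TRUE: parsed as (x == 11) | 22
x %in% c(11, 22)    # TRUE FALSE FALSE: membership test

# Attribute columns from the question's example
df1 <- data.frame(
  Attribute1.x = c(11, 11, 22, 0, 11, 22, 44),
  Attribute2.x = c(11, 22, 22, 44, 22, 0, 0),
  Attribute1.y = c(22, 11, 11, 11, 44, 0, 44),
  Attribute2.y = c(11, 44, 0, 44, 11, 22, 11)
)
X <- as.matrix(df1[grep("^A.*x$", names(df1))])
Y <- as.matrix(df1[grep("^A.*y$", names(df1))])

mySet <- c(11, 22)

# %in% drops the matrix shape, so rebuild it (column-major, like X)
in_set <- matrix(X %in% mySet, nrow = nrow(X))

# TRUE where x and y agree in the same column AND the value is in mySet
hit <- (X == Y) & in_set
df1$posMatchInSet <- rowSums(hit)
df1$posMatchInSet
```

On the example only rows 1 and 2 have a same-column match on 11 or 22; the 44 matches in rows 4 and 7 are excluded by the filter.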

mySet <- c(11, 22)

# for each row, count the shared values, keeping only values that are in mySet
df1$myRowSumFilter <- 
  apply(df1, 1, function(i){
    length(intersect(i[x_index][ i[x_index] %in% mySet ],
                     i[y_index][ i[y_index] %in% mySet ]))
  })

df1

Summing identical values under a condition in a dyadic data structure
