Question

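I am working with the household grid typically found in surveys. The household grid records the relationships between household members.

I tried to reproduce one here: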

  houseID id    sex age relto1   relto2         relto3         relto4
1       1  1   male  45      0   spouse not applicable not applicable
2       1  2 female  38 spouse        0 not applicable not applicable
3       2  1 female  18      0 daughter       daughter not applicable
4       2  2   male  50 parent        0         spouse not applicable
5       2  3 female  45 parent   spouse              0 not applicable
6       3  1 female  45      0   parent         parent         spouse
7       3  2   male  17    son        0        brother            son
8       3  3   male  19    son  brother              0            son
9       3  4   male  50 spouse   parent         parent              0

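houseID is the unique household identifier, id is the unique ID of a member within the household, and the relto_ columns give the relationship of each id to the other household members.

For example, in row 1, relto2 == spouse indicates that id == 1 is the spouse of id == 2 in the first household.

I am interested in retrieving the spouse identifier. The trick is that the spouse is not always in the same position.

In household 3, the spouses are id 1 and id 4.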

What I would like to get is this:

  houseID id    sex age spousenum
1       1  1   male  45         2
2       1  2 female  38         1
3       2  1 female  18         0
4       2  2   male  50         3
5       2  3 female  45         2
6       3  1 female  45         4
7       3  2   male  17         0
8       3  3   male  19         0
9       3  4   male  50         1

The best code I could come up with is this:

dtsp = df[, grepl('rel', colnames(df))  ] 

# default to 0 so that rows without a spouse are not left as NA
df$spousenum <- 0

# not too long, its fine # 
for(i in 1:nrow(dtsp)){
  for(j in 1:ncol(dtsp)){
    if(dtsp[i, j] == 'spouse'){
      df[i,'spousenum'] <- j
    }
  }
}

However, it seems a bit dodgy and slow.

Any ideas for more efficient code?

Answer 1

We can do this easily with max.col. Subset the 'relt' columns of the dataset (using grep), create a logical matrix with ==, and use max.col to find the index of the first TRUE in each row. Multiply that by rowSums so that a row with no TRUE value becomes 0, then cbind the result with the dataset minus the 'relt' columns.

i1 <- grep("relt", colnames(df1))
m1 <- df1[i1] == "spouse"
cbind(df1[-i1], spousenum = max.col(m1, "first")*rowSums(m1))
#  houseID id    sex age spousenum
#1       1  1   male  45         2
#2       1  2 female  38         1
#3       2  1 female  18         0
#4       2  2   male  50         3
#5       2  3 female  45         2
#6       3  1 female  45         4
#7       3  2   male  17         0
#8       3  3   male  19         0
#9       3  4   male  50         1

If we are interested in a tidyverse solution (dplyr/tidyr), create a row name column with tibble::rownames_to_column, reshape to 'long' format with gather, filter only the 'spouse' rows, and use transmute to convert the row name column to numeric and to extract the numeric part of the 'relt' column names as 'spousenum'. Then use complete (from tidyr) with full_seq to restore the full sequence of rows while filling 'spousenum' with 0, and cbind with the original dataset.
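The tidyverse steps described above can be sketched roughly as follows. This is a reconstruction under assumptions, not necessarily the answerer's exact code: the helper column names rn, key and val, the regular expression used to pull the number out of the column name, and the data frame df1 (rebuilt here from the table in the question, with character rather than factor columns) are all illustrative choices.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data rebuilt from the table in the question (assumed column types)
df1 <- data.frame(
  houseID = c(1, 1, 2, 2, 2, 3, 3, 3, 3),
  id      = c(1, 2, 1, 2, 3, 1, 2, 3, 4),
  sex     = c("male", "female", "female", "male", "female",
              "female", "male", "male", "male"),
  age     = c(45, 38, 18, 50, 45, 45, 17, 19, 50),
  relto1  = c("0", "spouse", "0", "parent", "parent",
              "0", "son", "son", "spouse"),
  relto2  = c("spouse", "0", "daughter", "0", "spouse",
              "parent", "0", "brother", "parent"),
  relto3  = c("not applicable", "not applicable", "daughter", "spouse", "0",
              "parent", "brother", "0", "parent"),
  relto4  = c("not applicable", "not applicable", "not applicable",
              "not applicable", "not applicable", "spouse", "son", "son", "0"),
  stringsAsFactors = FALSE
)

# Positions of the relationship columns
i1 <- grep("relt", colnames(df1))

res <- df1 %>%
  rownames_to_column("rn") %>%              # remember the original row position
  gather(key, val, matches("relt")) %>%     # long format: one row per relto cell
  filter(val == "spouse") %>%               # keep only the spouse cells
  transmute(rn = as.numeric(rn),
            spousenum = as.numeric(sub("\\D+", "", key))) %>%  # "relto3" -> 3
  complete(rn = full_seq(c(1, nrow(df1)), 1),
           fill = list(spousenum = 0)) %>%  # rows without a spouse get 0
  select(-rn) %>%
  cbind(df1[-i1], .)                        # attach to the non-relto columns

res
```

On the example data this should produce the same spousenum column as the max.col approach. It is considerably more verbose, so it is mainly of interest when the rest of the pipeline is already written with dplyr/tidyr.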

Answer 2

Try this:

df$spousenum = apply(df[,5:8], 1, function(r) which(r=='spouse')[1])
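One detail worth checking with this approach: which() returns an empty vector when no cell matches, so indexing it with [1] yields NA rather than the 0 shown in the desired output. A minimal sketch of that behaviour, using the three relationship columns of household 2 from the question as a small character matrix (the names rel, idx and spousenum are illustrative), with one possible way to map NA to 0:

```r
# Relationship cells of household 2 from the question (relto1..relto3)
rel <- matrix(c("0",      "daughter", "daughter",
                "parent", "0",        "spouse",
                "parent", "spouse",   "0"),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("relto1", "relto2", "relto3")))

# Same idea as the one-liner above: position of the first "spouse" in each row
idx <- apply(rel, 1, function(r) which(r == "spouse")[1])

# The daughter's row has no spouse, so its entry in idx is NA;
# replace NA with 0 to match the desired output
spousenum <- ifelse(is.na(idx), 0, idx)
```

Whether NA or 0 is the better marker for "no spouse in the household" depends on how the column is used afterwards; 0 simply matches the table in the question.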

r - Retrieving spouse IDs in a household grid