Question

I have two datasets. One looks like this:

    Male      Female     Territory    
 1  1         11          TEE          
 2  2         12          JEB          
 3  3         13          GAT  
 4  4         14          SHY
 5  5         15          BOB  
 6  6         16          LEE
 7  7         17          BOO
 8  8         18          DON
 9  9         19          RAZ
10  10        20          ZAP

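This dataset gives the ID numbers of the males and females and the name of the territory they live in. Each row is an observed mating pair: for example, male 1 and female 11 were observed mating, and the territory they occupy is called TEE.

The other dataset looks like this: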

   $GAT
   [1] "TEE" "SHY" "BOB"

   $JEB
   [1] "LEE" "GAT" "BOO"

   $TEE
   [1] "DON" "RAZ" "ZAP"

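The second dataset lists the territories surrounding each territory. For example, territories TEE, SHY and BOB surround territory GAT.

Both datasets are stored as character data.

What I want to do is build a list of potential mates for each individual female, based on the territories surrounding the one she lives in and the males living in those surrounding territories. So my end goal is to get something like this: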

    $11
    [1] "8" "9" "10"

    $12
    [1] "6" "3" "7"

    $13
    [1] "1" "4" "5"

    etc...

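So I have to match each female's territory against the list of surrounding territories to get the surrounding territories for each female. Then I have to find all the males living in those surrounding territories (as well as the male living in the territory the female resides in).

Honestly, I'm not even sure how to start on this. Anything that could help me get started would be much appreciated.

Thanks!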

Answer 1

I slightly modified your example so that it contains duplicated territories.

    df <- data.frame(Male=1:4, Female=5:8, Territory=c("TEE","TEE","JEB","GAT"), Year=2013, stringsAsFactors = FALSE)
    #  Male Female Territory Year
    #1    1      5       TEE 2013
    #2    2      6       TEE 2013
    #3    3      7       JEB 2013
    #4    4      8       GAT 2013

    neighbour <- list()
    neighbour[['GAT']] <- c("TEE","SHY","BOB")
    neighbour[['JEB']] <- c("LEE", "GAT", "BOO")
    neighbour[['TEE']] <- c("DON", "RAZ", "ZAP")
    #$GAT
    #[1] "TEE" "SHY" "BOB"
    #$JEB
    #[1] "LEE" "GAT" "BOO"
    #$TEE
    #[1] "DON" "RAZ" "ZAP"

Here is a possible solution using lapply and %in%.

    #iterate over all females
    result <- lapply(setNames(nm=df$Female), function(x) {
        #territory of the current female
        FemTer <- df[df$Female == x, "Territory"]
        #males living in the neighbourhood
        df[df$Territory %in% c(FemTer, neighbour[[FemTer]]), "Male"]
    })
    result
    #$`5`
    #[1] 1 2
    #
    #$`6`
    #[1] 1 2
    #
    #$`7`
    #[1] 3 4
    #
    #$`8`
    #[1] 1 2 4
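I simply assumed that you want to include the female's own territory as well as the surrounding ones. If you only want the surrounding ones, just remove FemTer, (the name and its comma) from the line df[df$Territory %in% c(FemTer, neighbour[[FemTer]]), "Male"].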
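As a rough sketch of that surroundings-only variant, the block below applies the same lapply and %in% idea to the ten pairs from the question. The name surrounding_only and the as.character() conversion are illustrative additions, not part of the answer above; the conversion is only there because the question shows the IDs as character strings.

```r
# Data from the question, rebuilt by hand (IDs stored as integers here)
df <- data.frame(Male = 1:10, Female = 11:20,
                 Territory = c("TEE", "JEB", "GAT", "SHY", "BOB",
                               "LEE", "BOO", "DON", "RAZ", "ZAP"),
                 stringsAsFactors = FALSE)

neighbour <- list(GAT = c("TEE", "SHY", "BOB"),
                  JEB = c("LEE", "GAT", "BOO"),
                  TEE = c("DON", "RAZ", "ZAP"))

# surrounding_only is a hypothetical name used for this sketch
surrounding_only <- lapply(setNames(nm = df$Female), function(x) {
    # territory of the current female
    FemTer <- df[df$Female == x, "Territory"]
    # neighbour[[FemTer]] is NULL when the territory has no entry in the
    # list, so such females simply get an empty character vector
    as.character(df[df$Territory %in% neighbour[[FemTer]], "Male"])
})

surrounding_only[c("11", "12", "13")]
```

Males come back in the row order of df rather than in the order of the neighbour vector, so female 12 gets "3" "6" "7" instead of the "6" "3" "7" shown in the question.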

Answer 2

Consider this base R data wrangling approach using merge, reshape and by:

Data

    txt = '    Male      Female     Territory
     1  1         11          TEE
     2  2         12          JEB
     3  3         13          GAT
     4  4         14          SHY
     5  5         15          BOB
     6  6         16          LEE
     7  7         17          BOO
     8  8         18          DON
     9  9         19          RAZ
    10  10        20          ZAP'

    df <- read.table(text=txt, header=TRUE)

    territories <- list(GAT=c("TEE","SHY","BOB"),
                        JEB=c("LEE","GAT","BOO"),
                        TEE=c("DON","RAZ","ZAP"))

Process

    # CAST LIST TO DF
    df_territories <- data.frame(territories, stringsAsFactors = FALSE)

    # MELT DF TO LONG FORMAT
    df_territories <- reshape(df_territories, varying = list(1:3), v.names="nearest",
                              timevar="Territory",
                              times=names(df_territories)[1:ncol(df_territories)],
                              direction="long")

    # NESTED MERGE
    mdf <- merge(merge(df, df_territories, by="Territory", all.x=TRUE),
                 df, by.x="nearest", by.y="Territory")

    # BY GROUP SLICE
    matelist <- by(mdf, mdf$Female.x, FUN=function(grp){
        as.character(sort(grp$Male.y))
    })

    # LIST CLEANUP
    attributes(matelist) <- NULL
    names(matelist) <- unique(sort(mdf$Female.x))

    matelist
    # $`11`
    # [1] "8" "9" "10"

    # $`12`
    # [1] "3" "6" "7"

    # $`13`
    # [1] "1" "4" "5"
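The list-to-long-format step above goes through data.frame(), which only works when every territory has the same number of neighbours. As a hedged alternative for that single step, the sketch below uses base R's stack() in place of data.frame() plus reshape(); this is a swapped-in technique, not what the answer itself does, and the name long is chosen only for illustration.

```r
territories <- list(GAT = c("TEE", "SHY", "BOB"),
                    JEB = c("LEE", "GAT", "BOO"),
                    TEE = c("DON", "RAZ", "ZAP"))

# stack() returns one row per neighbour, in columns "values" and "ind"
long <- stack(territories)

# rename to match the column names used in the merge step of the answer
names(long) <- c("nearest", "Territory")

# make sure both columns are plain character vectors, not factors
long$nearest <- as.character(long$nearest)
long$Territory <- as.character(long$Territory)

long
```

The resulting data frame has the Territory and nearest columns that the nested merge expects, so it could stand in for df_territories there; it lacks the id column that reshape() adds, which the later steps do not use.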

Matching different datasets to create a new list in R
