How to create a for loop based on a unique user ID and a specific event type

Asked: 2018-06-06 12:31:43

Tags: r conditional

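I have two data frames: users and events.

Both data frames contain a field that links events to users.

How can I create a for loop in which every user's unique ID is matched against events of a particular type, and the number of occurrences is then stored in a new column of users (users$conversation_started, users$conversation_missed, etc.)?

In short, it is a conditional for loop.

So far I have this, but it is wrong:

for(i in users$id){
  users$conversation_started <- nrow(event[event$type = "conversation-started"])
}

An example of how to do this would be ideal.

The idea is:

for(each user)
  find the matching user ID in events
  count the number of event types == "conversation-started"
  assign count value to user$conversation_started
end for

Important note:

The type field can contain one of five values, so I need to be able to filter effectively on each type for each associate:

> events$type %>% table %>% as.matrix
                               [,1]
conversation-accepted          3120
conversation-already-accepted 19673
conversation-declined            27
conversation-missed             831
conversation-request          23427

Data frames (note that these are reduced versions with confidential information removed):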

users <- structure(list(`_id` = c("JTuXhdI4Ai", "iGIeCEXyVE", "6XFtOJh0bD", 
"mNN986oQv9", "9NI71KBMX9", "x1jH7t0Cmy"), language = c("en", 
"en", "en", "en", "en", "en"), registering = c(TRUE, TRUE, FALSE, 
FALSE, FALSE, NA), `_created_at` = structure(c(1485995043.131, 
1488898839.838, 1480461193.146, 1481407887.979, 1489942757.189, 
1491311381.916), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    `_updated_at` = structure(c(1521039527.236, 1488898864.834, 
    1527618624.877, 1481407959.116, 1490043838.561, 1491320333.09
    ), class = c("POSIXct", "POSIXt"), tzone = "UTC"), lastOnlineTimestamp = c(1521039526.90314, 
    NA, 1480461472, 1481407959, 1490043838, NA), isAgent = c(FALSE, 
    NA, FALSE, FALSE, FALSE, NA), lastAvailableTime = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = c("POSIXct", 
    "POSIXt"), tzone = ""), available = c(NA, NA, NA, NA, NA, 
    NA), busy = c(NA, NA, NA, NA, NA, NA), joinedTeam = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = c("POSIXct", 
    "POSIXt"), tzone = ""), timezone = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    )), row.names = c("list.1", "list.2", "list.3", "list.4", 
"list.5", "list.6"), class = "data.frame")

events <- structure(list(`_id` = c("JKY8ZwkM1S", "CG7Xj8dAsA", "pUkFFxoahy", 
"yJVJ34rUCl", "XxXelkIFh7", "GCOsENVSz6"), expirationTime = structure(c(1527261147.873, 
NA, 1527262121.332, NA, 1527263411.619, 1527263411.619), class = c("POSIXct", 
"POSIXt"), tzone = ""), partId = c("d22bfddc-cd51-489f-aec8-5ab9225c0dd5", 
"d22bfddc-cd51-489f-aec8-5ab9225c0dd5", "cf4356da-b63e-4e4d-8e7b-fb63035801d8", 
"cf4356da-b63e-4e4d-8e7b-fb63035801d8", "a720185e-c300-47c0-b30d-64e1f272d482", 
"a720185e-c300-47c0-b30d-64e1f272d482"), type = c("conversation-request", 
"conversation-accepted", "conversation-request", "conversation-accepted", 
"conversation-request", "conversation-request"), `_p_conversation` = c("Conversation$6nSaLeWqs7", 
"Conversation$6nSaLeWqs7", "Conversation$6nSaLeWqs7", "Conversation$6nSaLeWqs7", 
"Conversation$bDuAYSZgen", "Conversation$bDuAYSZgen"), `_p_merchant` = c("Merchant$0A2UYADe5x", 
"Merchant$0A2UYADe5x", "Merchant$0A2UYADe5x", "Merchant$0A2UYADe5x", 
"Merchant$0A2UYADe5x", "Merchant$0A2UYADe5x"), `_p_associate` = c("D9ihQOWrXC", 
"D9ihQOWrXC", "D9ihQOWrXC", "D9ihQOWrXC", "D9ihQOWrXC", "D9ihQOWrXC"
), `_wperm` = list(list(), list(), list(), list(), list(), list()), 
    `_rperm` = list("*", "*", "*", "*", "*", "*"), `_created_at` = structure(c(1527264657.998, 
    1527264662.043, 1527265661.846, 1527265669.435, 1527266922.056, 
    1527266922.059), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    `_updated_at` = structure(c(1527264657.998, 1527264662.043, 
    1527265661.846, 1527265669.435, 1527266922.056, 1527266922.059
    ), class = c("POSIXct", "POSIXt"), tzone = "UTC"), read = c(TRUE, 
    NA, TRUE, NA, NA, NA), data.customerName = c("Shopper 109339", 
    NA, "Shopper 109339", NA, "Shopper 109364", "Shopper 109364"
    ), data.departmentName = c("Personal advisors", NA, "Personal advisors", 
    NA, "Personal advisors", "Personal advisors"), data.recurring = c(FALSE, 
    NA, TRUE, NA, FALSE, FALSE), data.new = c(TRUE, NA, FALSE, 
    NA, TRUE, TRUE), data.missed = c(0L, NA, 0L, NA, 0L, 0L), 
    data.customerId = c("84uOFRLmLd", "84uOFRLmLd", "84uOFRLmLd", 
    "84uOFRLmLd", "5Dw4iax3Tj", "5Dw4iax3Tj"), data.claimingTime = c(NA, 
    4L, NA, 7L, NA, NA), data.lead = c(NA, NA, FALSE, NA, NA, 
    NA), data.maxMissed = c(NA, NA, NA, NA, NA, NA), data.associateName = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), data.maxDecline = c(NA, NA, NA, NA, NA, NA
    ), data.goUnavailable = c(NA, NA, NA, NA, NA, NA)), row.names = c("list.1", 
"list.2", "list.3", "list.4", "list.5", "list.6"), class = "data.frame")

Update: 21 September 2018

This solution now results in NA-only data frames being generated at the end of the function. When written to .csv, the output contains only NA values (which Excel naturally displays as blank cells).

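My data source has not changed, and neither has my script.

What might be causing this?

My guess is that this is an unforeseen case in which each step may have had 0 hits. If so, is there a way to assign 0 to the cases without any matches, instead of NA/blank values?

Is there a way to avoid this?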

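1 answer:

Answer 0 (score: 0)

New solution based on the data provided.

Note: since your data has no overlap in `_id`, I changed events$`_id` to be the same as in users.

Simplified example data: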

users <- structure(list(`_id` = structure(c(4L, 3L, 1L, 5L, 2L, 6L), 
                                                     .Label = c("6XFtOJh0bD", "9NI71KBMX9", "iGIeCEXyVE", 
                                                                           "JTuXhdI4Ai", "mNN986oQv9", "x1jH7t0Cmy"), 
                                                     class = "factor")), .Names = "_id", 
                   row.names = c(NA, -6L), class = "data.frame")
events <- structure(list(`_id` = c("JKY8ZwkM1S", "CG7Xj8dAsA", "pUkFFxoahy", 
                                   "yJVJ34rUCl", "XxXelkIFh7", "GCOsENVSz6"), 
                         type = c("conversation-request", "conversation-accepted", 
                                  "conversation-request", "conversation-accepted", 
                                  "conversation-request", "conversation-request")), 
                    .Names = c("_id", "type"), class = "data.frame", 
                    row.names = c("list.1", "list.2", "list.3", "list.4", "list.5", "list.6"))
events$`_id` <- users$`_id`

> users
         _id
1 JTuXhdI4Ai
2 iGIeCEXyVE
3 6XFtOJh0bD
4 mNN986oQv9
5 9NI71KBMX9
6 x1jH7t0Cmy

> events
              _id                  type
list.1 JTuXhdI4Ai  conversation-request
list.2 iGIeCEXyVE conversation-accepted
list.3 6XFtOJh0bD  conversation-request
list.4 mNN986oQv9 conversation-accepted
list.5 9NI71KBMX9  conversation-request
list.6 x1jH7t0Cmy  conversation-request

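We can use the same approach I suggested earlier, just slightly enhanced.

First, we loop over unique(events$type) and store, in a list, a table() of the events of each type per ID: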

test <- lapply(unique(events$type), function(x) table(events$`_id`, events$type == x))

Then we store each type as the name of the corresponding table in the list:

names(test) <- unique(events$type)
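
One caveat with these tables: the `[, 2]` lookup used in the next step assumes every table has both a FALSE and a TRUE column. If all events happened to share a single type, the comparison would yield only TRUE and the table would have one column. A minimal sketch of a more defensive variant follows. It rebuilds a copy of the simplified events data so that it runs on its own, and the name safe_tables is made up for illustration:

```r
# Copy of the simplified events data (IDs already aligned with users).
events <- data.frame(
  `_id` = c("JTuXhdI4Ai", "iGIeCEXyVE", "6XFtOJh0bD",
            "mNN986oQv9", "9NI71KBMX9", "x1jH7t0Cmy"),
  type = c("conversation-request", "conversation-accepted",
           "conversation-request", "conversation-accepted",
           "conversation-request", "conversation-request"),
  check.names = FALSE, stringsAsFactors = FALSE
)

types <- unique(events$type)

# Forcing both levels guarantees a "FALSE" and a "TRUE" column in every
# table, even when one of them has no observations.
safe_tables <- lapply(types, function(x) {
  table(events$`_id`, factor(events$type == x, levels = c(FALSE, TRUE)))
})
names(safe_tables) <- types

# Select the count column by name rather than by position.
safe_tables[["conversation-request"]][, "TRUE"]
```

Selecting the "TRUE" column by name also makes the intent of the lookup easier to read than a positional index.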

Now we use a simple for loop to match() users$`_id` against the rownames of each table, and store the counts in a new column named after the event type:

for(i in names(test)){
  users[, i] <- test[[i]][, 2][match(users$`_id`, rownames(test[[i]]))]
}

Result:

> users
         _id conversation-request conversation-accepted
1 JTuXhdI4Ai                    1                     0
2 iGIeCEXyVE                    0                     1
3 6XFtOJh0bD                    1                     0
4 mNN986oQv9                    0                     1
5 9NI71KBMX9                    1                     0
6 x1jH7t0Cmy                    1                     0
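
If some users have no events at all, match() returns NA for them, and that NA ends up in the new columns. The following is a hedged sketch of one way to get 0 instead, assuming the IDs in events refer to users$`_id`. It swaps the list of tables for a single two-way table() of ID by type; the user ID "noEvents00" and the name counts are made up for illustration:

```r
# Toy data: the last user has no events at all.
users <- data.frame(
  `_id` = c("JTuXhdI4Ai", "iGIeCEXyVE", "6XFtOJh0bD", "noEvents00"),
  check.names = FALSE, stringsAsFactors = FALSE
)
events <- data.frame(
  `_id` = c("JTuXhdI4Ai", "JTuXhdI4Ai", "iGIeCEXyVE", "6XFtOJh0bD"),
  type = c("conversation-request", "conversation-accepted",
           "conversation-request", "conversation-request"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Using users$`_id` as factor levels keeps one row per user, in the same
# order as users; combinations that never occur are counted as 0.
counts <- table(factor(events$`_id`, levels = users$`_id`), events$type)

# as.data.frame.matrix() keeps the wide layout (one column per type).
users <- cbind(users, as.data.frame.matrix(counts))
users
```

Because the table rows follow the factor levels, no match() step is needed, and a user without events gets a row of zeros rather than NA.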

Hope this helps!