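I have a vector of hotel stays. Each row represents a new day. The vector is populated with the following options:
1. 'first start' - the start of a person's first stay at the hotel
2. 'NA' - a day the person is at the hotel that is neither a start nor an end
3. 'end' - the end of a person's stay (it can be the end of any of their stays; yes, a person can stay more than once)
4. 'another start' - the start of a stay after the first one. It can be the second, third, fourth, and so on (some people visit the same hotel 10+ times)
5. 'first start end' - a first stay that lasts only one day
6. 'another start end' - a one-day stay that is not the person's first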
I also have a person ID variable.
Here is a sample of what I have and what I want:
Person_ID Have Want
[1,] "1" "first start" "1"
[2,] "1" "NA" "1"
[3,] "1" "NA" "1"
[4,] "1" "end" "1"
[5,] "1" "another start" "2"
[6,] "1" "NA" "2"
[7,] "1" "NA" "2"
[8,] "1" "NA" "2"
[9,] "1" "end" "2"
[10,] "1" "another start" "3"
[11,] "1" "NA" "3"
[12,] "1" "end" "3"
[13,] "1" "another start" "4"
[14,] "1" "NA" "4"
[15,] "1" "end" "4"
[16,] "1" "another start end" "5"
[17,] "1" "another start" "6"
[18,] "1" "NA" "6"
[19,] "1" "end" "6"
[20,] "1" "another start end" "7"
[21,] "1" "another start end" "8"
[22,] "2" "first start" "1"
[23,] "2" "NA" "1"
[24,] "2" "end" "1"
[25,] "3" "first start end" "1"
[26,] "3" "another start" "2"
[27,] "3" "NA" "2"
[28,] "3" "end" "2"
[29,] "4" "first start end" "1"
[30,] "4" "another start end" "2"
[31,] "4" "another start" "3"
[32,] "4" "NA" "3"
[33,] "4" "end" "3"
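I tried using a loop, but my file has about 500,000 rows and it takes far too long to run. Any suggestions for an efficient approach would be much appreciated! Thanks!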
Answer 0 (score: 0)
You can use the tidyverse package. Assuming you have a matrix called df that contains your data:
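library(tidyverse)

result <- df %>%
  as_tibble() %>%
  group_by(Person_ID) %>%
  # Every value containing "start" opens a new stay, so a running count of
  # those rows within each person numbers the stays 1, 2, 3, ...
  # grepl() returns FALSE for the "NA" and "end" rows, which keep the current number.
  mutate(Want = cumsum(grepl("start", Have, fixed = TRUE))) %>%
  ungroup()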
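
If you would rather avoid extra packages, here is a minimal base R sketch of one way to get the Want column: flag each row whose Have value contains "start", then take a running count of those flags within each Person_ID. The small data frame below is only an illustration built from persons 3 and 4 in the question's sample, and I am assuming Have is a character column; adjust the names to your own data.

```r
# Illustrative sample in the same shape as the question (persons 3 and 4).
df <- data.frame(
  Person_ID = c("3", "3", "3", "3", "4", "4", "4", "4", "4"),
  Have = c("first start end", "another start", "NA", "end",
           "first start end", "another start end", "another start", "NA", "end"),
  stringsAsFactors = FALSE
)

# TRUE on every row that opens a stay ("first start", "another start",
# "first start end", "another start end"); FALSE for "NA" and "end".
is_start <- grepl("start", df$Have, fixed = TRUE)

# Running count of stay openings, restarted for each person.
df$Want <- ave(as.integer(is_start), df$Person_ID, FUN = cumsum)

print(df)
```

Both grepl() and cumsum() are vectorized, so this should scale to a few hundred thousand rows far better than an explicit row-by-row loop, though I have not timed it on a file of your size.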