Fill a vector down based on previous values, and change future values based on previous values

Time: 2016-10-28 14:42:46

Tags: r

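I have a vector of hotel stays. Each row represents a new day. The vector is filled with the following options:
1. 'first start' - the start of an individual's first stay at the hotel
2. 'NA' - a day the individual spends at the hotel (neither a start nor an end)
3. 'end' - the end of an individual's stay (it can be the end of any of their stays; yes, an individual can stay more than once)
4. 'another start' - the start of a stay after the first one. It can be the second, third, fourth, and so on (some people visit the same hotel more than 10 times)
5. 'first start end' - a person whose first stay lasts only one day
6. 'another start end' - a one-day stay that is not the person's first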

I also have a person ID variable.

Here is a sample of what I have and what I want:

      Person_ID Have                Want
 [1,] "1"       "first start"       "1" 
 [2,] "1"       "NA"                "1" 
 [3,] "1"       "NA"                "1" 
 [4,] "1"       "end"               "1" 
 [5,] "1"       "another start"     "2" 
 [6,] "1"       "NA"                "2" 
 [7,] "1"       "NA"                "2" 
 [8,] "1"       "NA"                "2" 
 [9,] "1"       "end"               "2" 
[10,] "1"       "another start"     "3" 
[11,] "1"       "NA"                "3" 
[12,] "1"       "end"               "3" 
[13,] "1"       "another start"     "4" 
[14,] "1"       "NA"                "4" 
[15,] "1"       "end"               "4" 
[16,] "1"       "another start end" "5" 
[17,] "1"       "another start"     "6" 
[18,] "1"       "NA"                "6" 
[19,] "1"       "end"               "6" 
[20,] "1"       "another start end" "7" 
[21,] "1"       "another start end" "8" 
[22,] "2"       "first start"       "1" 
[23,] "2"       "NA"                "1" 
[24,] "2"       "end"               "1" 
[25,] "3"       "first start end"   "1" 
[26,] "3"       "another start"     "2" 
[27,] "3"       "NA"                "2" 
[28,] "3"       "end"               "2" 
[29,] "4"       "first start end"   "1" 
[30,] "4"       "another start end" "2" 
[31,] "4"       "another start"     "3" 
[32,] "4"       "NA"                "3" 
[33,] "4"       "end"               "3"

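I tried using a loop, but my file has about 500,000 rows and it takes far too long to run. Any suggestions for an efficient approach would be greatly appreciated. Thanks!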

1 Answer:

Answer 0: (score: 0)

You can use the tidyverse package. Assuming you have a matrix named df containing your data:

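library(tidyverse)

result <- df %>% 
  as_tibble() %>% 
  group_by(Person_ID) %>% 
  # Every label containing "start" opens a new stay, so a running count of
  # those labels within each person numbers the stays. (Filling the labels
  # down and converting them to a factor would only distinguish the label
  # types, not repeated stays.)
  mutate(Want = cumsum(grepl("start", Have))) %>% 
  ungroup()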
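
One way to number the stays without a loop, sketched here in base R, is to keep a running count of the labels that contain "start" and restart that count for each person. The data frame `stays` and the helper `number_stays` are hypothetical names made up for this illustration, and the data is only a handful of rows copied from the sample in the question. It assumes the rows are already in chronological order within each person.

```r
# Hypothetical sample: a few rows from the question's table, as a data frame.
stays <- data.frame(
  Person_ID = c("1", "1", "1", "1", "1", "1", "1", "3", "3", "3", "3"),
  Have = c("first start", "NA", "end", "another start", "end",
           "another start end", "another start end",
           "first start end", "another start", "NA", "end"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number each stay within a person.
number_stays <- function(person_id, label) {
  # TRUE on every row that opens a stay ("first start", "another start",
  # "first start end", "another start end"); FALSE for "NA" and "end".
  is_start <- grepl("start", label, fixed = TRUE)
  # ave() applies cumsum() separately within each person, so the counter
  # restarts at every new Person_ID.
  ave(as.integer(is_start), person_id, FUN = cumsum)
}

stays$Want <- number_stays(stays$Person_ID, stays$Have)
print(stays)
```

Because `grepl()` and `cumsum()` are vectorised, this should scale to a few hundred thousand rows far better than a row-by-row loop, though that has not been timed on the real 500,000-row file.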