Add absence rows based on presence data

Time: 2017-10-13 01:13:35

Tags: r

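I have presence data in R for individuals seen at 4 different sites on different dates. For each day at each site, I want to create a row for every ID that was not seen, and add a presence/absence column.

Here is a subset of my presence data: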

ID    SP SITE  DATE SMP
975    1    1 41579 FJB
997    1    1 41579 FAW
114    1    1 41579 FJW
926    2    1 41609 FJB
101    2    1 41609 FJB
108    2    1 41609 FAB
129    3    2 41710 FUB
131    3    2 41710 MAW
132    3    2 41710 FAW

This is what I want to create:

ID    SP SITE  DATE SMP Present?
975    1    1 41579 FJB   Yes
997    1    1 41579 FAW   Yes
114    1    1 41579 FJW   Yes
926    1    1 41579 FJB   No
101    1    1 41579 FJB   No
108    1    1 41579 FAB   No
129    1    1 41579 FUB   No
131    1    1 41579 MAW   No
132    1    1 41579 FAW   No
975    2    1 41609 FJB   No
997    2    1 41609 FAW   No
114    2    1 41609 FJW   No
926    2    1 41609 FJB   Yes
101    2    1 41609 FJB   Yes
108    2    1 41609 FAB   Yes
129    2    1 41609 FUB   No
131    2    1 41609 MAW   No
132    2    1 41609 FAW   No
975    3    2 41710 FJB   No
997    3    2 41710 FAW   No
114    3    2 41710 FJW   No
926    3    2 41710 FJB   No
101    3    2 41710 FJB   No
108    3    2 41710 FAB   No
129    3    2 41710 FUB   Yes
131    3    2 41710 MAW   Yes
132    3    2 41710 FAW   Yes

I hope someone can help!

1 Answer:

Answer 0 (score: 0)

First, build a dataset that pairs every ID (together with its SMP) with every observed combination of SP, SITE and DATE, and call it attendance_list:

library(tidyverse)
txt <- "ID    SP SITE  DATE SMP
975    1    1 41579 FJB
997    1    1 41579 FAW
114    1    1 41579 FJW
926    2    1 41609 FJB
101    2    1 41609 FJB
108    2    1 41609 FAB
129    3    2 41710 FUB
131    3    2 41710 MAW
132    3    2 41710 FAW"
presence_data <- txt %>% 
  gsub(" +", " ", x=.) %>% 
  read.delim(text=., sep=" ")

# Attendance list
## It looks like each ID has only one SMP
id_smp <- presence_data %>% 
  select(ID, SMP) %>% 
  distinct()
## Working site and date
sp_site_date <- presence_data %>% 
  select(SP, SITE, DATE) %>% 
  distinct()
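## id_smp and sp_site_date share no columns, so merge() returns their
## Cartesian product: every ID paired with every site visit
attendance_list <- merge(id_smp, sp_site_date)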
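
As a minimal sketch of why that merge() call produces every pairing: when two data frames have no column names in common, base R's merge() falls back to a Cartesian product. The toy frames below (ids, visits) are hypothetical and are not the question's data.

```r
# Hypothetical toy inputs, only to illustrate the cross join
ids <- data.frame(ID = c(975, 926), SMP = c("FJB", "FJB"))
visits <- data.frame(SP = c(1, 2, 3), SITE = c(1, 1, 2))

# No shared column names, so every row of `ids` is paired
# with every row of `visits`
all_pairs <- merge(ids, visits)

nrow(all_pairs)   # 2 IDs x 3 visits = 6 rows
names(all_pairs)  # "ID" "SMP" "SP" "SITE"
```

If the two frames did share a column name, merge() would join on it instead, so this trick only works while the column sets stay disjoint.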

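Then left-join presence_data, with an added PRESENT column set to "Yes", onto attendance_list, and replace the resulting NA values (rows that are in attendance_list but not in presence_data) with "No".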

attendance_list <- attendance_list %>% 
  left_join(presence_data %>% mutate(PRESENT="Yes"), by=c("ID", "SMP", "SP", "SITE", "DATE")) %>% 
  mutate(PRESENT = ifelse(is.na(PRESENT), "No", PRESENT))
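
For comparison, the same "expand, then fill" idea can probably be written in one step with tidyr::complete(). This is a different technique from the answer's merge plus left_join, shown only as a sketch; the name attendance_alt is made up here, and the data is re-read so the block runs on its own. Rows come back sorted by the completed columns, so the order differs from the answer's output.

```r
library(dplyr)
library(tidyr)

presence_data <- read.table(header = TRUE, text = "ID SP SITE DATE SMP
975 1 1 41579 FJB
997 1 1 41579 FAW
114 1 1 41579 FJW
926 2 1 41609 FJB
101 2 1 41609 FJB
108 2 1 41609 FAB
129 3 2 41710 FUB
131 3 2 41710 MAW
132 3 2 41710 FAW")

attendance_alt <- presence_data %>%
  mutate(PRESENT = "Yes") %>%
  # nesting() keeps only the combinations that occur in the data:
  # each ID with its own SMP, and each SP/SITE/DATE visit
  complete(nesting(ID, SMP), nesting(SP, SITE, DATE),
           fill = list(PRESENT = "No"))

nrow(attendance_alt)           # 9 IDs x 3 visits = 27 rows
table(attendance_alt$PRESENT)  # 18 "No", 9 "Yes"
```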

Output:

> attendance_list
#     ID SMP SP SITE  DATE PRESENT
# 1  975 FJB  1    1 41579     Yes
# 2  997 FAW  1    1 41579     Yes
# 3  114 FJW  1    1 41579     Yes
# 4  926 FJB  1    1 41579      No
# 5  101 FJB  1    1 41579      No
# 6  108 FAB  1    1 41579      No
# 7  129 FUB  1    1 41579      No
# 8  131 MAW  1    1 41579      No
# 9  132 FAW  1    1 41579      No
# 10 975 FJB  2    1 41609      No
# 11 997 FAW  2    1 41609      No
# 12 114 FJW  2    1 41609      No
# 13 926 FJB  2    1 41609     Yes
# 14 101 FJB  2    1 41609     Yes
# 15 108 FAB  2    1 41609     Yes
# 16 129 FUB  2    1 41609      No
# 17 131 MAW  2    1 41609      No
# 18 132 FAW  2    1 41609      No
# 19 975 FJB  3    2 41710      No
# 20 997 FAW  3    2 41710      No
# 21 114 FJW  3    2 41710      No
# 22 926 FJB  3    2 41710      No
# 23 101 FJB  3    2 41710      No
# 24 108 FAB  3    2 41710      No
# 25 129 FUB  3    2 41710     Yes
# 26 131 MAW  3    2 41710     Yes
# 27 132 FAW  3    2 41710     Yes