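Iterating over columns in a data frame using a loop

Time: 2019-04-16 14:38:10

Tags: r

I have a data frame whose column names include a week and year indicator in the format "W1_2019", along with other text. The full data frame covers 52 weeks, with 5 columns per week. My goal is to take the code below, which does exactly what I want for weeks 1 and 2, and put it into a loop for x = 1 to 52, so that I don't have to repeat the same half dozen lines 52 times.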

library(dplyr) # provides select() and contains()

eidsr <- dget(file="test1.txt")

mode_xmt <- data.frame(District=eidsr$district) #Initializes dataframe mode_xmt with only 1 column containing District names

wtmp <- select(eidsr, contains("W1_2019"))
wtmp$mode <- "NoRep"
wtmp$mode[wtmp$W1_2019_EIDSR_Total_Malaria_cases>0] <- "Report"
wtmp$mode[wtmp$`W1_2019_EIDSR-Mobile_SMS`==1] <- "Mobile_SMS"
wtmp$mode[wtmp$`W1_2019_EIDSR-Mobile_Internet`==1] <- "Mobile_Internet"

#At this point the dataframe wtmp looks like the example below.

mode_xmt$`2019_W1` <- wtmp$mode #Appends ONLY the W1_2019 column to mode_xmt
rm(wtmp)

wtmp <- select(eidsr, contains("W2_2019"))
wtmp$mode <- "NoRep"
wtmp$mode[wtmp$W2_2019_EIDSR_Total_Malaria_cases>0] <- "Report"
wtmp$mode[wtmp$`W2_2019_EIDSR-Mobile_SMS`==1] <- "Mobile_SMS"
wtmp$mode[wtmp$`W2_2019_EIDSR-Mobile_Internet`==1] <- "Mobile_Internet"

mode_xmt$`2019_W2` <- wtmp$mode
rm(wtmp)

At the end of each operation, my working data looks like the following. The data frame wtmp looks like this:

   `W1_2019_EIDSR-Timely_~ W1_2019_EIDSR_Total_Mala~ W1_2019_EIDSR_Date_R~ `W1_2019_EIDSR-Mobile_~ `W1_2019_EIDSR-Mobi~ mode 
                     <dbl>                     <dbl> <chr>                                   <dbl>                <dbl> <chr>
 1                      NA                         0 NA                                         NA                   NA NoRep
 2                      NA                        NA NA                                         NA                   NA NoRep
 3                      NA                        51 NA                                         NA                   NA Repo~
 4                      NA                        NA NA                                         NA                   NA NoRep
 5                      NA                        64 NA                                         NA                   NA Repo~
 6                      NA                        86 NA                                         NA                   NA Repo~
 7                      NA                        92 NA                                         NA                   NA Repo~
 8                      NA                        47 NA                                         NA                   NA Repo~
 9                      NA                        46 NA                                         NA                   NA Repo~
10                      NA                        35 NA                                         NA                   NA Repo~

mode_xmt, with the new column appended, looks like this:

   District 2019_W01
1        Bo    NoRep
2        Bo    NoRep
3        Bo   Report
4        Bo    NoRep
5        Bo   Report
6        Bo   Report
7        Bo   Report
8        Bo   Report
9        Bo   Report
10       Bo   Report

After the second iteration, for W2, mode_xmt looks like this:

   District 2019_W01 2019_W02
1        Bo    NoRep   Report
2        Bo    NoRep    NoRep
3        Bo   Report   Report
4        Bo    NoRep    NoRep
5        Bo   Report   Report
6        Bo   Report   Report
7        Bo   Report   Report
8        Bo   Report   Report
9        Bo   Report   Report
10       Bo   Report   Report

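Lather, rinse, repeat. Times 52. As @DS_UNI observed, separate columns for week and year would be nice, but they will not serve the end purpose, because this is a time series that spans more than one year... Still, to keep myself from going crazy, I would be happy if I could just iterate over the 52 weeks of a single year.

As I said, the code above works. I'm just looking for a way to loop it instead of repeating it ad nauseam.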

Below is the dput text of the truncated data (saved in the working directory as test1.txt):

structure(list(`W1_2019_EIDSR-Timely_Report` = c(NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_), W1_2019_EIDSR_Total_Malaria_cases = c(0,  NA, 51, NA, 64, 86, 92, 47, 46, 35, 33, NA, NA, 77, 35, 7, 24,  27, 14, 72), W1_2019_EIDSR_Date_Received = c(NA_character_, NA_character_,  NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,  NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,  NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,  NA_character_, NA_character_, NA_character_), `W1_2019_EIDSR-Mobile_Internet` = c(NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `W1_2019_EIDSR-Mobile_SMS` = c(NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `W2_2019_EIDSR-Timely_Report`
= c(NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), W2_2019_EIDSR_Total_Malaria_cases = c(55,  NA, 44, NA, 38, 26, 29, 40, 59, 18, 48, NA, NA, 37, 34, 51, 34,  38, 13, 56), W2_2019_EIDSR_Date_Received = c(NA_character_, NA_character_,  NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,  NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,  NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,  NA_character_, NA_character_, NA_character_), `W2_2019_EIDSR-Mobile_Internet` = c(NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `W2_2019_EIDSR-Mobile_SMS` = c(NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,  NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), district = c("Bo",  "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo",  "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo", "Bo")), .Names = c("W1_2019_EIDSR-Timely_Report",  "W1_2019_EIDSR_Total_Malaria_cases", "W1_2019_EIDSR_Date_Received",  "W1_2019_EIDSR-Mobile_Internet", "W1_2019_EIDSR-Mobile_SMS",  "W2_2019_EIDSR-Timely_Report", "W2_2019_EIDSR_Total_Malaria_cases",  "W2_2019_EIDSR_Date_Received", "W2_2019_EIDSR-Mobile_Internet",  "W2_2019_EIDSR-Mobile_SMS", "district"), row.names = c(NA, -20L ), class = c("tbl_df", "tbl", "data.frame"))

1 answer:

Answer 0 (score: 1)

Your data should look like this (I would also prefer one column for the week and another for the year). From that shape there is most likely a way to get what you want.

library(dplyr)
library(reshape2)

eidsr %>% 
  # values should be in a column (not in headers) 
  melt(id.var = 'district') %>% 
  # extract the new variables; sub() keeps the whole week prefix,
  # so two-digit weeks such as W10_2019 are not cut short
  mutate(week_year = sub("_EIDSR.*$", "", variable),
         variable = sub(".*EIDSR[- _]", "", variable)) %>% 
  # assuming missing values don't have a specific meaning you can just remove them
  na.omit()

#     district            variable value week_year
# 21        Bo Total_Malaria_cases     0   W1_2019
# 23        Bo Total_Malaria_cases    51   W1_2019
# 25        Bo Total_Malaria_cases    64   W1_2019
# 26        Bo Total_Malaria_cases    86   W1_2019
# 27        Bo Total_Malaria_cases    92   W1_2019
# 28        Bo Total_Malaria_cases    47   W1_2019
# 29        Bo Total_Malaria_cases    46   W1_2019
# 30        Bo Total_Malaria_cases    35   W1_2019

I can see you're losing patience, so if you have to loop, use one of the apply functions. Those need a function that can be applied repeatedly to the elements of a vector or list. The function below computes the mode column for a single week, so that it can be applied to every week in the data:

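wacky_fun <- function(x_chr){
  # x_chr is a week prefix such as "W1_2019"; eidsr is read from the calling environment
  malaria_col <- paste0(x_chr, '_EIDSR_Total_Malaria_cases')
  sms_col <- paste0(x_chr, '_EIDSR-Mobile_SMS')
  internet_col <- paste0(x_chr, '_EIDSR-Mobile_Internet')

  # start from "NoRep"; each later assignment overrides the earlier ones
  mode_col <- rep("NoRep", nrow(eidsr))
  # which() drops NA comparisons, so rows with missing values keep their current label
  mode_col[which(eidsr[[malaria_col]] > 0)] <- "Report"
  mode_col[which(eidsr[[sms_col]] == 1)] <- "Mobile_SMS"
  mode_col[which(eidsr[[internet_col]] == 1)] <- "Mobile_Internet"

  return(mode_col)
}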
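
The code that applies this function to every week is not shown above, so what follows is only a sketch of one way to do it, not necessarily what the answer originally used. To stay runnable on its own it uses a small made-up data frame `toy` in place of `eidsr`, and a variant of the function, `week_mode`, that takes the data frame as an argument. Both names and all toy values are assumptions for illustration. The week prefixes are read from the column names, `sapply` builds one mode column per week, and the columns are renamed to the `2019_W1` style used in the question.

```r
# Toy stand-in for eidsr (hypothetical values), using the same column naming scheme
toy <- data.frame(
  district = c("Bo", "Bo", "Bo", "Bo"),
  `W1_2019_EIDSR_Total_Malaria_cases` = c(0, NA, 51, 64),
  `W1_2019_EIDSR-Mobile_SMS` = c(NA, NA, 1, NA),
  `W1_2019_EIDSR-Mobile_Internet` = c(NA, NA, NA, 1),
  `W2_2019_EIDSR_Total_Malaria_cases` = c(55, NA, 44, NA),
  `W2_2019_EIDSR-Mobile_SMS` = c(NA, NA, NA, NA),
  `W2_2019_EIDSR-Mobile_Internet` = c(NA, NA, NA, NA),
  check.names = FALSE
)

# Same rules as wacky_fun, but the data frame is passed in explicitly (hypothetical helper)
week_mode <- function(df, week) {
  mode_col <- rep("NoRep", nrow(df))
  mode_col[which(df[[paste0(week, "_EIDSR_Total_Malaria_cases")]] > 0)] <- "Report"
  mode_col[which(df[[paste0(week, "_EIDSR-Mobile_SMS")]] == 1)] <- "Mobile_SMS"
  mode_col[which(df[[paste0(week, "_EIDSR-Mobile_Internet")]] == 1)] <- "Mobile_Internet"
  mode_col
}

# Week prefixes ("W1_2019", "W2_2019", ...) taken from the column names
week_cols <- grep("^W[0-9]+_[0-9]{4}_", names(toy), value = TRUE)
weeks <- unique(sub("^(W[0-9]+_[0-9]{4})_.*$", "\\1", week_cols))

# One character column per week; sapply returns a matrix with one column per prefix
modes <- sapply(weeks, week_mode, df = toy)

# Rename "W1_2019" to "2019_W1" and bind the week columns to the district column
colnames(modes) <- sub("^W([0-9]+)_([0-9]{4})$", "\\2_W\\1", weeks)
mode_xmt <- data.frame(District = toy$district, modes,
                       check.names = FALSE, stringsAsFactors = FALSE)
```

With the real data, the same `sapply` call over the prefixes found in `names(eidsr)` should cover all 52 weeks without naming any of them by hand, and because the regular expression captures the year as well, it should also cope with column sets that span more than one year.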