Correcting a loop in R to sum values matching two variables

Asked: 2016-02-09 18:26:40

Tags: r loops dataframe sum

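Good evening,

I am trying to construct the right loop (whether for, while, or if) that sums the values in one column, given a particular date and variable, and deposits the result in a specific place in a data frame. All of my attempts at loops have been badly wrong, and I clearly do not have the right starting point!

In Sample.Frame below, I want to fill the first row of the LON.NEW6 column with the sum of No.of.Rooms from Sample.Bookings where Stay.Date matches Start.Date and Legacy.Hotel.Code matches the column header. This needs to be repeated across the whole frame.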

The partially populated Sample.Frame data frame:

> Sample.Frame
                              Start.Date LON.NEW6 LON.CAP LON.CEN2 LON.MYH1 LON.SOS LON.MOW LON.HOLW LON.94V LON.FOU6 LON.949
E0-001-085571068-9            30/09/2015       NA      NA       NA       NA      NA      NA       NA      NA       NA      NA
E0-001-086838711-7            07/11/2015       NA      NA       NA       NA      NA      NA       NA      NA       NA      NA
E0-001-085536178-4@2015102019 20/10/2015       NA      NA       NA       NA      NA      NA       NA      NA       NA      NA
E0-001-085466318-0            01/07/2016       NA      NA       NA       NA      NA      NA       NA      NA       NA      NA
E0-001-085591039-5            30/01/2016       NA      NA       NA       NA      NA      NA       NA      NA       NA      NA
E0-001-087500856-4            29/04/2016       NA      NA       NA       NA      NA      NA       NA      NA       NA      NA
E0-001-079398784-2@2015092909 29/09/2015       NA      NA       NA       NA      NA      NA       NA      NA       NA      NA
E0-001-086021337-5            14/10/2015       NA      NA       NA       NA      NA      NA       NA      NA       NA      NA
E0-001-086639435-3            20/12/2015       NA      NA       NA       NA      NA      NA       NA      NA       NA      NA
E0-001-087220018-9            27/10/2015       NA      NA       NA       NA      NA      NA       NA      NA       NA      NA

The Sample.Bookings data frame:

> Sample.Bookings
          City Booking.ID Legacy.Hotel.Code Star.Rating.ID  Stay.Date No.of.Rooms
1146767 London   17480238          LON NEW6              2 30/09/2015           3
220037  London   18381583          LON CEN2              3 29/09/2015           1
668476  London   15184820          LON NEW6              2 07/11/2015           1
1073551 London   16414241          LON CEN2              3 01/07/2016           1
138695  London     554331           LON CAP              5 29/04/2016           1
301805  London   17134981          LON NEW6              2 30/09/2015           1
181300  London     193930          LON CEN2              3 01/07/2016           1
1204682 London   15154547           LON CAP              5 23/07/2015           1
1549067 London   14436933          LON NEW6              2 20/10/2015           1
832903  London   13796464          LON NEW6              2 20/10/2015           1
301778  London   16304861          LON NEW6              2 22/11/2015           1
399343  London   16855128          LON NEW6              2 07/11/2015           1
399337  London   14855974          LON NEW6              2 03/04/2015           1
1472157 London   18320357          LON NEW6              2 17/01/2016           1
1184525 London   18360304          LON CEN2              3 05/02/2016           1
1342678 London   17623052           LON CAP              5 01/02/2016           1
420443  London   18381583          LON CEN2              3 20/02/2016           1
1435511 London   15230186          LON NEW6              2 22/08/2015           3
1201521 London   16319154          LON NEW6              2 05/09/2015           1
1233528 London   15460211          LON NEW6              2 28/07/2015           1

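What I should end up with is every element either populated with the sum, or left as NA if no bookings correspond to that date and variable. This is a small sample, but the set I am working with can have thousands of rows and hundreds of columns.

Thank you in advance for your help!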

2 Answers:

Answer 0: (score: 0)

You can get what you want with dplyr / tidyr:

library(dplyr)
library(tidyr)
Counts <- group_by(Sample.Bookings, Stay.Date, Legacy.Hotel.Code) %>% 
    summarise(n=n()) %>% 
    spread(Legacy.Hotel.Code, n)

You can then add the row names from Sample.Frame (which look like the only information you need from it) using full_join:

Sample.Frame$Row.Name<-row.names(Sample.Frame)
full_join(Sample.Frame[,c("Row.Name", "Start.Date")], Counts, by= c("Start.Date"="Stay.Date"))

Output:

              Row.Name Start.Date LON.CAP LON.CEN2 LON.NEW6
1             E0-001-085571068-9 30/09/2015      NA       NA        2
2             E0-001-086838711-7 07/11/2015      NA       NA        2
3  E0-001-085536178-4@2015102019 20/10/2015      NA       NA        2
4             E0-001-085466318-0 01/07/2016      NA        2       NA
5             E0-001-085591039-5 30/01/2016      NA       NA       NA
6             E0-001-087500856-4 29/04/2016       1       NA       NA
7  E0-001-079398784-2@2015092909 29/09/2015      NA        1       NA
8             E0-001-086021337-5 14/10/2015      NA       NA       NA
9             E0-001-086639435-3 20/12/2015      NA       NA       NA
10            E0-001-087220018-9 27/10/2015      NA       NA       NA
11                          <NA> 01/02/2016       1       NA       NA
12                          <NA> 03/04/2015      NA       NA        1
13                          <NA> 05/02/2016      NA        1       NA
14                          <NA> 05/09/2015      NA       NA        1
15                          <NA> 17/01/2016      NA       NA        1
16                          <NA> 20/02/2016      NA        1       NA
17                          <NA> 22/08/2015      NA       NA        1
18                          <NA> 22/11/2015      NA       NA        1
19                          <NA> 23/07/2015       1       NA       NA
20                          <NA> 28/07/2015      NA       NA        1
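
Note that summarise(n = n()) counts bookings per date and hotel, whereas the question asks for the sum of No.of.Rooms, and full_join also keeps dates that are not in Sample.Frame (rows 11 to 20 above). The following is a minimal sketch of a summing variant, not the answerer's code. The `bookings` and `frame` objects are hypothetical miniatures of the question's data, and it assumes the hotel codes need make.names() to turn "LON NEW6" into the "LON.NEW6" form used by the column headers.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of Sample.Bookings (values taken from the question)
bookings <- data.frame(
  Legacy.Hotel.Code = c("LON NEW6", "LON NEW6", "LON CEN2", "LON NEW6"),
  Stay.Date         = c("30/09/2015", "30/09/2015", "29/09/2015", "22/08/2015"),
  No.of.Rooms       = c(3, 1, 1, 3),
  stringsAsFactors  = FALSE
)

# Hypothetical miniature of Sample.Frame: only the dates are needed
frame <- data.frame(
  Start.Date       = c("30/09/2015", "29/09/2015", "14/10/2015"),
  stringsAsFactors = FALSE
)

room_sums <- bookings %>%
  # Assumption: codes must match the dotted column headers of Sample.Frame
  mutate(Legacy.Hotel.Code = make.names(Legacy.Hotel.Code)) %>%
  group_by(Stay.Date, Legacy.Hotel.Code) %>%
  # Sum the rooms instead of counting the bookings
  summarise(rooms = sum(No.of.Rooms)) %>%
  ungroup() %>%
  # One column per hotel code; date/hotel pairs without bookings become NA
  spread(Legacy.Hotel.Code, rooms)

# left_join keeps only the dates that are present in the frame
result <- left_join(frame, room_sums, by = c("Start.Date" = "Stay.Date"))
result
```

With this sample, the LON.NEW6 cell for 30/09/2015 holds 4 (3 + 1), 14/10/2015 stays NA in every hotel column, and the 22/08/2015 booking is dropped because that date is not in the frame.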

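Answer 1: (score: 0)

I did this using dcast from reshape2 and the merge function. From the Sample.Frame data.frame, all you really need to keep is Start.Date and possibly the row names (if they carry relevant information).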

library(reshape2)

# all you need is the Start.Date and the row names
Sample.Frame <- data.frame(Start.Date = c("30/09/2015", "07/11/2015", "27/10/2015"))
Sample.Frame$row.names <- row.names(Sample.Frame)


# test data.frame
Sample.Bookings <- data.frame(Legacy.Hotel.Code = c("LON.NEW6", "LON.CAP"),
                          Stay.Date = c("30/09/2015", "07/11/2015"),
                          No.of.Rooms = c(3, 1))

#use dcast to go back to wide format
Sample.Bookings.cast <- dcast(Sample.Bookings, Stay.Date ~ Legacy.Hotel.Code, 
                          value.var = "No.of.Rooms")

#Change Stay.Date to Start.Date so we can merge on that column
colnames(Sample.Bookings.cast)[which(colnames(Sample.Bookings.cast) == "Stay.Date")] <- "Start.Date"

#merge on Start.Date
Final.Date <- merge(Sample.Frame, Sample.Bookings.cast, by = "Start.Date", all.x = TRUE)

Final.Date
  Start.Date row.names LON.CAP LON.NEW6
1 07/11/2015         2       1       NA
2 27/10/2015         3      NA       NA
3 30/09/2015         1      NA        3
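
The test data above has at most one booking per date and hotel. In the question's data the same date and hotel can appear several times, and dcast without an aggregation function then falls back to counting rows (length). The sketch below is one way to sum instead, under the assumption that the hotel codes already use the dotted form of the column headers. The `bookings` and `frame` objects are hypothetical, and fill = NA_real_ is there so that empty cells stay NA rather than becoming 0.

```r
library(reshape2)

# Hypothetical bookings with two rows for the same date and hotel
bookings <- data.frame(
  Legacy.Hotel.Code = c("LON.NEW6", "LON.NEW6", "LON.CAP"),
  Stay.Date         = c("30/09/2015", "30/09/2015", "07/11/2015"),
  No.of.Rooms       = c(3, 1, 1),
  stringsAsFactors  = FALSE
)

frame <- data.frame(
  Start.Date       = c("30/09/2015", "07/11/2015", "27/10/2015"),
  stringsAsFactors = FALSE
)

# fun.aggregate = sum adds up No.of.Rooms within each date/hotel cell;
# fill = NA_real_ leaves cells without any booking as NA
bookings_wide <- dcast(bookings, Stay.Date ~ Legacy.Hotel.Code,
                       value.var = "No.of.Rooms",
                       fun.aggregate = sum, fill = NA_real_)

# Rename the key so both data frames can be merged on Start.Date
names(bookings_wide)[names(bookings_wide) == "Stay.Date"] <- "Start.Date"

# all.x = TRUE keeps every date of the frame, booked or not
result <- merge(frame, bookings_wide, by = "Start.Date", all.x = TRUE)
result
```

Here the two LON.NEW6 bookings on 30/09/2015 collapse into a single cell holding 4, and 27/10/2015 keeps NA in both hotel columns.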