Expand data by month

Time: 2018-01-31 20:05:30

Tags: r dplyr reshape tidyr melt

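I have the following data frame with five columns: address, year, month, latitude, and longitude. It is a list of the months in which a given address was cleaned.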

df <- data.frame(address = c("1 ex St", "2 ex St"),
                 year = c(2011, 2011),
                 month = c("February", "April"),
                 latitude = c(341.32, 343.3),
                 longitude = c(432.3, 343.6))

So the data looks like this:

  address   year   month    latitude   longitude
  1 ex St   2011   February 341.32     432.3
  2 ex St   2011   April    343.30     343.6

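Right now each row represents one address and the month in which that address was cleaned. I want to expand the data so that each entry in the address column becomes 12 rows, one for each month of 2011. I also want to add a dummy variable indicating whether the lot has already been cleaned by that month. So the data should look like this: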

  address   year   month    latitude   longitude cleaned
  1 ex St   2011   January  341.32     432.3     0
  1 ex St   2011   February 341.32     432.3     1
  1 ex St   2011   March    341.32     432.3     1
  1 ex St   2011   April    341.32     432.3     1
  1 ex St   2011   May      341.32     432.3     1
  1 ex St   2011   June     341.32     432.3     1
  1 ex St   2011   July     341.32     432.3     1
  1 ex St   2011   August   341.32     432.3     1
  1 ex St   2011   September 341.32     432.3     1
  1 ex St   2011   October  341.32     432.3     1
  1 ex St   2011   November 341.32     432.3     1
  1 ex St   2011   December 341.32     432.3     1
  2 ex St   2011   January  343.30     343.6     0
  2 ex St   2011   February 343.30     343.6     0
  2 ex St   2011   March    343.30     343.6     0
  2 ex St   2011   April    343.30     343.6     1
  2 ex St   2011   May      343.30     343.6     1
  2 ex St   2011   June     343.30     343.6     1
  2 ex St   2011   July     343.30     343.6     1
  2 ex St   2011   August   343.30     343.6     1
  2 ex St   2011   September 343.30     343.6     1
  2 ex St   2011   October  343.30     343.6     1
  2 ex St   2011   November 343.30     343.6     1
  2 ex St   2011   December 343.30     343.6     1

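Is there a package or function that lets me expand data by month in this way? I have looked at melt and the reshape package, but they don't seem to fit my case. I'm not necessarily looking for a full answer, just some guidance on which tools to use!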

Edit: I used the answer below, but the cleaned column is still off. This is the output:

       month address year latitude longitude cleaned
1    January 1 ex St 2011   341.32     432.3       0
2   February 1 ex St 2011   341.32     432.3       1
3      March 1 ex St 2011   341.32     432.3       0
4      April 1 ex St 2011   341.32     432.3       1
5        May 1 ex St 2011   341.32     432.3       0
6       June 1 ex St 2011   341.32     432.3       0
7       July 1 ex St 2011   341.32     432.3       0
8     August 1 ex St 2011   341.32     432.3       0
9  September 1 ex St 2011   341.32     432.3       1
10   October 1 ex St 2011   341.32     432.3       1
11  November 1 ex St 2011   341.32     432.3       0
12  December 1 ex St 2011   341.32     432.3       1
13   January 2 ex St 2011    343.3     343.6       1
14  February 2 ex St 2011    343.3     343.6       1
15     March 2 ex St 2011    343.3     343.6       0
16     April 2 ex St 2011    343.3     343.6       0
17       May 2 ex St 2011    343.3     343.6       1
18      June 2 ex St 2011    343.3     343.6       0
19      July 2 ex St 2011    343.3     343.6       1
20    August 2 ex St 2011    343.3     343.6       0
21 September 2 ex St 2011    343.3     343.6       0
22   October 2 ex St 2011    343.3     343.6       1
23  November 2 ex St 2011    343.3     343.6       1
24  December 2 ex St 2011    343.3     343.6       0

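I suspect the na.locf() function has no effect because the cleaned column was sampled from 0 and 1, so it contains no NAs to fill. As a result, the cleaned column is currently just a random sample of 0s and 1s. Is there another function or strategy that would make the 0s and 1s correspond to before and after the address was cleaned?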

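1 Answer:

Answer 0 (score: 3)

Split by address, merge each piece with all twelve months, and order the rows by month. Fill the NAs downward, create the dummy cleaned column from the rows that are still NA, then fill the remaining NAs upward with the existing values. The result is ordered by address and by month: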

library(zoo) # na.locf() fills NAs with the nearest non-NA value

do.call(rbind,
        lapply(split(df, df$address), function(i) {
          # add the months missing for this address (their other columns are NA)
          d <- merge(i, data.frame(month = month.name), all.y = TRUE)
          # convert month to a factor and order by it, so rows run Jan, Feb, Mar, etc.
          d$month <- factor(d$month, levels = month.name)
          d <- d[ order(d$month), ]
          # NA fill down; na.rm = FALSE keeps the leading NA rows before the cleaning month
          d <- na.locf(d, na.rm = FALSE)
          # make cleaned column: rows that are still NA come before the cleaning month
          d$cleaned <- ifelse(is.na(d$address), 0, 1)
          # NA fill up
          d <- na.locf(d, fromLast = TRUE)
          # return the completed data frame for this address
          d
        }))

#                month address year latitude longitude cleaned
# 1 ex St.5    January 1 ex St 2011   341.32     432.3       0
# 1 ex St.2   February 1 ex St 2011   341.32     432.3       1
# 1 ex St.8      March 1 ex St 2011   341.32     432.3       1
# 1 ex St.1      April 1 ex St 2011   341.32     432.3       1
# 1 ex St.9        May 1 ex St 2011   341.32     432.3       1
# 1 ex St.7       June 1 ex St 2011   341.32     432.3       1
# 1 ex St.6       July 1 ex St 2011   341.32     432.3       1
# 1 ex St.3     August 1 ex St 2011   341.32     432.3       1
# 1 ex St.12 September 1 ex St 2011   341.32     432.3       1
# 1 ex St.11   October 1 ex St 2011   341.32     432.3       1
# 1 ex St.10  November 1 ex St 2011   341.32     432.3       1
# 1 ex St.4   December 1 ex St 2011   341.32     432.3       1
# 2 ex St.5    January 2 ex St 2011    343.3     343.6       0
# 2 ex St.2   February 2 ex St 2011    343.3     343.6       0
# 2 ex St.8      March 2 ex St 2011    343.3     343.6       0
# 2 ex St.1      April 2 ex St 2011    343.3     343.6       1
# 2 ex St.9        May 2 ex St 2011    343.3     343.6       1
# 2 ex St.7       June 2 ex St 2011    343.3     343.6       1
# 2 ex St.6       July 2 ex St 2011    343.3     343.6       1
# 2 ex St.3     August 2 ex St 2011    343.3     343.6       1
# 2 ex St.12 September 2 ex St 2011    343.3     343.6       1
# 2 ex St.11   October 2 ex St 2011    343.3     343.6       1
# 2 ex St.10  November 2 ex St 2011    343.3     343.6       1
# 2 ex St.4   December 2 ex St 2011    343.3     343.6       1
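
If the cleaned column already exists without NAs, there is nothing for na.locf() to fill. One possible alternative, shown here only as a base R sketch, is to skip NA filling and compare each month's position in month.name with the position of the cleaning month. The names expanded and cleaned_month are made up for this illustration, and the sketch assumes one cleaning month per address within a single year, as in the question's data.

```r
# Input data from the question: one row per address with its cleaning month.
df <- data.frame(address = c("1 ex St", "2 ex St"),
                 year = c(2011, 2011),
                 month = c("February", "April"),
                 latitude = c(341.32, 343.3),
                 longitude = c(432.3, 343.6),
                 stringsAsFactors = FALSE)

# Rename the cleaning month (hypothetical helper name) so that it
# survives the join as a separate column.
names(df)[names(df) == "month"] <- "cleaned_month"

# Cross join: merge() with by = NULL returns the Cartesian product,
# i.e. every address row paired with every month name.
expanded <- merge(df,
                  data.frame(month = month.name, stringsAsFactors = FALSE),
                  by = NULL)

# A month is flagged 1 once its calendar position reaches the
# position of the cleaning month, otherwise 0.
expanded$cleaned <- as.integer(
  match(expanded$month, month.name) >=
    match(expanded$cleaned_month, month.name)
)

# Sort by address, then by calendar order, and keep the columns
# in the layout the question asks for.
expanded <- expanded[order(expanded$address,
                           match(expanded$month, month.name)), ]
expanded <- expanded[, c("address", "year", "month",
                         "latitude", "longitude", "cleaned")]
rownames(expanded) <- NULL

print(expanded)
```

Because the flag is computed from the month positions rather than filled in, it does not depend on where NAs happen to sit. Data covering several years would need the year included in the comparison.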