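I have the following data frame with five columns: address, year, month, latitude, and longitude. It lists the month in which each address was cleaned.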
df = data.frame(address = c("1 ex St", "2 ex St"),
                year = c(2011, 2011),
                month = c("February", "April"),
                latitude = c(341.32, 343.3),
                longitude = c(432.3, 343.6))
So the data looks like this:
address year month latitude longitude
1 ex St 2011 February 341.32 432.3
2 ex St 2011 April 343.30 343.6
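Right now each row represents one address and the month in which that address was cleaned. I want to expand the data so that each entry in the address column becomes 12 rows, one for each month of 2011. I also want to add a dummy variable indicating whether the lot has already been cleaned by that month. So the data should look like this: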
address year month latitude longitude cleaned
1 ex St 2011 January 341.32 432.3 0
1 ex St 2011 February 341.32 432.3 1
1 ex St 2011 March 341.32 432.3 1
1 ex St 2011 April 341.32 432.3 1
1 ex St 2011 May 341.32 432.3 1
1 ex St 2011 June 341.32 432.3 1
1 ex St 2011 July 341.32 432.3 1
1 ex St 2011 August 341.32 432.3 1
1 ex St 2011 September 341.32 432.3 1
1 ex St 2011 October 341.32 432.3 1
1 ex St 2011 November 341.32 432.3 1
1 ex St 2011 December 341.32 432.3 1
2 ex St 2011 January 343.30 343.6 0
2 ex St 2011 February 343.30 343.6 0
2 ex St 2011 March 343.30 343.6 0
2 ex St 2011 April 343.30 343.6 1
2 ex St 2011 May 343.30 343.6 1
2 ex St 2011 June 343.30 343.6 1
2 ex St 2011 July 343.30 343.6 1
2 ex St 2011 August 343.30 343.6 1
2 ex St 2011 September 343.30 343.6 1
2 ex St 2011 October 343.30 343.6 1
2 ex St 2011 November 343.30 343.6 1
2 ex St 2011 December 343.30 343.6 1
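Is there a package or function that would let me expand the data by month in this way? I have looked at melt and the reshape package, but they don't seem to fit my case. I'm not necessarily looking for a full answer, just some guidance on which tools to use!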
Edit: I used the answer below, but the cleaned column is still not right. Here is the output.
month address year latitude longitude cleaned
1 January 1 ex St 2011 341.32 432.3 0
2 February 1 ex St 2011 341.32 432.3 1
3 March 1 ex St 2011 341.32 432.3 0
4 April 1 ex St 2011 341.32 432.3 1
5 May 1 ex St 2011 341.32 432.3 0
6 June 1 ex St 2011 341.32 432.3 0
7 July 1 ex St 2011 341.32 432.3 0
8 August 1 ex St 2011 341.32 432.3 0
9 September 1 ex St 2011 341.32 432.3 1
10 October 1 ex St 2011 341.32 432.3 1
11 November 1 ex St 2011 341.32 432.3 0
12 December 1 ex St 2011 341.32 432.3 1
13 January 2 ex St 2011 343.3 343.6 1
14 February 2 ex St 2011 343.3 343.6 1
15 March 2 ex St 2011 343.3 343.6 0
16 April 2 ex St 2011 343.3 343.6 0
17 May 2 ex St 2011 343.3 343.6 1
18 June 2 ex St 2011 343.3 343.6 0
19 July 2 ex St 2011 343.3 343.6 1
20 August 2 ex St 2011 343.3 343.6 0
21 September 2 ex St 2011 343.3 343.6 0
22 October 2 ex St 2011 343.3 343.6 1
23 November 2 ex St 2011 343.3 343.6 1
24 December 2 ex St 2011 343.3 343.6 0
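I suspect the na.locf() function has no effect because the cleaned column was sampled from 0 and 1, so it contains no NAs to change. As a result, the cleaned column is just a random sample of 0s and 1s. Is there another function or strategy that would make the 0s and 1s correspond to before and after the address was cleaned?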
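That suspicion can be checked on toy vectors. The following is only a sketch: `random_flags` and `with_gaps` are made-up names and values, not the real data. It illustrates that na.locf() only acts on NAs, so a column with no missing values comes back unchanged.

```r
library(zoo)  # provides na.locf()

# Hypothetical column drawn at random from 0 and 1: it contains no NAs,
# so na.locf() has nothing to fill and the values stay as they are.
set.seed(1)
random_flags <- sample(0:1, 12, replace = TRUE)
unchanged <- all(na.locf(random_flags) == random_flags)

# Hypothetical column where NA marks months without a record: the last
# observed value is carried forward; na.rm = FALSE keeps the leading NA.
with_gaps <- c(NA, 1, NA, NA)
filled <- na.locf(with_gaps, na.rm = FALSE)  # NA 1 1 1
```

So for na.locf() to help, the cleaned column has to start out as NA everywhere except the month in which the address was actually cleaned.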
Answer 0 (score: 3)
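Split by address, merge with all months, and create the dummy cleaned column. Then fill the NAs with the existing values. Finally, order by address and month name: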
library(zoo) # na.locf() to fill NAs
do.call(rbind,
        lapply(split(df, df$address), function(i) {
          # one row per month; months without a record get NA in the other columns
          d <- merge(i, data.frame(month = month.name), all.y = TRUE)
          # convert to factor, then order by month, so it runs Jan, Feb, Mar, etc.
          d$month <- factor(d$month, levels = month.name)
          d <- d[order(d$month), ]
          # NA fill down; na.rm = FALSE keeps the months before the first record
          d <- na.locf(d, na.rm = FALSE)
          # make cleaned column: rows still NA come before the cleaning month
          d$cleaned <- ifelse(is.na(d$address), 0, 1)
          # NA fill up
          d <- na.locf(d, fromLast = TRUE)
          d
        }))
# month address year latitude longitude cleaned
# 1 ex St.5 January 1 ex St 2011 341.32 432.3 0
# 1 ex St.2 February 1 ex St 2011 341.32 432.3 1
# 1 ex St.8 March 1 ex St 2011 341.32 432.3 1
# 1 ex St.1 April 1 ex St 2011 341.32 432.3 1
# 1 ex St.9 May 1 ex St 2011 341.32 432.3 1
# 1 ex St.7 June 1 ex St 2011 341.32 432.3 1
# 1 ex St.6 July 1 ex St 2011 341.32 432.3 1
# 1 ex St.3 August 1 ex St 2011 341.32 432.3 1
# 1 ex St.12 September 1 ex St 2011 341.32 432.3 1
# 1 ex St.11 October 1 ex St 2011 341.32 432.3 1
# 1 ex St.10 November 1 ex St 2011 341.32 432.3 1
# 1 ex St.4 December 1 ex St 2011 341.32 432.3 1
# 2 ex St.5 January 2 ex St 2011 343.3 343.6 0
# 2 ex St.2 February 2 ex St 2011 343.3 343.6 0
# 2 ex St.8 March 2 ex St 2011 343.3 343.6 0
# 2 ex St.1 April 2 ex St 2011 343.3 343.6 1
# 2 ex St.9 May 2 ex St 2011 343.3 343.6 1
# 2 ex St.7 June 2 ex St 2011 343.3 343.6 1
# 2 ex St.6 July 2 ex St 2011 343.3 343.6 1
# 2 ex St.3 August 2 ex St 2011 343.3 343.6 1
# 2 ex St.12 September 2 ex St 2011 343.3 343.6 1
# 2 ex St.11 October 2 ex St 2011 343.3 343.6 1
# 2 ex St.10 November 2 ex St 2011 343.3 343.6 1
# 2 ex St.4 December 2 ex St 2011 343.3 343.6 1
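For comparison, here is a minimal base R sketch that avoids na.locf() altogether. It is not part of the answer above, and it assumes exactly one cleaning record per address, as in the example data. The names `expanded` and `cleaned_month` are made up for illustration. The idea is to repeat each row 12 times and compare each month's position in `month.name` with the position of the cleaning month.

```r
# Same toy data as in the question.
df <- data.frame(address   = c("1 ex St", "2 ex St"),
                 year      = c(2011, 2011),
                 month     = c("February", "April"),
                 latitude  = c(341.32, 343.3),
                 longitude = c(432.3, 343.6),
                 stringsAsFactors = FALSE)

# Repeat every original row 12 times, once per calendar month.
expanded <- df[rep(seq_len(nrow(df)), each = 12), ]
rownames(expanded) <- NULL

# Position (1 to 12) of the month in which each address was cleaned.
cleaned_month <- match(expanded$month, month.name)

# Overwrite month with January..December for each address.
expanded$month <- rep(month.name, times = nrow(df))

# 1 from the cleaning month onwards, 0 before it.
expanded$cleaned <- as.integer(match(expanded$month, month.name) >= cleaned_month)
```

Because the dummy is computed directly from the month positions, there are no NAs to fill. If an address could have several cleaning records, the earliest month per address would have to be picked first.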