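R newbie (ish). I have written some code in R that uses for() loops. I would like to rewrite it in vectorised form, but it isn't working.
Simplified example to illustrate: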
library(dplyr)
x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))
## if year is blank and name is same as name from previous row
## take year from previous row
## else
## stick with the year you already have
# 1. Run as a loop
x$year_2 <- NA
x$year_2[1] <- x$year[1]
for (row_idx in 2:10) {
  if (is.na(x$year[row_idx]) && (x$name[row_idx] == x$name[row_idx - 1])) {
    x$year_2[row_idx] <- x$year_2[row_idx - 1]
  } else {
    x$year_2[row_idx] <- x$year[row_idx]
  }
}
# 2. Attempt to vectorise
x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))
x$year_2 <- ifelse(is.na(x$year) & x$name == lead(x$name),
lead(x$year_2),
x$year)
I think the vectorised version is going wrong because it is circular (i.e. x$year_2 appears on both sides of the <-). Is there a way around this?
Thanks.
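Answer 0 (score: 4)
I'd suggest using functions that already exist. Many of us were trained to reinvent the wheel, which is why R can feel difficult at first.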
library(tidyverse)
x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))
x %>%
group_by(name) %>%
tidyr::fill(year)
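A possible variation, assuming you want to keep the original year column and put the filled values in a separate year_2 column, as the loop in the question does. The name `filled` is only for illustration, and `.direction = "down"` is tidyr's default, spelled out here for clarity.

```r
library(dplyr)
library(tidyr)

x <- data.frame(
  name = c(rep("John", 8), "Fred", "Fred"),
  year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA)
)

filled <- x %>%
  mutate(year_2 = year) %>%              # copy, so the original year is preserved
  group_by(name) %>%                     # never carry a value across names
  fill(year_2, .direction = "down") %>%  # last observation carried forward
  ungroup()

filled
```

Because the fill happens within each name group, a group that starts with NA keeps that NA instead of inheriting the previous person's year.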
Answer 1 (score: 1)
If you are using dplyr / tidyverse:
library(dplyr)
library(tidyr)
x %>%
group_by(name) %>%
fill("year")
   name   year
   <fct> <dbl>
 1 John      1
 2 John      1
 3 John      2
 4 John      3
 5 John      3
 6 John      3
 7 John      4
 8 John      4
 9 Fred      1
10 Fred      1
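Answer 2 (score: 0)
If you know the data frame will always be sorted this way, you can fill the NAs with the most recent non-missing value, which fits your case.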
library(zoo)
x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))
x$year_2 <- na.locf(x$year)
x
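Note that na.locf() on the whole column does not look at name. A minimal sketch of a per-name variant, assuming base R's ave() for the grouping and na.rm = FALSE so that a leading NA within a group is kept rather than dropped:

```r
library(zoo)

x <- data.frame(
  name = c(rep("John", 8), "Fred", "Fred"),
  year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA)
)

# Apply na.locf separately within each name;
# na.rm = FALSE keeps the vector length unchanged even if a group starts with NA.
x$year_2 <- ave(x$year, x$name,
                FUN = function(v) na.locf(v, na.rm = FALSE))
x
```

For the example data this gives the same result as the ungrouped call, since each name's first year is present.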
If you don't want to load the zoo package, you can also do this:
repeat_last = function(x, forward = TRUE, maxgap = Inf, na.rm = FALSE) {
  if (!forward) x = rev(x)                     # reverse x twice if carrying backward
  ind = which(!is.na(x))                       # get positions of nonmissing values
  if (is.na(x[1]) && !na.rm)                   # if it begins with NA
    ind = c(1, ind)                            # add first pos
  rep_times = diff(                            # diffing the indices + length yields how often
    c(ind, length(x) + 1))                     # they need to be repeated
  if (maxgap < Inf) {
    exceed = rep_times - 1 > maxgap            # exceeding maxgap
    if (any(exceed)) {                         # any exceed?
      ind = sort(c(ind[exceed] + 1, ind))      # add NA in gaps
      rep_times = diff(c(ind, length(x) + 1))  # diff again
    }
  }
  x = rep(x[ind], times = rep_times)           # repeat the values at these indices
  if (!forward) x = rev(x)                     # second reversion
  x
}
x$year_3 <- repeat_last(x$year)
x
Answer 3 (score: 0)
A simple way to do this in base R is with the code below
x <- within(x, year <- subset(year,!is.na(year))[cumsum(!is.na(year))])
or
x$year <- with(x, subset(year,!is.na(year))[cumsum(!is.na(year))])
which gives
> x
   name year
1  John    1
2  John    1
3  John    2
4  John    3
5  John    3
6  John    3
7  John    4
8  John    4
9  Fred    1
10 Fred    1
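The cumsum() indexing above works on the whole column and relies on the first value being non-missing. A rough sketch of a per-name variant, assuming a hypothetical helper called fill_down (not part of any package) that prepends an NA slot so a leading NA maps to NA instead of index 0:

```r
x <- data.frame(
  name = c(rep("John", 8), "Fred", "Fred"),
  year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA)
)

# Hypothetical helper: carry the last non-missing value forward.
# cumsum(ok) counts the non-missing values seen so far; adding 1 shifts the
# index so that "none seen yet" points at the NA placed at the front.
fill_down <- function(v) {
  ok <- !is.na(v)
  c(NA, v[ok])[cumsum(ok) + 1]
}

# ave() applies the helper separately within each name
x$year_2 <- ave(x$year, x$name, FUN = fill_down)
x
```

Grouping with ave() keeps values from crossing from one name to the next, which the ungrouped one-liner does not guard against.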