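Fill a sequence by factor

Time: 2017-08-18 18:55:49

Tags: r

I need to fill in the missing values of the $Year sequence within each level of the factor $Country. The $Count column can simply be padded with 0.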

Country Year Count
A       1    1
A       2    1
A       4    2
B       1    1
B       3    1

so that I end up with

Country Year Count
A       1    1
A       2    1
A       3    0
A       4    2
B       1    1
B       2    0
B       3    1

Hope that's clear, thanks in advance!

7 answers:

Answer 0: (score: 5)

Here is a dplyr/tidyr solution that groups by Country and uses complete together with full_seq.
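The code for this answer did not survive, so the following is only a minimal sketch of what a complete/full_seq approach could look like. The data frame name `df` and the example data copied from the question are assumptions, not the answerer's original code.

```r
library(dplyr)
library(tidyr)

# Example data from the question (the name df is an assumption)
df <- data.frame(Country = c("A", "A", "A", "B", "B"),
                 Year    = c(1, 2, 4, 1, 3),
                 Count   = c(1, 1, 2, 1, 1))

res <- df %>%
  group_by(Country) %>%
  # expand Year to the full min..max range within each Country,
  # filling Count with 0 for the rows that are added
  complete(Year = full_seq(Year, period = 1), fill = list(Count = 0)) %>%
  ungroup()
```

Grouping first means the Year range is computed per Country, so B is only extended up to year 3 rather than to A's maximum.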

Answer 1: (score: 4)

library(data.table)
# d is your original data.frame
setDT(d)
foo <- d[, .(Year = min(Year):max(Year)), Country]
res <- merge(d, foo, all.y = TRUE)[is.na(Count), Count := 0]


Answer 2: (score: 4)

Similar to @PoGibas' answer:

library(data.table)

# set default values
def = list(Count = 0L)

# create table with all levels
fullDT = setkey(DT[, .(Year = seq(min(Year), max(Year))), by=Country])

# initialize to defaults
fullDT[, names(def) := def ]

# overwrite from data
fullDT[DT, names(def) := mget(sprintf("i.%s", names(def))) ]

which gives

   Country Year Count
1:       A    1     1
2:       A    2     1
3:       A    3     0
4:       A    4     2
5:       B    1     1
6:       B    2     0
7:       B    3     1

This generalizes to having more columns (besides Count). I guess similar functionality exists in the "tidyverse", with a name like "expand" or "complete".

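Answer 3: (score: 4)

Another base R idea is to split on Country, use setdiff to find the values of seq(max(Year)) that are missing from each group, and rbind them to the original rows with Count = 0. Then use do.call to rbind the list back into a data frame, i.e.

d1 <- do.call(rbind, c(lapply(split(df, df$Country), function(i) {
  # append one row per missing year for this country, with Count = 0
  x <- rbind(i, data.frame(Country = i$Country[1],
                           Year = setdiff(seq(max(i$Year)), i$Year),
                           Count = 0))
  # restore the Year order within the country
  x[order(x$Year), ]
}), make.row.names = FALSE))

which, for the example data, gives the desired result shown in the question.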

Answer 4: (score: 2)

> setkey(DT,Country,Year)
> DT[setkey(DT[, .(min(Year):max(Year)), by = Country], Country, V1)]
   Country Year Count
1:       A    1     1
2:       A    2     1
3:       A    3    NA
4:       A    4     2
5:       B    1     1
6:       B    2    NA
7:       B    3     1
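
The join above leaves NA in Count for the added rows, whereas the question asks for 0. A minimal sketch of one way to finish the job, assuming DT holds the question's data as a data.table (the names `full` and `res` are introduced here for illustration only):

```r
library(data.table)

# Example data from the question (assumed to be a data.table named DT)
DT <- data.table(Country = c("A", "A", "A", "B", "B"),
                 Year    = c(1L, 2L, 4L, 1L, 3L),
                 Count   = c(1L, 1L, 2L, 1L, 1L))
setkey(DT, Country, Year)

# Full Year range per Country, keyed the same way as DT
full <- setkey(DT[, .(min(Year):max(Year)), by = Country], Country, V1)

# Join, then replace the NA counts of the added rows with 0
res <- DT[full]
res[is.na(Count), Count := 0L]
```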

Answer 5: (score: 2)

Another dplyr/tidyr solution.

library(dplyr)
library(tidyr)

dt2 <- dt %>%
  group_by(Country) %>%
  # tibble() replaces the deprecated data_frame()
  do(tibble(Country = unique(.$Country),
            Year = full_seq(.$Year, 1))) %>%
  full_join(dt, by = c("Country", "Year")) %>%
  replace_na(list(Count = 0))

Answer 6: (score: 2)

Here is an approach in base R that uses tapply, do.call, range, and seq to calculate the year sequences. It then constructs a data.frame from the named list that is returned, merges this onto the original (which adds the desired rows), and finally fills in the missing values.

# get named list with year sequences
temp <- tapply(dat$Year, dat$Country, function(x) do.call(seq, as.list(range(x))))

# construct data.frame
mydf <- data.frame(Year=unlist(temp), Country=rep(names(temp), lengths(temp)))

# merge onto original
mydf <- merge(dat, mydf, all=TRUE)

# fill in missing values
mydf[is.na(mydf)] <- 0

For the example data, this returns the desired result shown in the question.