How to fill in missing rows in a data frame, using this example
df <- read.table(textConnection("car,year,month,country,amount
Mazda,2012,02,JP,2344
Ford,2012,04,US,235234
Mazda,2012,03,JP,3455
Mazda,2012,04,JP,43554
Mazda,2012,05,JP,9854
Mazda,2012,06,JP,32556
Ford, 2013,01,US,345"), sep = ",", header = TRUE)
> df
car year month country amount
1 Mazda 2012 2 JP 2344
2 Ford 2012 4 US 235234
3 Mazda 2012 3 JP 3455
4 Mazda 2012 4 JP 43554
5 Mazda 2012 5 JP 9854
6 Mazda 2012 6 JP 32556
7 Ford 2013 1 US 345
I use tidyr::complete to fill in the missing rows for month and year like this:
tidyr::complete(df, car = unique(car), year = 2012:2014, month=1:12, fill=list(amount=0))
But the country gets lost. I have read the tidyr documentation, but it is really short, so I could not find an answer there.
# A tibble: 108 x 5
car year month country amount
<fct> <int> <int> <fct> <dbl>
1 " Ford" 2012 1 NA 0
2 " Ford" 2012 2 NA 0
3 " Ford" 2012 3 NA 0
4 " Ford" 2012 4 US 235234
5 " Ford" 2012 5 NA 0
6 " Ford" 2012 6 NA 0
7 " Ford" 2012 7 NA 0
8 " Ford" 2012 8 NA 0
9 " Ford" 2012 9 NA 0
10 " Ford" 2012 10 NA 0
# ... with 98 more rows
How can I keep it?
Answer 0 (score: 1)
We can put car and country inside nesting(), which makes complete() use only the car/country combinations that actually occur in the data.
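A minimal sketch of the nesting approach, not necessarily the exact call the answer used. It assumes the sample data from the question, read with stringsAsFactors = FALSE and without stray spaces; the name `filled` is only illustrative.

```r
library(tidyr)

# Sample data from the question (assumption: no leading spaces, strings kept as character)
df <- read.table(text = "car,year,month,country,amount
Mazda,2012,02,JP,2344
Ford,2012,04,US,235234
Mazda,2012,03,JP,3455
Mazda,2012,04,JP,43554
Mazda,2012,05,JP,9854
Mazda,2012,06,JP,32556
Ford,2013,01,US,345", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# nesting(car, country) keeps only the car/country pairs found in df,
# while year and month are expanded to every combination.
filled <- complete(
  df,
  nesting(car, country),
  year = 2012:2014,
  month = 1:12,
  fill = list(amount = 0)
)

print(filled)
```

Because car and country are expanded together, every added row inherits the country of its car, and only amount needs an explicit fill value.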
Answer 1 (score: 1)
Since you overlooked that the second ask on your original post was effectively a new question, just maintain a metadata data frame:
read.table(textConnection("car,year,month,amount
Mazda,2012,02,2344
Ford,2012,04,235234
Mazda,2012,03,3455
Mazda,2012,04,43554
Mazda,2012,05,9854
Mazda,2012,06,32556
Ford,2013,01,2345"),
sep = ",", header = TRUE, stringsAsFactors = FALSE) -> xdf
merge(
expand.grid(car = unique(xdf$car), year =2012:2014, month=1:12),
xdf, by = c("car", "year", "month"), all.x = TRUE
) -> x2
x2$amount <- ifelse(is.na(x2$amount), 0, x2$amount)
data.frame(
car = c("Mazda", "Ford"),
country = c("JP", "US"),
stringsAsFactors = FALSE
) -> car2country_df
merge(x2, car2country_df)
Or via the tidyverse:
library(dplyr)  # provides the %>% pipe
tidyr::complete(
  xdf, car = unique(car), year = 2012:2014, month = 1:12, fill = list(amount = 0)
) %>%
  dplyr::left_join(car2country_df, by = "car")  # add country back from the lookup table