Question

I have data in a data.table that looks like this:

> x <- df[sample(nrow(df), 10), ]
> x
                    Importer                 Exporter       Date
 1:                 Ecuador                  United Kingdom 2004-01-13
 2:                  Mexico                   United States 2013-11-19
 3:               Australia                   United States 2006-08-11
 4:           United States                   United States 2009-05-04
 5:                   India                   United States 2007-07-16
 6:               Guatemala                       Guatemala 2014-07-02
 7:                  Israel                          Israel 2000-02-22
 8:                   India                   United States 2014-02-11
 9:                    Peru                            Peru 2007-03-26
10:                  Poland                          France 2014-09-15

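I am trying to create a summary so that, given a time period (say, a decade), I can count how many times each country appears as an importer and as an exporter. For the example above, the desired output when splitting by decade would look something like this: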

Decade    Country.Name    Importer.Count         Exporter.Count

2000      Ecuador         1                      0
2000      Mexico          1                      1
2000      Australia       1                      0
2000      United States   1                      3
.
.
.
2010     United States    0                      2
.
.
.

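So far I have tried the aggregate and data.table approaches suggested in the post here, but they only seem to give me the total number of importers/exporters per year (or per decade, which is what I am more interested in).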

> x$Decade<-year(x$Date)-year(x$Date)%%10
> importer_per_yr<-aggregate(Importer ~ Decade, FUN=length, data=x)
> importer_per_yr

   Decade                      Importer

2   2000                       6
3   2010                       4

Since aggregate uses a formula interface, I tried adding another term, but got the following error:

> importer_per_yr<-aggregate(Importer~ Decade + unique(Importer), FUN=length, data=x)
Error in model.frame.default(formula = Importer ~ Decade +  : 
  variable lengths differ (found for 'unique(Importer)')

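Is there a way to create the summary by decade and by importer/exporter? It does not matter if the importer and exporter summaries end up in separate tables.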

Answer 1

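We can do this with data.table methods: create the 'Decade' column by assignment (:=), then melt the data from 'wide' to 'long' format by specifying the measure columns, and reshape it back to 'wide' with dcast, using length as the fun.aggregate.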

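library(data.table)
# Floor each year to the start of its decade, e.g. 2004 -> 2000
x[, Decade := year(Date) - year(Date) %% 10]
# Stack Importer/Exporter into one Country column, then count rows per role
dcast(melt(x, measure.vars = c("Importer", "Exporter"), value.name = "Country"),
      Decade + Country ~ variable, fun.aggregate = length)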
#     Decade        Country Importer Exporter
# 1:   2000      Australia        1        0
# 2:   2000        Ecuador        1        0
# 3:   2000          India        1        0
# 4:   2000         Israel        1        1
# 5:   2000           Peru        1        1
# 6:   2000 United Kingdom        0        1
# 7:   2000  United States        1        3
# 8:   2010         France        0        1
# 9:   2010      Guatemala        1        1
#10:   2010          India        1        0
#11:   2010         Mexico        1        0
#12:   2010         Poland        1        0
#13:   2010  United States        0        2
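
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same melt/dcast idea using the ten rows from the question. It is a variation rather than the exact code above: it counts with .N before reshaping, and the names long, n, counts and Role are only illustrative.

```r
library(data.table)

# Sample rows copied from the question; `x` follows the question's naming.
x <- data.table(
  Importer = c("Ecuador", "Mexico", "Australia", "United States", "India",
               "Guatemala", "Israel", "India", "Peru", "Poland"),
  Exporter = c("United Kingdom", "United States", "United States",
               "United States", "United States", "Guatemala", "Israel",
               "United States", "Peru", "France"),
  Date = as.Date(c("2004-01-13", "2013-11-19", "2006-08-11", "2009-05-04",
                   "2007-07-16", "2014-07-02", "2000-02-22", "2014-02-11",
                   "2007-03-26", "2014-09-15"))
)

# Floor each year to the start of its decade, e.g. 2004 -> 2000.
x[, Decade := year(Date) - year(Date) %% 10]

# Long format: one row per (record, role); Date is dropped because it is
# neither an id column nor a measure column.
long <- melt(x, id.vars = "Decade", measure.vars = c("Importer", "Exporter"),
             variable.name = "Role", value.name = "Country")

# Count rows per Decade/Country/Role.
n <- long[, .N, by = .(Decade, Country, Role)]

# Spread the roles into columns; combinations that never occur become 0.
counts <- dcast(n, Decade + Country ~ Role, value.var = "N", fill = 0L)
counts
```

With the sample rows this should give the same 13 Decade/Country combinations as the output above. Counting first and passing fill = 0L makes the zero handling explicit instead of relying on length being applied to empty groups.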

Answer 2

I think this can be done with aggregate in base R:

my.data <- read.csv(text = '
        Importer,             Exporter,           Date
         Ecuador,       United Kingdom,     2004-01-13
          Mexico,        United States,     2013-11-19
       Australia,        United States,     2006-08-11
   United States,        United States,     2009-05-04
           India,        United States,     2007-07-16
       Guatemala,            Guatemala,     2014-07-02
          Israel,               Israel,     2000-02-22
           India,        United States,     2014-02-11
            Peru,                 Peru,     2007-03-26
          Poland,               France,     2014-09-15
', header = TRUE, stringsAsFactors = TRUE, strip.white = TRUE)

my.data$my.Date <- as.Date(my.data$Date, format = "%Y-%m-%d")

my.data <- data.frame(my.data,
                 year  = as.numeric(format(my.data$my.Date, format = "%Y")),
                 month = as.numeric(format(my.data$my.Date, format = "%m")),
                 day   = as.numeric(format(my.data$my.Date, format = "%d")))

my.data$my.decade <- my.data$year - (my.data$year %% 10)

importer.count <- with(my.data, aggregate(cbind(count = Importer) ~ my.decade + Importer, FUN = function(x) { NROW(x) }))
exporter.count <- with(my.data, aggregate(cbind(count = Exporter) ~ my.decade + Exporter, FUN = function(x) { NROW(x) }))

colnames(importer.count) <- c('my.decade', 'country', 'importer.count')
colnames(exporter.count) <- c('my.decade', 'country', 'exporter.count')

my.counts <- merge(importer.count, exporter.count, by = c('my.decade', 'country'), all = TRUE)

my.counts$importer.count[is.na(my.counts$importer.count)] <- 0
my.counts$exporter.count[is.na(my.counts$exporter.count)] <- 0

my.counts

#    my.decade        country importer.count exporter.count
# 1       2000      Australia              1              0
# 2       2000        Ecuador              1              0
# 3       2000          India              1              0
# 4       2000         Israel              1              1
# 5       2000           Peru              1              1
# 6       2000  United States              1              3
# 7       2000 United Kingdom              0              1
# 8       2010      Guatemala              1              1
# 9       2010          India              1              0
# 10      2010         Mexico              1              0
# 11      2010         Poland              1              0
# 12      2010  United States              0              2
# 13      2010         France              0              1
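
As a side note on the error in the question: every term in an aggregate formula needs one value per row, and unique(Importer) is shorter than the data, which is why R reports "variable lengths differ". A more compact sketch of the same base R idea is to put the grouping column itself on the right-hand side and sum a helper column of ones. The helper column n and the output column names are assumptions made for illustration, not part of the original code.

```r
# Sample rows copied from the question.
x <- data.frame(
  Importer = c("Ecuador", "Mexico", "Australia", "United States", "India",
               "Guatemala", "Israel", "India", "Peru", "Poland"),
  Exporter = c("United Kingdom", "United States", "United States",
               "United States", "United States", "Guatemala", "Israel",
               "United States", "Peru", "France"),
  Date = as.Date(c("2004-01-13", "2013-11-19", "2006-08-11", "2009-05-04",
                   "2007-07-16", "2014-07-02", "2000-02-22", "2014-02-11",
                   "2007-03-26", "2014-09-15")),
  stringsAsFactors = FALSE
)

# Decade from the calendar year, e.g. 2004 -> 2000.
yr <- as.integer(format(x$Date, "%Y"))
x$Decade <- yr - yr %% 10

# Hypothetical helper column: one per row, so summing it counts rows.
x$n <- 1L

# Group by Decade and the country column itself (not unique(...)).
importer_counts <- aggregate(n ~ Decade + Importer, data = x, FUN = sum)
names(importer_counts) <- c("Decade", "Country", "Importer.Count")
exporter_counts <- aggregate(n ~ Decade + Exporter, data = x, FUN = sum)
names(exporter_counts) <- c("Decade", "Country", "Exporter.Count")

# Full outer join on Decade and Country; absent combinations become 0.
counts <- merge(importer_counts, exporter_counts,
                by = c("Decade", "Country"), all = TRUE)
counts$Importer.Count[is.na(counts$Importer.Count)] <- 0
counts$Exporter.Count[is.na(counts$Exporter.Count)] <- 0
counts
```

If separate tables are acceptable, as the question says, importer_counts and exporter_counts can be used directly and the merge step skipped.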

Create counts of each item by year/decade