data.table := assignment and grouping

Asked: 2013-05-18 02:56:06

Tags: r data.table grouping assign

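I'm working with country-specific data and need to assign each country to a predefined country group. I wrote the code below. Is there a more efficient way to script this so that, whenever a new country enters the database, I don't have to type it into the NON-CORE section by hand? It sounds like an "else" case, but I don't know how to code it.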

library(data.table)
data <- data.table(data)
setkey(data,Region.Group)
data[list(c(
  "Australia",
  "Bangladesh",
  "Cambodia",
  "Estonia",
  "Finland",
  "France",
  "India",
  "Indonesia",
  "Korea",
  "Lithuania",
  "Malaysia",
  "Middle East",
  "Norway",
  "Philippines",
  "Poland",
  "Russia",
  "Spain",
  "Sri Lanka",
  "Sweden",
  "Switzerland",
  "TAT Region",
  "Thailand",
  "Ukraine",
  "Vietnam",
  "New Zealand",
  "Israel",
  "Myanmar",
  "Pakistan",
  "Portugal",
  "Turkey")), Core:="NON-CORE"]
data[list(c(
  "Belgium",
  "Netherlands")), Core:="Benelux"]
data[list(c(
  "China Group")), Core:="China"]
data[list(c(
  "Germany")), Core:="Germany"]
data[list(c(
  "Hong Kong Group")), Core:="Hong Kong"]
data[list(c(
  "Italy")), Core:="Italy"]
data[list(c(
  "Japan")), Core:="Japan"]
data[list(c(
  "North America Central",
  "North America East",
  "North America North",
  "North America South",
  "North America West")), Core:="N.America"]
data[list(c(
  "Singapore")), Core:="Singapore"]
data[list(c(
  "Taiwan")), Core:="Taiwan"]
data[list(c(
  "United Kingdom")), Core:="UK"]

1 Answer:

Answer 0 (score: 2)

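I think you will have to put the countries into the right groups at some point. How about a list (shortened here) that simply leaves out the NON-CORE countries: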

coregroup <- list(
    Benelux     =   c("Belgium","Netherlands"),
    Germany     =   "Germany"
)

You can then create a data.table from this list:

dt_coregroup <- data.table(
    # repeat each group name once per region it contains
    Core=rep(names(coregroup),lengths(coregroup)),
    Region.Group=unlist(coregroup,use.names=FALSE)
)
#       Core Region.Group
# 1: Benelux      Belgium
# 2: Benelux  Netherlands
# 3: Germany      Germany
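
Before merging, it may be worth confirming that no region appears in more than one group, since a repeated key in the lookup table would duplicate rows in the joined result. A minimal sketch, assuming the shortened `coregroup` list from above (the check itself is an addition, not part of the original answer):

```r
library(data.table)

coregroup <- list(
    Benelux = c("Belgium", "Netherlands"),
    Germany = "Germany"
)

dt_coregroup <- data.table(
    Core         = rep(names(coregroup), lengths(coregroup)),
    Region.Group = unlist(coregroup, use.names = FALSE)
)

# anyDuplicated() returns 0 when every Region.Group value is unique;
# stopifnot() raises an error otherwise.
stopifnot(anyDuplicated(dt_coregroup$Region.Group) == 0L)
```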

and merge it back into the original data. I've made up some nonsense data and named it "dt_start", since `data` is already the name of an R function.

dt_start <- data.table(Region.Group=c("Germany","Belgium","Australia"),Period=rep("2013Q1",3),Qty1=1:3)
setkey(dt_start,Region.Group)
setkey(dt_coregroup,Region.Group)

dt_new <- dt_coregroup[dt_start]
#    Region.Group    Core Period Qty1
# 1:    Australia      NA 2013Q1    3
# 2:      Belgium Benelux 2013Q1    2
# 3:      Germany Germany 2013Q1    1
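
If you would rather add the column to the existing table than build a new one, an update join is one option. This is a sketch rather than the method used above, and it assumes data.table 1.9.6 or later, where the `on=` argument is available; no `setkey()` call is needed.

```r
library(data.table)

dt_coregroup <- data.table(
    Core         = c("Benelux", "Benelux", "Germany"),
    Region.Group = c("Belgium", "Netherlands", "Germany")
)
dt_start <- data.table(
    Region.Group = c("Germany", "Belgium", "Australia"),
    Period       = rep("2013Q1", 3),
    Qty1         = 1:3
)

# Update join: each row of dt_coregroup writes its Core value (i.Core) into
# the matching rows of dt_start. Rows of dt_start with no match keep NA in
# the new column, and the row order of dt_start is unchanged.
dt_start[dt_coregroup, on = "Region.Group", Core := i.Core]
```

Because `:=` modifies `dt_start` by reference, no copy of the table is made.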

Finally, as a last step, we assign every country that has not been grouped to NON-CORE:

dt_new[is.na(Core),Core:="NON-CORE"]
#    Region.Group     Core Period Qty1
# 1:    Australia NON-CORE 2013Q1    3
# 2:      Belgium  Benelux 2013Q1    2
# 3:      Germany  Germany 2013Q1    1
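
The "else" behaviour asked about in the question can also be had by setting the default first and then overwriting the known regions. The helper below is a hypothetical wrapper, not a data.table function; its name `assign_core`, its `default` argument, and the extra "Iceland" row are made up for illustration, and it assumes data.table 1.9.6 or later for `on=`.

```r
library(data.table)

# Hypothetical helper (not part of data.table): adds a Core column to `dt`
# by reference, using `groups`, a named list of region vectors.
assign_core <- function(dt, groups, default = "NON-CORE") {
    lookup <- data.table(
        Core         = rep(names(groups), lengths(groups)),
        Region.Group = unlist(groups, use.names = FALSE)
    )
    dt[, Core := default]                            # the "else" case
    dt[lookup, on = "Region.Group", Core := i.Core]  # overwrite listed regions
    invisible(dt)
}

coregroup <- list(
    Benelux = c("Belgium", "Netherlands"),
    Germany = "Germany"
)
# Made-up data; "Iceland" stands in for a country newly added to the database.
dt_start <- data.table(
    Region.Group = c("Germany", "Belgium", "Australia", "Iceland"),
    Period       = rep("2013Q1", 4),
    Qty1         = 1:4
)

assign_core(dt_start, coregroup)
```

Any region missing from `coregroup` ends up as NON-CORE without being listed anywhere, so only the core groups need maintaining.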