基于索引向量有效地将值分配给特定数据表行号

时间:2015-10-21 15:22:48

标签: r data.table

我有一个按国家/地区列出数据的数据表,我想添加区域信息。我做了默认"欧洲"所以我需要纠正那些不在"欧洲"区域。这是我需要分配的内容:

    ordered_tally[ , region :="Europe"]
    AsiaRows = c(5, 12, 17, 24, 25, 26, 34, 36, 45, 56, 61, 64, 75, 76, 81, 88, 90, 91)
    NorthAmericaRows = c(7, 18, 42, 47)
    SouthAmericaRows = c(11, 13, 32, 38, 41, 46, 48, 52, 53, 57, 58, 68, 77, 78, 83)
    AfricaRows = c(22, 66, 67, 70, 74, 82, 84, 86, 95)
    OceaniaRows = c(31, 62)
    MiddleEastRows = c(20, 51, 71, 72, 87, 89, 92)
    region = c("Asia", "NorthAmerica", "SouthAmerica", "Africa", "Oceania", "MiddleEast")
    region_indices = c(AsiaRows, NorthAmericaRows, SouthAmericaRows, AfricaRows, OceaniaRows, MiddleEastRows)

这对我有用:

lapply(c(1:6), function(x) ordered_tally[region_indices[[x]], "region":= eval(region[x])])

有没有更好的方法呢?我的感觉lapply不应该是必要的,必须有本地data.table操作我&# 39; m失踪。

此处ordered_tally

    V1              country  freq
 1:  1               Poland 27718
 2:  2                Spain  9800
 3:  3              Denmark  7551
 4:  4       United Kingdom  4910
 5:  5             Thailand  4069
 6:  6               Norway  3995
 7:  7        United States  3837
 8:  8          Netherlands  3075
 9:  9               France  2866
10: 10                Italy  2429
11: 11               Brazil  2109
12: 12                India  1804
13: 13            Argentina  1610
14: 14              Germany  1437
15: 15               Sweden  1409
16: 16            Lithuania  1339
17: 17               Taiwan  1241
18: 18               Mexico  1218
19: 19               Latvia  1152
20: 20               Israel  1076
21: 21       Czech Republic  1064
22: 22         South Africa  1036
23: 23             Portugal   897
24: 24             Malaysia   889
25: 25            Indonesia   851
26: 26            Hong Kong   722
27: 27               Russia   670
28: 28          Switzerland   635
29: 29              Finland   613
30: 30              Ireland   603
31: 31            Australia   588
32: 32                Chile   573
33: 33              Estonia   540
34: 34                China   514
35: 35              Belgium   498
36: 36            Singapore   488
37: 37              Austria   464
38: 38             Colombia   361
39: 39              Belarus   330
40: 40              Romania   330
41: 41            Guatemala   311
42: 42               Canada   295
43: 43             Bulgaria   288
44: 44               Greece   246
45: 45          South Korea   231
46: 46           Costa Rica   218
47: 47          Puerto Rico   218
48: 48            Venezuela   204
49: 49               Serbia   203
50: 50               Turkey   184
51: 51 United Arab Emirates   175
52: 52              Ecuador   172
53: 53                 Peru   161
54: 54              Hungary   136
55: 55           Montenegro   131
56: 56                Japan   127
57: 57          El Salvador    95
58: 58              Uruguay    84
59: 59               Cyprus    70
60: 60              Iceland    59
61: 61           Bangladesh    57
62: 62          New Zealand    47
63: 63              Moldova    42
64: 64          Philippines    41
65: 65             Slovakia    36
66: 66            Mauritius    32
67: 67              Nigeria    30
68: 68              Bolivia    25
69: 69              Croatia    25
70: 70             Zimbabwe    24
71: 71         Saudi Arabia    20
72: 72                 Oman    19
73: 73             Slovenia    19
74: 74                Egypt    15
75: 75              Vietnam    14
76: 76            Sri Lanka    10
77: 77   Dominican Republic     9
78: 78            Nicaragua     9
79: 79    Macedonia (FYROM)     5
80: 80              Ukraine     5
81: 81           Azerbaijan     4
82: 82              Namibia     4
83: 83             Barbados     3
84: 84             Botswana     3
85: 85                Malta     3
86: 86                Ghana     2
87: 87                 Iraq     2
88: 88           Kazakhstan     2
89: 89              Lebanon     2
90: 90                Macau     2
91: 91                Nepal     2
92: 92                Qatar     2
93: 93              Albania     1
94: 94        Faroe Islands     1
95: 95              Morocco     1

并且这里是要复制的dump代码:

ordered_tally <-
data.table(V1 = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", 
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", 
"42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", 
"53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", 
"64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74", 
"75", "76", "77", "78", "79", "80", "81", "82", "83", "84", "85", 
"86", "87", "88", "89", "90", "91", "92", "93", "94", "95"), 
    country = c("Poland", "Spain", "Denmark", "United Kingdom", 
    "Thailand", "Norway", "United States", "Netherlands", "France", 
    "Italy", "Brazil", "India", "Argentina", "Germany", "Sweden", 
    "Lithuania", "Taiwan", "Mexico", "Latvia", "Israel", "Czech Republic", 
    "South Africa", "Portugal", "Malaysia", "Indonesia", "Hong Kong", 
    "Russia", "Switzerland", "Finland", "Ireland", "Australia", 
    "Chile", "Estonia", "China", "Belgium", "Singapore", "Austria", 
    "Colombia", "Belarus", "Romania", "Guatemala", "Canada", 
    "Bulgaria", "Greece", "South Korea", "Costa Rica", "Puerto Rico", 
    "Venezuela", "Serbia", "Turkey", "United Arab Emirates", 
    "Ecuador", "Peru", "Hungary", "Montenegro", "Japan", "El Salvador", 
    "Uruguay", "Cyprus", "Iceland", "Bangladesh", "New Zealand", 
    "Moldova", "Philippines", "Slovakia", "Mauritius", "Nigeria", 
    "Bolivia", "Croatia", "Zimbabwe", "Saudi Arabia", "Oman", 
    "Slovenia", "Egypt", "Vietnam", "Sri Lanka", "Dominican Republic", 
    "Nicaragua", "Macedonia (FYROM)", "Ukraine", "Azerbaijan", 
    "Namibia", "Barbados", "Botswana", "Malta", "Ghana", "Iraq", 
    "Kazakhstan", "Lebanon", "Macau", "Nepal", "Qatar", "Albania", 
    "Faroe Islands", "Morocco"), freq = c(27718L, 9800L, 7551L, 
    4910L, 4069L, 3995L, 3837L, 3075L, 2866L, 2429L, 2109L, 1804L, 
    1610L, 1437L, 1409L, 1339L, 1241L, 1218L, 1152L, 1076L, 1064L, 
    1036L, 897L, 889L, 851L, 722L, 670L, 635L, 613L, 603L, 588L, 
    573L, 540L, 514L, 498L, 488L, 464L, 361L, 330L, 330L, 311L, 
    295L, 288L, 246L, 231L, 218L, 218L, 204L, 203L, 184L, 175L, 
    172L, 161L, 136L, 131L, 127L, 95L, 84L, 70L, 59L, 57L, 47L, 
    42L, 41L, 36L, 32L, 30L, 25L, 25L, 24L, 20L, 19L, 19L, 15L, 
    14L, 10L, 9L, 9L, 5L, 5L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 1L, 1L, 1L), user_count = c(4126L, 1533L, 
    1312L, 870L, 620L, 867L, 668L, 493L, 448L, 401L, 530L, 314L, 
    246L, 318L, 338L, 227L, 205L, 201L, 170L, 237L, 166L, 165L, 
    173L, 135L, 171L, 116L, 97L, 139L, 123L, 94L, 102L, 100L, 
    162L, 122L, 86L, 73L, 101L, 79L, 49L, 68L, 28L, 37L, 41L, 
    50L, 49L, 28L, 27L, 35L, 47L, 59L, 22L, 35L, 34L, 16L, 29L, 
    30L, 14L, 24L, 16L, 15L, 29L, 2L, 15L, 6L, 10L, 8L, 6L, 3L, 
    3L, 7L, 5L, 6L, 3L, 4L, 6L, 7L, 3L, 1L, 1L, 4L, 2L, 2L, 1L, 
    1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L))

1 个答案:

答案 0 :(得分:5)

我建议使用名为list而不是c

region_indices = setNames(list(AsiaRows, NorthAmericaRows, SouthAmericaRows, AfricaRows, OceaniaRows, MiddleEastRows), region)

然后将其变为data.frame

m = setNames(stack(region_indices), c("row", "region"))

然后使用

进行分配
ordered_tally[m$row, region := as.character(m$region)]

此时,region_indices中未包含的行NAregion。如果您确定这些都是欧洲的,请使用

ordered_tally[is.na(region), region := "Europe"]

评论。通常,通过行号存储您的数据是不安全的。 (我的意思是,你可能会在路上为自己制造麻烦。)按国家名称划分地区名单会好得多。