我有一个按国家/地区列出数据的数据表,我想添加区域信息。我做了默认"欧洲"所以我需要纠正那些不在"欧洲"区域。这是我需要分配的内容:
ordered_tally[ , region :="Europe"]
AsiaRows = c(5, 12, 17, 24, 25, 26, 34, 36, 45, 56, 61, 64, 75, 76, 81, 88, 90, 91)
NorthAmericaRows = c(7, 18, 42, 47)
SouthAmericaRows = c(11, 13, 32, 38, 41, 46, 48, 52, 53, 57, 58, 68, 77, 78, 83)
AfricaRows = c(22, 66, 67, 70, 74, 82, 84, 86, 95)
OceaniaRows = c(31, 62)
MiddleEastRows = c(20, 51, 71, 72, 87, 89, 92)
region = c("Asia", "NorthAmerica", "SouthAmerica", "Africa", "Oceania", "MiddleEast")
region_indices = c(AsiaRows, NorthAmericaRows, SouthAmericaRows, AfricaRows, OceaniaRows, MiddleEastRows)
这对我有用:
lapply(c(1:6), function(x) ordered_tally[region_indices[[x]], "region":= eval(region[x]
)])
有没有更好的方法呢?我的感觉lapply
不应该是必要的,必须有本地data.table
操作我&# 39; m失踪。
此处ordered_tally
:
V1 country freq
1: 1 Poland 27718
2: 2 Spain 9800
3: 3 Denmark 7551
4: 4 United Kingdom 4910
5: 5 Thailand 4069
6: 6 Norway 3995
7: 7 United States 3837
8: 8 Netherlands 3075
9: 9 France 2866
10: 10 Italy 2429
11: 11 Brazil 2109
12: 12 India 1804
13: 13 Argentina 1610
14: 14 Germany 1437
15: 15 Sweden 1409
16: 16 Lithuania 1339
17: 17 Taiwan 1241
18: 18 Mexico 1218
19: 19 Latvia 1152
20: 20 Israel 1076
21: 21 Czech Republic 1064
22: 22 South Africa 1036
23: 23 Portugal 897
24: 24 Malaysia 889
25: 25 Indonesia 851
26: 26 Hong Kong 722
27: 27 Russia 670
28: 28 Switzerland 635
29: 29 Finland 613
30: 30 Ireland 603
31: 31 Australia 588
32: 32 Chile 573
33: 33 Estonia 540
34: 34 China 514
35: 35 Belgium 498
36: 36 Singapore 488
37: 37 Austria 464
38: 38 Colombia 361
39: 39 Belarus 330
40: 40 Romania 330
41: 41 Guatemala 311
42: 42 Canada 295
43: 43 Bulgaria 288
44: 44 Greece 246
45: 45 South Korea 231
46: 46 Costa Rica 218
47: 47 Puerto Rico 218
48: 48 Venezuela 204
49: 49 Serbia 203
50: 50 Turkey 184
51: 51 United Arab Emirates 175
52: 52 Ecuador 172
53: 53 Peru 161
54: 54 Hungary 136
55: 55 Montenegro 131
56: 56 Japan 127
57: 57 El Salvador 95
58: 58 Uruguay 84
59: 59 Cyprus 70
60: 60 Iceland 59
61: 61 Bangladesh 57
62: 62 New Zealand 47
63: 63 Moldova 42
64: 64 Philippines 41
65: 65 Slovakia 36
66: 66 Mauritius 32
67: 67 Nigeria 30
68: 68 Bolivia 25
69: 69 Croatia 25
70: 70 Zimbabwe 24
71: 71 Saudi Arabia 20
72: 72 Oman 19
73: 73 Slovenia 19
74: 74 Egypt 15
75: 75 Vietnam 14
76: 76 Sri Lanka 10
77: 77 Dominican Republic 9
78: 78 Nicaragua 9
79: 79 Macedonia (FYROM) 5
80: 80 Ukraine 5
81: 81 Azerbaijan 4
82: 82 Namibia 4
83: 83 Barbados 3
84: 84 Botswana 3
85: 85 Malta 3
86: 86 Ghana 2
87: 87 Iraq 2
88: 88 Kazakhstan 2
89: 89 Lebanon 2
90: 90 Macau 2
91: 91 Nepal 2
92: 92 Qatar 2
93: 93 Albania 1
94: 94 Faroe Islands 1
95: 95 Morocco 1
并且这里是要复制的dump
代码:
ordered_tally <-
data.table(V1 = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41",
"42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52",
"53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63",
"64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74",
"75", "76", "77", "78", "79", "80", "81", "82", "83", "84", "85",
"86", "87", "88", "89", "90", "91", "92", "93", "94", "95"),
country = c("Poland", "Spain", "Denmark", "United Kingdom",
"Thailand", "Norway", "United States", "Netherlands", "France",
"Italy", "Brazil", "India", "Argentina", "Germany", "Sweden",
"Lithuania", "Taiwan", "Mexico", "Latvia", "Israel", "Czech Republic",
"South Africa", "Portugal", "Malaysia", "Indonesia", "Hong Kong",
"Russia", "Switzerland", "Finland", "Ireland", "Australia",
"Chile", "Estonia", "China", "Belgium", "Singapore", "Austria",
"Colombia", "Belarus", "Romania", "Guatemala", "Canada",
"Bulgaria", "Greece", "South Korea", "Costa Rica", "Puerto Rico",
"Venezuela", "Serbia", "Turkey", "United Arab Emirates",
"Ecuador", "Peru", "Hungary", "Montenegro", "Japan", "El Salvador",
"Uruguay", "Cyprus", "Iceland", "Bangladesh", "New Zealand",
"Moldova", "Philippines", "Slovakia", "Mauritius", "Nigeria",
"Bolivia", "Croatia", "Zimbabwe", "Saudi Arabia", "Oman",
"Slovenia", "Egypt", "Vietnam", "Sri Lanka", "Dominican Republic",
"Nicaragua", "Macedonia (FYROM)", "Ukraine", "Azerbaijan",
"Namibia", "Barbados", "Botswana", "Malta", "Ghana", "Iraq",
"Kazakhstan", "Lebanon", "Macau", "Nepal", "Qatar", "Albania",
"Faroe Islands", "Morocco"), freq = c(27718L, 9800L, 7551L,
4910L, 4069L, 3995L, 3837L, 3075L, 2866L, 2429L, 2109L, 1804L,
1610L, 1437L, 1409L, 1339L, 1241L, 1218L, 1152L, 1076L, 1064L,
1036L, 897L, 889L, 851L, 722L, 670L, 635L, 613L, 603L, 588L,
573L, 540L, 514L, 498L, 488L, 464L, 361L, 330L, 330L, 311L,
295L, 288L, 246L, 231L, 218L, 218L, 204L, 203L, 184L, 175L,
172L, 161L, 136L, 131L, 127L, 95L, 84L, 70L, 59L, 57L, 47L,
42L, 41L, 36L, 32L, 30L, 25L, 25L, 24L, 20L, 19L, 19L, 15L,
14L, 10L, 9L, 9L, 5L, 5L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L), user_count = c(4126L, 1533L,
1312L, 870L, 620L, 867L, 668L, 493L, 448L, 401L, 530L, 314L,
246L, 318L, 338L, 227L, 205L, 201L, 170L, 237L, 166L, 165L,
173L, 135L, 171L, 116L, 97L, 139L, 123L, 94L, 102L, 100L,
162L, 122L, 86L, 73L, 101L, 79L, 49L, 68L, 28L, 37L, 41L,
50L, 49L, 28L, 27L, 35L, 47L, 59L, 22L, 35L, 34L, 16L, 29L,
30L, 14L, 24L, 16L, 15L, 29L, 2L, 15L, 6L, 10L, 8L, 6L, 3L,
3L, 7L, 5L, 6L, 3L, 4L, 6L, 7L, 3L, 1L, 1L, 4L, 2L, 2L, 1L,
1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L))
答案 0 :(得分:5)
我建议使用名为list
而不是c
:
region_indices = setNames(list(AsiaRows, NorthAmericaRows, SouthAmericaRows, AfricaRows, OceaniaRows, MiddleEastRows), region)
然后将其变为data.frame
m = setNames(stack(region_indices), c("row", "region"))
然后使用
进行分配ordered_tally[m$row, region := as.character(m$region)]
此时,region_indices
中未包含的行NA
为region
。如果您确定这些都是欧洲的,请使用
ordered_tally[is.na(region), region := "Europe"]
评论。通常,通过行号存储您的数据是不安全的。 (我的意思是,你可能会在路上为自己制造麻烦。)按国家名称划分地区名单会好得多。