Here is what the data.table z looks like (the dput output is provided at the bottom of the question):
> require(data.table); z
SurveyResponseID WhereStayed LoS Nights
1: 3274455 Wellington 42741.9436 0.0000
2: 3274476 Raglan 39591.9555 0.0000
3: 3274493 Auckland 877.0862 877.0862
4: 3274503 Matakohe 6865.8103 NA
5: 3274506 Auckland 81982.5017 0.0000
---
146: 3275696 Clevedon 2871.3504 NA
147: 3275707 Hastings 748.8108 561.6081
148: 3275708 Stratford 23785.4769 0.0000
149: 3275715 Waitomo 1600.3829 0.0000
150: 3275728 Cape Reinga 11787.2847 0.0000
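The Nights column has several NAs. I want to split the LoS value across the WhereStayed locations in the same proportions as the rest of the data where Nights is not NA.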
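In symbols (the notation is introduced here for illustration and is not from the question): for a respondent i whose Nights are missing, let S_i be the set of locations that respondent listed, and let T_j be the total of the non-NA Nights recorded at location j across all respondents. The imputed value is then LoS scaled by location j's share within S_i:

```latex
\text{Nights}_{ij} = \text{LoS}_i \cdot \frac{T_j}{\sum_{k \in S_i} T_k},
\qquad
T_j = \sum_{r \,:\, \text{Nights}_{rj} \neq \text{NA}} \text{Nights}_{rj}
```

As long as every T_k is available and their sum is non-zero, the shares within one respondent add up to 1, so that respondent's imputed Nights add up to their LoS.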
For example, consider SurveyResponseID == 3274528:
> z[SurveyResponseID == 3274528]
SurveyResponseID WhereStayed LoS Nights
1: 3274528 Auckland 20113.82 NA
2: 3274528 Hamilton 20113.82 NA
3: 3274528 Rotorua 20113.82 NA
Now, in the full data, this is the distribution of Nights across Auckland, Rotorua and Hamilton:
> z[WhereStayed %in% c('Rotorua', 'Hamilton', 'Auckland') & !is.na(Nights), .(Nights = sum(Nights)), by = WhereStayed]
WhereStayed Nights
1: Auckland 5019.240
2: Hamilton 1502.824
3: Rotorua 3271.130
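That is roughly 51.25% Auckland, 15.35% Hamilton and 33.4% Rotorua. Using these shares, I want to split 20113.82 in that ratio and assign the pieces to the three NA rows of respondent 3274528.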
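The arithmetic can be checked in base R. This is a minimal sketch that hard-codes the three rounded totals printed above, so it reproduces the expected output only to a few decimals; it does not use data.table.

```r
# Observed Nights totals per location, as printed in the grouped sum above
totals <- c(Auckland = 5019.240, Hamilton = 1502.824, Rotorua = 3271.130)

# Each location's share of the combined total
shares <- totals / sum(totals)
round(100 * shares, 2)   # roughly 51.25, 15.35, 33.40 percent

# Split respondent 3274528's LoS in those proportions
alloc <- 20113.82 * shares
round(alloc, 1)          # close to 10308.802 / 3086.585 / 6718.435
```

Because the shares sum to 1, the three allocated values sum back to the original LoS of 20113.82.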
So after imputing the NAs, the data would look like this (for example, 10308.802 ≈ 51.25% * 20113.82 for Auckland):
> z[SurveyResponseID == 3274528]
SurveyResponseID WhereStayed LoS Nights
1: 3274528 Auckland 20113.82 10308.802
2: 3274528 Hamilton 20113.82 3086.585
3: 3274528 Rotorua 20113.82 6718.435
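I do have a solution that involves an intermediate table which is then joined back onto the z data.table, but I'm not sure it's the data.table way of doing things.
Below is my long-winded approach; it works, but it is clunky.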
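library(data.table)
# total observed Nights per location, turned into a share of all observed Nights
ratios <- z[!is.na(Nights), .(Ratio = sum(Nights)), by = .(WhereStayed)]
ratios[, Ratio := Ratio / sum(Ratio)]
# join the shares onto z; every row of z is kept (NA where a location has no share)
z <- ratios[z, on = 'WhereStayed']
# rescale so the shares sum to 1 within each respondent
z[, Ratio := Ratio / sum(Ratio), by = .(SurveyResponseID)]
# fill only the missing Nights with LoS times the respondent-level share
z[is.na(Nights), Nights := LoS * Ratio]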
This gives the expected output below (showing only the rows where Nights was originally NA):
SurveyResponseID WhereStayed LoS Nights
1: 3274503 Matakohe 6865.8103 NA
2: 3274528 Auckland 20113.8224 10308.802
3: 3274528 Hamilton 20113.8224 3086.585
4: 3274528 Rotorua 20113.8224 6718.435
5: 3274583 Auckland 11712.8500 11712.850
6: 3274607 Rakino Island 1161.6147 NA
7: 3274715 Port Levy 2312.9432 NA
8: 3274738 Waiheke Island 3036.9614 NA
9: 3274752 Auckland 718.4200 718.420
10: 3274752 Kumeu 718.4200 0.000
11: 3274899 Auckland 96724.3395 96724.339
12: 3275082 Orewa 2125.8577 NA
13: 3275238 Auckland 4904.1634 4904.163
14: 3275256 Kumeu 5607.1564 NaN
15: 3275309 Auckland 4319.0176 4319.018
16: 3275319 Auckland 8634.8011 8634.801
17: 3275525 Auckland 25661.6887 25661.689
18: 3275560 Waiheke Island 915.7693 NA
19: 3275560 Auckland 915.7693 NA
20: 3275696 Clevedon 2871.3504 2871.350
The missing values that remain in Nights are fine, because in those cases there is no data in z to draw from.
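Judging from the dput data, two different mechanisms seem to leave these gaps. Waiheke Island has no non-NA Nights anywhere, so it gets no share, and the NA then propagates through sum() to every location of that respondent (which is why both rows of 3275560 stay NA). Kumeu's observed Nights are all 0, so its share is 0, and a respondent who stayed only in Kumeu (3275256) ends up with 0/0 = NaN. A small base R sketch with hypothetical totals reproduces both effects:

```r
# Hypothetical lookup of observed Nights totals per location.
# Kumeu's observed Nights are all 0; Waiheke Island has no entry at all.
totals <- c(Auckland = 5019.240, Kumeu = 0)

# Share of each visited location within one respondent
share_within <- function(locations) {
  tot <- totals[locations]   # indexing by a name that is not present yields NA
  tot / sum(tot)             # sum() of a vector containing NA is NA
}

share_within(c("Waiheke Island", "Auckland"))  # NA NA
share_within("Kumeu")                          # NaN, i.e. 0 / 0
share_within(c("Auckland", "Kumeu"))           # 1 0
```

The last call mirrors respondent 3274752, where Auckland receives the whole LoS and Kumeu gets 0.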
z <- structure(list(SurveyResponseID = c(3274455L, 3274476L, 3274493L,
3274503L, 3274506L, 3274510L, 3274517L, 3274518L, 3274523L, 3274526L,
3274528L, 3274528L, 3274528L, 3274532L, 3274583L, 3274594L, 3274605L,
3274607L, 3274629L, 3274645L, 3274655L, 3274659L, 3274679L, 3274679L,
3274692L, 3274694L, 3274700L, 3274709L, 3274715L, 3274719L, 3274726L,
3274738L, 3274750L, 3274752L, 3274752L, 3274764L, 3274771L, 3274771L,
3274789L, 3274800L, 3274838L, 3274839L, 3274843L, 3274866L, 3274866L,
3274874L, 3274880L, 3274880L, 3274894L, 3274899L, 3274912L, 3274918L,
3274923L, 3274947L, 3274966L, 3274971L, 3274979L, 3274980L, 3275003L,
3275019L, 3275046L, 3275050L, 3275052L, 3275057L, 3275064L, 3275072L,
3275075L, 3275079L, 3275082L, 3275085L, 3275101L, 3275102L, 3275103L,
3275108L, 3275128L, 3275129L, 3275150L, 3275152L, 3275160L, 3275166L,
3275170L, 3275170L, 3275174L, 3275174L, 3275210L, 3275230L, 3275238L,
3275240L, 3275246L, 3275256L, 3275280L, 3275288L, 3275292L, 3275294L,
3275295L, 3275304L, 3275309L, 3275319L, 3275330L, 3275344L, 3275362L,
3275378L, 3275379L, 3275394L, 3275399L, 3275406L, 3275409L, 3275411L,
3275411L, 3275418L, 3275436L, 3275443L, 3275454L, 3275463L, 3275465L,
3275470L, 3275496L, 3275498L, 3275504L, 3275510L, 3275521L, 3275525L,
3275538L, 3275544L, 3275545L, 3275546L, 3275554L, 3275555L, 3275555L,
3275556L, 3275556L, 3275556L, 3275560L, 3275560L, 3275563L, 3275566L,
3275569L, 3275581L, 3275604L, 3275606L, 3275626L, 3275638L, 3275683L,
3275691L, 3275692L, 3275696L, 3275707L, 3275708L, 3275715L, 3275728L
), WhereStayed = c("Wellington", "Raglan", "Auckland", "Matakohe",
"Auckland", "Christchurch", "Auckland", "Milton", "Dannevirke",
"Auckland", "Auckland", "Hamilton", "Rotorua", "Twizel", "Auckland",
"Otaki", "Greymouth", "Rakino Island", "Houhora", "Napier", "Christchurch",
"Waipoua Forest", "Oamaru", "Dunedin", "Wellington", "Hamilton",
"Westport", "Wellington", "Port Levy", "Lake Tekapo", "Milton",
"Waiheke Island", "Paihia", "Auckland", "Kumeu", "Omarama", "Rotorua",
"Tauranga", "Timaru", "Abel Tasman National Park", "Auckland",
"Queenstown", "Warkworth", "Te Anau", "Craigieburn", "Milford Sound",
"Nelson", "Christchurch", "Rotorua", "Auckland", "New Plymouth",
"Christchurch", "Queenstown", "Kumeu", "Auckland", "Paparoa National Park",
"Waiotapu", "Whangarei", "Waitomo", "Queenstown", "Auckland",
"Queenstown", "Christchurch", "Clevedon", "Waitomo", "Christchurch",
"Taihape", "Christchurch", "Orewa", "Rotorua", "Franz Josef",
"Pukekohe", "Kumeu", "Tairua", "Taupo", "Queenstown", "Omarama",
"Auckland", "Hanmer Springs", "Rotorua", "Murchison", "Queenstown",
"Queenstown", "Milford Sound", "Auckland", "Paparoa National Park",
"Auckland", "Cromwell", "Queenstown", "Kumeu", "Clevedon", "Wellington",
"Oamaru", "Queenstown", "Endeavour Inlet", "Blenheim", "Auckland",
"Auckland", "Wellington", "Wanaka", "Masterton", "Whakapapa Village",
"Tairua", "Rotorua", "Cape Kidnappers", "Waihua", "Arrowtown",
"Cape Reinga", "Snells Beach", "Auckland", "Wellington", "Dunedin",
"Auckland", "Taupo", "Abel Tasman National Park", "Dunedin",
"Te Anau", "Christchurch", "Paihia", "Dunedin", "Hamilton", "Auckland",
"Matamata", "Wanaka", "Catlins", "Paihia", "Franz Josef", "Taupo",
"Kaikoura", "Westport", "Heaphy Track", "Piha", "Waiheke Island",
"Auckland", "Wellington", "Whangamata", "Wanaka", "Westport",
"Fiordland National Park", "Taupo", "Christchurch", "Te Anau",
"Wellington", "Rotorua", "Marlborough", "Clevedon", "Hastings",
"Stratford", "Waitomo", "Cape Reinga"), LoS = c(42741.9436047755,
39591.9555163287, 877.08616280446, 6865.81028982635, 81982.5016525796,
41375.3053535933, 4949.00343037598, 13643.8378966971, 1818.04165680688,
7911.06178019024, 20113.8223823246, 20113.8223823246, 20113.8223823246,
4297.21264743424, 11712.8500000521, 14342.9323259751, 1046.42962365774,
1161.61465947518, 26684.8013647668, 2159.85594913809, 12382.5291370991,
3572.88522911463, 3267.58643173956, 3267.58643173956, 9055.02741317069,
42964.024708285, 62527.1602217821, 799.215837399333, 2312.9432017275,
17807.880584828, 3684.55279910826, 3036.96143529467, 2095.19366998327,
718.419976697589, 718.419976697589, 1299.69196347729, 56914.2840041613,
56914.2840041613, 13328.4852202518, 5404.91247034716, 2522.48422126056,
6165.64136973517, 9531.97012687062, 3894.39120716227, 3894.39120716227,
2543.46846269262, 3414.14874750348, 3414.14874750348, 3771.30561388102,
96724.3394654342, 3583.27705777555, 3041.13854297752, 3368.50460565427,
3158.18811352136, 3904.66470252172, 5862.90633463616, 2882.83911001206,
11805.2297665087, 6402.08709024943, 5186.94312706125, 870.69199642505,
10091.1420543283, 8369.774757932, 7985.40888579288, 6926.3302645866,
4420.06917925033, 1726.86768006798, 3974.48164722869, 2125.85771144444,
4736.76735216895, 14504.7530311797, 62467.3075924298, 632.428436718402,
6645.29389114695, 2241.80914051178, 1003.1560691685, 3134.88061131533,
3604.1357395957, 48790.3266929933, 2098.82030322716, 3945.49519922237,
3945.49519922237, 2136.34311305016, 2136.34311305016, 456.440663951212,
10692.5752772267, 4904.16336515106, 10440.7991489425, 8828.17020986572,
5607.15637428966, 4374.48421791468, 23277.4964101353, 3380.0999904256,
1255.85228651154, 12561.9210632003, 7779.33569261148, 4319.01757077778,
8634.80105492512, 12844.081196906, 3666.71285119098, 4176.94496342972,
3288.20886332444, 2937.47178044397, 10205.4005090231, 19213.3721518298,
8527.86375947078, 10195.2603554514, 3735.66582375512, 3735.66582375512,
946.998025480878, 5279.64787567089, 10608.0756829274, 6242.27906140245,
5455.41709954626, 1779.0727991838, 6029.46747996311, 4385.52398444791,
14686.4890994835, 4171.39583798557, 2475.27432897754, 3005.64728199526,
25661.6887253572, 11185.9596078473, 3539.88530105119, 13857.1961646826,
3799.52953818341, 4053.93637885706, 3771.87058713216, 3771.87058713216,
26410.8270985288, 26410.8270985288, 26410.8270985288, 915.769260388995,
915.769260388995, 3294.46869510517, 4859.6269254318, 1968.91705023579,
547.139652678248, 4224.21312757923, 11692.2356812747, 712.516366875341,
9217.08214243521, 1265.12928478973, 5665.77537103692, 14824.4623882922,
2871.35038838803, 748.810764275115, 23785.4768813912, 1600.38293737054,
11787.2847424015), Nights = c(0, 0, 877.08616280446, NA, 0, 40170.1993724207,
0, 2842.46622847856, 303.006942801147, 1483.32408378567, NA,
NA, NA, 0, NA, 0, 74.7449731184097, NA, 0, 479.967988697354,
0, 0, 136.149434655815, 136.149434655815, 0, 0, 0, 72.6559852181211,
NA, 0, 0, NA, 0, NA, NA, 99.9763048828681, 503.666230125321,
503.666230125321, 416.515163132868, 0, 360.354888751508, 362.68478645501,
0, 0, 0, 0, 512.122312125522, 1877.78181112691, 179.585981613382,
NA, 275.636696751966, 0, 748.556579034282, 0, 433.851633613524,
279.186015935055, 0, 380.813863435765, 0, 357.720215659397, 870.69199642505,
630.696378395519, 697.481229827667, 0, 0, 1262.87690835724, 90.8877726351567,
722.633026768853, NA, 338.340525154925, 0, 54548.916489164, 0,
4651.70572380286, 104.270192581943, 154.331702949, 174.160033961963,
300.344644966308, 0, 0, 219.194177734576, 219.194177734576, 0,
0, 0, 0, NA, 0, 8828.17020986572, NA, 4374.48421791468, 705.37867909501,
160.957142401219, 179.407469501649, 0, 0, NA, NA, 856.272079793735,
0, 219.839208601564, 102.756526978889, 267.04288913127, 833.093919103928,
0, 275.092379337767, 0, 0, 81.2101266033722, 0, 310.567522098288,
1811.13487269492, 693.58656237805, 363.694473303084, 0, 415.825343445732,
230.817051813048, 0, 641.753205843934, 190.405717613657, 1502.82364099763,
NA, 0, 307.816113134886, 6928.59808234131, 0, 0, 377.187058713216,
377.187058713216, 614.205281361134, 1228.41056272227, 0, NA,
NA, 3294.46869510517, 0, 0, 39.081403762732, 0, 6820.47081407691,
712.516366875341, 801.485403690018, 0, 1416.44384275923, 658.864995035208,
NA, 561.608073206336, 0, 0, 0)), row.names = c(NA, -150L), class = c("data.table",
"data.frame"), index = structure(integer(0), "`__SurveyResponseID`" = integer(0)))
Answer:
I would create a reference table with the sums per WhereStayed, and then run a join while calculating the new values, e.g.
## reference table with the sums
ref <- z[!is.na(Nights), .(Nights = sum(Nights)), by = WhereStayed]
## join z with ref
z[is.na(Nights), # join only where `Nights` are NAs
Nights := ref[.SD, Nights / sum(Nights) * LoS, # Calculate the formula per join
on = .(WhereStayed)], # join condition
by = SurveyResponseID] # run this by `SurveyResponseID`
## Validation
z[SurveyResponseID == 3274528]
# SurveyResponseID WhereStayed LoS Nights
# 1: 3274528 Auckland 20113.82 10308.802
# 2: 3274528 Hamilton 20113.82 3086.585
# 3: 3274528 Rotorua 20113.82 6718.435
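For comparison, the same reference-table-then-join idea can be written in base R with aggregate(), merge() and ave(). This swaps data.table for base R functions and runs on a small hypothetical table rather than z; like the answer's code, it lets an NA total propagate to the whole respondent.

```r
# Hypothetical toy data: respondent 1 (A and B) and respondent 2 (A only) lack Nights
toy <- data.frame(
  SurveyResponseID = c(1, 1, 2, 3, 4),
  WhereStayed      = c("A", "B", "A", "A", "B"),
  LoS              = c(100, 100, 50, 30, 10),
  Nights           = c(NA, NA, NA, 30, 10)
)

# Reference table: total observed Nights per location
obs <- toy[!is.na(toy$Nights), ]
ref <- aggregate(Nights ~ WhereStayed, data = obs, FUN = sum)
names(ref)[2] <- "Total"

# Rows to impute, remembering their original positions (merge() reorders rows)
miss <- toy[is.na(toy$Nights), ]
miss$row <- which(is.na(toy$Nights))
miss <- merge(miss, ref, by = "WhereStayed", all.x = TRUE)

# Share of each location within its respondent, then scale that respondent's LoS
miss$Share <- miss$Total / ave(miss$Total, miss$SurveyResponseID, FUN = sum)
toy$Nights[miss$row] <- miss$LoS * miss$Share
toy$Nights   # 75 25 50 30 10
```

merge() sorts its result by the join key, which is why the sketch stores each missing row's original position before joining and writes the imputed values back by that index.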