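I am computing summary statistics inside a nested for loop. I have the following code. What I want to do is keep the columns "Region.in.country" and "Major.sectors" alongside each set of summary statistics. However, using the following line
#tmp.summary[7:8] <- ratios[rows.2.consider, 1:2]
breaks the loop and returns an error.
So I am trying to build the output data frame so that it also keeps those columns for each set of summary statistics.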
m <- unique(ratios$Region.in.country.id)
k <- unique(ratios$Major.sectors.id)
I use the following nested for loop:
output <- data.frame()
for(i in 1:length(m)){
  country.ID <- m[i] # provided ID_2 corresponds to countries
  for(j in 1:length(k)){
    sector.ID <- k[j] # provided ID_1 corresponds to sectors
    S1 <- which(ratios$Region.in.country.id == country.ID)
    S2 <- which(ratios$Major.sectors.id == sector.ID)
    rows.2.consider <- intersect(S1, S2)
    tmp.summary <- summary(ratios[rows.2.consider, 4:6])
    #tmp.summary[7:8] <- ratios[rows.2.consider, 1:2]
    #tmp.summary <- data.frame(t(tmp.summary))
    output <- rbind(output, tmp.summary)
    print(tmp.summary)
    rm(sector.ID, S1, S2, rows.2.consider, j)
  }
  rm(country.ID, i)
}
Sample data:
ratios <- structure(list(IDVar = 1:40, Major.sectors = structure(c(5L,
9L, 3L, 15L, 11L, 7L, 18L, 18L, 18L, 3L, 3L, 3L, 3L, 17L, 3L,
11L, 7L, 17L, 3L, 11L, 3L, 18L, 3L, 17L, 9L, 18L, 9L, 19L, 3L,
11L, 11L, 2L, 5L, 3L, 18L, 17L, 4L, 2L, 3L, 3L), .Label = c("Banks",
"Chemicals, rubber, plastics, non-metallic products", "Construction",
"Education, Health", "Food, beverages, tobacco", "Gas, Water, Electricity",
"Hotels & restaurants", "Insurance companies", "Machinery, equipment, furniture, recycling",
"Metals & metal products", "Other services", "Post & telecommunications",
"Primary sector", "Public administration & defense", "Publishing, printing",
"Textiles, wearing apparel, leather", "Transport", "Wholesale & retail trade",
"Wood, cork, paper"), class = "factor"), Region.in.country = structure(c(15L,
8L, 8L, 8L, 10L, 15L, 19L, 10L, 8L, 10L, 3L, 18L, 4L, 12L, 4L,
15L, 13L, 4L, 15L, 15L, 7L, 15L, 12L, 1L, 7L, 10L, 15L, 8L, 13L,
15L, 12L, 8L, 7L, 15L, 15L, 10L, 8L, 10L, 10L, 15L), .Label = c("Andalucia",
"Aragon", "Asturias", "Canary Islands", "Cantabria", "Castilla-La Mancha",
"Castilla y Leon", "Cataluna", "Ceuta", "Comunidad Valenciana",
"Extremadura", "Galicia", "Islas Baleares", "La Rioja", "Madrid",
"Melilla", "Murcia", "Navarra", "Pais Vasco"), class = "factor"),
EBIT.TA = c(-0.234432635519391, -0.884337466274593, -0.00446559204081373,
0.11109107677028, -0.137203773525798, -0.582114677880617,
0.0190497663203189, -3.04252763094666, 0.113157822682219,
-0.0255533180037229, 0.281767142199724, 0.0326641697396841,
-0.00879974750993553, 0.0542074697816672, -0.112104697294392,
-0.191945591325174, -0.00380586115226597, -0.0363239884169068,
-0.273949107908537, 0.435398668004486, -0.00563436099927988,
-2.75971618056051, -0.1047327709263, 0.151283793741506, -0.0373197549569126,
0.00912639083178201, -0.0386627754065697, -0.018235399636112,
-0.0118104711362467, -0.701299939137125, NA, 0.0191819361175666,
-0.0104887983706721, -0.801677105519484, -0.402194475974272,
-0.124125227730062, 0.143020458476649, -0.601186271451194,
0.0163269364787831, 5.09955167591238), EBIT.TA_l1 = c(-0.443687074746458,
-0.561864166134075, -0.0345769510044604, 0.0282541797531804,
-0.0181173929170762, 0.0147211350970115, 0.0588534950162799,
-1.14097109926961, 0.060100343733096, -0.0386426338471025,
0.049684095221329, 0.0558174150334904, 0.00214962169435867,
0.0399960114646072, 0.0402934579830171, -0.612359147433149,
-0.0115916125659674, 0.00739473610413031, 0.0174576615247567,
0.68624861825246, 0.0305807338940829, -3.88006243913616,
0.0410122725022661, -0.089491343996377, -0.215219123182103,
0.00967853324842811, -0.0336715197882038, 0.362424791356667,
0.221203934329637, -0.654387857513823, 0.0656934439915892,
0.0652005453654772, 0.0339559014267185, 0.0259085077216708,
-0.303606048856146, 0.0280113794301873, 0.109307291990628,
-0.470048555841697, -0.00157699300508027, -0.350519090107081
), EBIT.TA_l2 = c(-0.351308186716873, 0.00159428805074234,
-0.00604587147802615, 0.0761894448922952, -0.00348378141492824,
NA, 0.0346370866793768, -0.552226781084599, 0.00220031803369861,
-0.0285840972149053, 0.065316579236306, 0.4090851643341,
-0.0188362202518351, 0.0403848986306371, 0.091146090480032,
-0.0154168449752466, -0.0694803621032671, 0.0511978643139393,
-0.452924037757731, -0.0091835704914724, 0.0119918914092344,
0.0858960833880717, NA, 0.104901526886479, -0.23096183545392,
-0.0163058345980967, 0.100643431561465, 0.0527859573541712,
0.250207316117438, NA, 0.00193240515291123, 0.0624210741756767,
0.0178136227732972, -0.0321294913646274, -0.0699629484084657,
-0.00417176180400133, 0.209612573099415, 0.0285645570852926,
0.0551624216079071, 0.0172738293439595), Major.sectors.id = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 7L, 7L, 3L, 3L, 3L, 3L, 8L, 3L, 5L,
6L, 8L, 3L, 5L, 3L, 7L, 3L, 8L, 2L, 7L, 2L, 9L, 3L, 5L, 5L,
10L, 1L, 3L, 7L, 8L, 11L, 10L, 3L, 3L), Region.in.country.id = c(1L,
2L, 2L, 2L, 3L, 1L, 4L, 3L, 2L, 3L, 5L, 6L, 7L, 8L, 7L, 1L,
9L, 7L, 1L, 1L, 10L, 1L, 8L, 11L, 10L, 3L, 1L, 2L, 9L, 1L,
8L, 2L, 10L, 1L, 1L, 3L, 2L, 3L, 3L, 1L)), .Names = c("IDVar",
"Major.sectors", "Region.in.country", "EBIT.TA", "EBIT.TA_l1",
"EBIT.TA_l2", "Major.sectors.id", "Region.in.country.id"), row.names = c(NA,
40L), class = "data.frame")
The raw output looks like this:
Var1 Var2 Freq
1 EBIT.TA Min. :-0.2344
2 EBIT.TA 1st Qu.:-0.2344
3 EBIT.TA Median :-0.2344
4 EBIT.TA Mean :-0.2344
5 EBIT.TA 3rd Qu.:-0.2344
6 EBIT.TA Max. :-0.2344
I am trying to get it in a form like this:
Var1 Var2 Freq Region.in.country Major.sectors
1 EBIT.TA Min. :-0.2344 Madrid Publishing
2 EBIT.TA 1st Qu.:-0.2344 Madrid Publishing
3 EBIT.TA Median :-0.2344 Madrid Publishing
4 EBIT.TA Mean :-0.2344 Madrid Publishing
5 EBIT.TA 3rd Qu.:-0.2344 Madrid Publishing
6 EBIT.TA Max. :-0.2344 Madrid Publishing
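If you know a more elegant way to compute the summary statistics for my problem, please let me know. I would like to keep the nested for loop, because in the future I want to "plug in" things other than summary statistics, for example a regression model or some other calculation.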
Answer 0 (score: 0)
You can do this:
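library(tidyverse)
library(tidytext)
# Convert the factor to character, then split each sector name into
# one lower-cased word per row (new column `word`; `Major.sectors` is dropped).
ratios %>%
  mutate(Major.sectors = as.character(Major.sectors)) %>%
  unnest_tokens(word, Major.sectors)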
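
If the goal is to keep the nested loop and attach the region and sector names to each block of summary statistics, one possible base R sketch is shown here. The commented-out line in the question most likely fails because `summary()` on a data frame returns a character `table`, not a data frame, so two data frame columns cannot simply be assigned into positions 7:8. Also, according to the sample data, columns 1:2 are `IDVar` and `Major.sectors`, while `Region.in.country` is column 3. The sketch converts each summary with `as.data.frame()` and then adds the two labels as ordinary columns. The small `ratios` below is a made-up stand-in with the same column names as the question's data, and `value.cols` and `results` are names introduced only for this illustration.

```r
# Toy stand-in for the question's `ratios` (same column names, invented values).
ratios <- data.frame(
  IDVar = 1:6,
  Major.sectors = c("Construction", "Construction", "Other services",
                    "Other services", "Construction", "Other services"),
  Region.in.country = c("Madrid", "Madrid", "Madrid",
                        "Cataluna", "Cataluna", "Cataluna"),
  EBIT.TA = c(-0.27, -0.80, -0.19, 0.11, -0.004, NA),
  EBIT.TA_l1 = c(0.02, 0.03, -0.61, 0.03, -0.03, 0.07),
  EBIT.TA_l2 = c(-0.45, -0.03, -0.02, 0.08, -0.006, 0.002),
  Major.sectors.id = c(3L, 3L, 5L, 5L, 3L, 5L),
  Region.in.country.id = c(1L, 1L, 1L, 2L, 2L, 2L),
  stringsAsFactors = FALSE
)

value.cols <- c("EBIT.TA", "EBIT.TA_l1", "EBIT.TA_l2")  # hypothetical helper name
results <- list()                                       # one data frame per group

for (country.ID in unique(ratios$Region.in.country.id)) {
  for (sector.ID in unique(ratios$Major.sectors.id)) {
    rows <- which(ratios$Region.in.country.id == country.ID &
                  ratios$Major.sectors.id == sector.ID)
    if (length(rows) == 0) next  # skip region/sector pairs with no observations

    # summary() returns a character table; as.data.frame() turns it into
    # long format with the columns Var1, Var2 and Freq.
    tmp <- as.data.frame(summary(ratios[rows, value.cols]),
                         stringsAsFactors = FALSE)
    tmp <- tmp[!is.na(tmp$Freq), ]   # drop empty cells of the "NA's" row
    tmp$Var2 <- trimws(tmp$Var2)     # remove the padding summary() adds
    tmp$Freq <- trimws(tmp$Freq)

    # All selected rows share one region and one sector, so the first is enough.
    tmp$Region.in.country <- as.character(ratios$Region.in.country[rows[1]])
    tmp$Major.sectors <- as.character(ratios$Major.sectors[rows[1]])

    results[[length(results) + 1]] <- tmp
  }
}

output <- do.call(rbind, results)  # bind once instead of growing inside the loop
print(head(output))
```

Collecting the pieces in a list and binding them once at the end avoids repeatedly copying `output`. The `summary()` call can be swapped for another per-group calculation, as long as each iteration produces a data frame with the same columns.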