Most efficient way to merge range groups in R?

Time: 2018-07-05 21:25:04

Tags: r dplyr tidyverse

   Age_Group Region  Population
   <fct>     <fct>        <int>
1 0-4       ENGLAND    3384925
2 5-9       ENGLAND    3497402
3 10-14     ENGLAND    3166038
4 15-19     ENGLAND    3120730
5 20-24     ENGLAND    3526141
6 25-29     ENGLAND    3831624
7 30-34     ENGLAND    3757400
8 35-39     ENGLAND    3642643
9 40-44     ENGLAND    3442758
10 45-49     ENGLAND    3850108

Hey, what is the most efficient way to merge the age groups into different intervals, e.g. 5 or 10 years, to produce the following table?

   Age_Group Region  Population
   <fct>     <fct>        <int>
1 0-9       ENGLAND    xxx
2 10-19     ENGLAND    xxx
3 20-29     ENGLAND    xxx
...

1 answer:

Answer 0: (score: 2)

Here is a tidyverse possibility:

    library(tidyverse)
    df %>%
        # Give every two consecutive rows the same group id
        # (assumes an even number of rows, ordered by age)
        mutate(grp = rep(1:(nrow(.)/2), each = 2)) %>%
        group_by(grp) %>%
        mutate(
            # "0-4" and "5-9" become "0-4:5-9", which is then reduced to "0-9"
            Age_Group = paste(Age_Group, collapse = ":"),
            Age_Group = gsub("-\\d+:\\d+", "", Age_Group)) %>%
        # Sum the population within each pair and keep one row per pair
        mutate(Population = sum(Population)) %>%
        slice(1) %>%
        ungroup() %>%
        select(-grp)
    ## A tibble: 5 x 3
    #  Age_Group Region  Population
    #  <chr>     <fct>        <int>
    #1 0-9       ENGLAND    6882327
    #2 10-19     ENGLAND    6286768
    #3 20-29     ENGLAND    7357765
    #4 30-39     ENGLAND    7400043
    #5 40-49     ENGLAND    7292866

Explanation: As @DavidArenburg mentioned, we group the entries by every two rows, create new Age_Group labels by combining the Age_Group entries of each pair of rows, and then sum the Population entries. Most of the work goes into creating the new Age_Group labels.
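The question also asks about other interval widths. The following is a minimal sketch of one way to generalize the idea, not part of the original answer: it derives each bin from the lower bound of the Age_Group label instead of from the row position, so it does not depend on row order or on an even number of rows. The names `width`, `lower`, `bin`, and `merged` are hypothetical, and the sketch assumes every label has the form "lower-upper" and that dplyr 1.0.0 or later is installed (for the `.groups` argument).

```r
library(dplyr)

# Sample data copied from the question
df <- tibble(
  Age_Group = factor(c("0-4", "5-9", "10-14", "15-19", "20-24",
                       "25-29", "30-34", "35-39", "40-44", "45-49")),
  Region = factor("ENGLAND"),
  Population = c(3384925L, 3497402L, 3166038L, 3120730L, 3526141L,
                 3831624L, 3757400L, 3642643L, 3442758L, 3850108L)
)

width <- 10  # hypothetical target bin width in years

merged <- df %>%
  mutate(
    # Lower bound of each label, e.g. "15-19" -> 15
    lower = as.integer(sub("-.*", "", as.character(Age_Group))),
    # Start of the wider bin that contains this lower bound
    bin = (lower %/% width) * width
  ) %>%
  group_by(Region, bin) %>%
  summarise(Population = sum(Population), .groups = "drop") %>%
  # Rebuild the label from the bin start and the width
  mutate(Age_Group = paste0(bin, "-", bin + width - 1)) %>%
  select(Age_Group, Region, Population)

print(merged)
```

With `width <- 10` this should reproduce the five rows shown in the answer; setting `width <- 20` would give bins such as 0-19 and 20-39, with a last bin labelled 40-59 even though the data stops at 49.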