R XML - Combine parent and child nodes into a data frame

Time: 2018-01-13 22:41:32

Tags: r xml xpath

I have XML like this:

<root>
  <cards>
    <meeting name="Punchestown (IRE)" id="195" diffusion_course_name="PUNCHESTOWN">
      <race id="692415" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>12:25</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Adare Manor Opportunity Handicap Chase</title>
        <type>C</type>
        <distance>2m4f</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>10</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Handicap Chase</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="692416" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>1:00</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Total Event Rental (Kildare) Novice Chase (Grade 3)</title>
        <type>C</type>
        <distance>2m4f</distance>
        <group>Grade 3</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>7</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Novice Chase Grade 3</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="692417" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>1:35</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Connolly's RED MILLS Amateur National (Q.R.) Handicap Chase</title>
        <type>C</type>
        <distance>3m1f</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>12</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Handicap Chase</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="692418" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>2:10</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Sky Bet Moscow Flyer Novice Hurdle (Grade 2)</title>
        <type>H</type>
        <distance>2m</distance>
        <group>Grade 2</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>7</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Novice Hurdle Grade 2</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="692419" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>2:45</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Sportinglife.com Maiden Hurdle</title>
        <type>H</type>
        <distance>2m</distance>
        <group/>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>17</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Maiden Hurdle</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="692420" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>3:20</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Leinster Leader Mares Handicap Hurdle</title>
        <type>H</type>
        <distance>2m4f40y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>8</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Handicap Hurdle</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="692421" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>3:50</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>David Trundley Artist At Punchestown Irish Stallion Farms EBF Mares Flat Race</title>
        <type>B</type>
        <distance>2m</distance>
        <group/>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>14</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>NHF</raceDescription>
        <tvText>ATR </tvText>
      </race>
    </meeting>
    <meeting name="Warwick" id="85" diffusion_course_name="WARWICK">
      <race id="691061" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>12:40</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Betfred Supports Jack Berry House Novices' Handicap Hurdle</title>
        <type>H</type>
        <distance>2m</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>18</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 4 Novice Handicap Hurdle</raceDescription>
        <tvText>RUK </tvText>
        <betOffers>
          <betOffer>WH</betOffer>
        </betOffers>
      </race>
      <race id="691060" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>1:15</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Betfred Mobile Edward Courage Cup Handicap Chase</title>
        <type>C</type>
        <distance>2m54y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>7</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 3 Handicap Chase</raceDescription>
        <tvText>RUK </tvText>
        <betOffers>
          <betOffer>LB</betOffer>
          <betOffer>WH</betOffer>
        </betOffers>
      </race>
      <race id="691058" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>1:50</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Betfred Home Of Goals Galore Hampton Novices' Chase (Listed Race)</title>
        <type>C</type>
        <distance>3m</distance>
        <group>Listed</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>5</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 1 Novice Chase Listed</raceDescription>
        <tvText>ITV4 </tvText>
        <betOffers>
          <betOffer>Coral</betOffer>
        </betOffers>
      </race>
      <race id="691059" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>2:25</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Pertemps Network Handicap Hurdle (Series Qualifier)</title>
        <type>H</type>
        <distance>3m1f</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>12</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 2 Handicap Hurdle</raceDescription>
        <tvText>ITV4 </tvText>
        <betOffers>
          <betOffer>LB</betOffer>
          <betOffer>WH</betOffer>
          <betOffer>Coral</betOffer>
        </betOffers>
      </race>
      <race id="691057" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>3:00</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Ballymore Leamington Novices' Hurdle (Grade 2)</title>
        <type>H</type>
        <distance>2m5f</distance>
        <group>Grade 2</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>6</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 1 Novice Hurdle Grade 2</raceDescription>
        <tvText>ITV4 </tvText>
        <betOffers>
          <betOffer>WH</betOffer>
          <betOffer>Coral</betOffer>
        </betOffers>
      </race>
      <race id="691056" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>3:35</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Betfred Classic Handicap Chase (Grade 3)</title>
        <type>C</type>
        <distance>3m5f54y</distance>
        <group>Grade 3 Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>15</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 1 Handicap Chase Grade 3</raceDescription>
        <tvText>ITV4 </tvText>
        <betOffers>
          <betOffer>LB</betOffer>
          <betOffer>WH</betOffer>
          <betOffer>Coral</betOffer>
          <betOffer>PP</betOffer>
        </betOffers>
      </race>
      <race id="691062" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>4:05</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Betfred TV "Newcomers" Standard Open National Hunt Flat Race</title>
        <type>B</type>
        <distance>2m</distance>
        <group/>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>9</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 5 NHF</raceDescription>
        <tvText>RUK </tvText>
        <betOffers>
          <betOffer>WH</betOffer>
        </betOffers>
      </race>
    </meeting>
    <meeting name="Wetherby" id="87" diffusion_course_name="WETHERBY">
      <race id="691067" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>12:30</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Racing UK Jump To It Novices' Hurdle</title>
        <type>H</type>
        <distance>2m3f154y</distance>
        <group/>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>9</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 4 Novice Hurdle</raceDescription>
        <tvText>RUK </tvText>
      </race>
      <race id="691066" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>1:05</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Racing UK In Stunning HD "Confined" Novices' Chase</title>
        <type>C</type>
        <distance>2m3f85y</distance>
        <group/>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>7</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 4 Novice Chase</raceDescription>
        <tvText>RUK </tvText>
      </race>
      <race id="691068" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>1:40</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Bet At racinguk.com Handicap Hurdle</title>
        <type>H</type>
        <distance>2m</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>9</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 4 Handicap Hurdle</raceDescription>
        <tvText>RUK </tvText>
      </race>
      <race id="691063" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>2:15</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>totescoop6 Play Today Handicap Chase</title>
        <type>C</type>
        <distance>1m7f36y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>5</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 2 Handicap Chase</raceDescription>
        <tvText>RUK </tvText>
      </race>
      <race id="691064" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>2:50</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>totescoop6 Results On totepoolliveinfo.com Handicap Hurdle</title>
        <type>H</type>
        <distance>2m3f154y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>11</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 3 Handicap Hurdle</raceDescription>
        <tvText>RUK </tvText>
      </race>
      <race id="691065" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>3:25</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Book Now For Medieval Day - 3rd February Handicap Chase (Northern Lights Middle Distance Series)</title>
        <type>C</type>
        <distance>2m3f85y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>7</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 4 Handicap Chase</raceDescription>
        <tvText>RUK </tvText>
        <betOffers>
          <betOffer>LB</betOffer>
        </betOffers>
      </race>
      <race id="691069" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>3:55</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Racing UK On Sky 432 Fillies' "Junior" Standard Open National Hunt Flat Race</title>
        <type>B</type>
        <distance>1m4f77y</distance>
        <group/>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>8</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 5 NHF</raceDescription>
        <tvText>RUK </tvText>
      </race>
    </meeting>
    <meeting name="Wolverhampton (AW)" id="513" diffusion_course_name="WOLVERHAMPTON">
      <race id="691141" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>5:45</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Bet &amp; Watch At sunbets.co.uk Apprentice Handicap</title>
        <type>X</type>
        <distance>1m142y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>13</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 6 Handicap</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="691136" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>6:15</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>sunbets.co.uk Handicap</title>
        <type>X</type>
        <distance>1m142y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>9</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 4 Handicap</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="691140" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>6:45</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Betway Live Casino Handicap</title>
        <type>X</type>
        <distance>2m120y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>13</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 6 Handicap</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="691138" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>7:15</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Betway Casino Handicap (Div I)</title>
        <type>X</type>
        <distance>6f20y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>13</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 5 Handicap</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="692653" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>7:45</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Betway Casino Handicap (Div II)</title>
        <type>X</type>
        <distance>6f20y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>12</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 5 Handicap</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="691139" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>8:15</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>Betway Novice Stakes</title>
        <type>X</type>
        <distance>5f21y</distance>
        <group/>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>6</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 5 Novice</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="691137" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>8:45</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>32Red.com Handicap</title>
        <type>X</type>
        <distance>5f21y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>7</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 4 Handicap</raceDescription>
        <tvText>ATR </tvText>
      </race>
      <race id="691142" perform_race_id="" perform_race_id_atr="" details_available="1" race_status_code="R">
        <time>9:15</time>
        <date>2018-01-13</date>
        <ampm>pm</ampm>
        <title>32Red Casino Handicap</title>
        <type>X</type>
        <distance>1m1f104y</distance>
        <group>Handicap</group>
        <tipsAllowed>1</tipsAllowed>
        <predictorAllowed>1</predictorAllowed>
        <bettingLink>1</bettingLink>
        <declaredRunners>11</declaredRunners>
        <liveCommentary>1</liveCommentary>
        <liveTab>1</liveTab>
        <raceDescription>Class 6 Handicap</raceDescription>
        <tvText>ATR </tvText>
      </race>
    </meeting>
  </cards>
</root>

By running this R code I can get the data I want, which is essentially the race data (the child nodes):

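library(XML)

# CardList_tmp is the parsed document (e.g. from xmlParse()); date is assumed to be defined elsewhere in the script
CardList=cbind(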
  date,
  data.frame(raceid=xpathSApply(CardList_tmp, "//meeting/race", xmlGetAttr, 'id')),
  data.frame(cards=xpathSApply(CardList_tmp, "//meeting/race", xmlGetAttr, 'details_available')),
  data.frame(status=xpathSApply(CardList_tmp, "//meeting/race", xmlGetAttr, 'race_status_code')),
  xmlToDataFrame(nodes = getNodeSet(CardList_tmp, "//meeting/race"))
  )

However, it does not include the meeting data, which is held as attributes at the parent level:

course = xpathSApply(CardList_tmp, "//meeting", xmlGetAttr, 'name')
cid = xpathSApply(CardList_tmp, "//meeting", xmlGetAttr, 'id')

Is there a way to combine the two sets of code so that a single data frame is produced in one step?

3 answers:

Answer 0 (score: 2)

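Here is an option using xml2 for the XML processing and the tidyverse for munging. The attributes (xml_attrs returns a named character vector), node names, and node values can be read into a three-element list that can be coerced to a data frame: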

library(tidyverse)
library(xml2)

x <- read_xml('races.xml')

races <- x %>% 
    xml_find_all('//race') %>% 
    map_df(~list(attrs = list(xml_attrs(.x)), 
                 variable = list(map(xml_children(.x), xml_name)), 
                 value = list(map(xml_children(.x), xml_text))))

races
#> # A tibble: 29 x 3
#>    attrs     variable    value      
#>    <list>    <list>      <list>     
#>  1 <chr [5]> <list [15]> <list [15]>
#>  2 <chr [5]> <list [15]> <list [15]>
#>  3 <chr [5]> <list [15]> <list [15]>
#>  4 <chr [5]> <list [15]> <list [15]>
#>  5 <chr [5]> <list [15]> <list [15]>
#>  6 <chr [5]> <list [15]> <list [15]>
#>  7 <chr [5]> <list [15]> <list [15]>
#>  8 <chr [5]> <list [16]> <list [16]>
#>  9 <chr [5]> <list [16]> <list [16]>
#> 10 <chr [5]> <list [16]> <list [16]>
#> # ... with 19 more rows

which in turn can be cleaned up with a fair amount of tidyr:

races_tidy <- races %>% 
    mutate(attr_names = map(attrs, names)) %>% 
    unnest(attr_names, attrs, .drop = FALSE) %>% 
    spread(attr_names, attrs) %>% 
    unnest(variable, value) %>% 
    unnest(variable, value) %>% 
    spread(variable, value) %>% 
    type_convert()    # fix variable types

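This works, but the unnesting and spreading are fragile. A more robust approach is not actually much more work, since you can arrange the list columns before unnesting: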

races_tidy2 <- races %>% 
    mutate(attrs = map(attrs, ~as_data_frame(as.list(.x))), 
           data = map2(variable, value, ~as_data_frame(set_names(.y, .x)))) %>% 
    unnest(attrs, data, .drop = TRUE) %>% 
    type_convert()

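The most direct approach is to do the rearranging while iterating over the nodes. This is the most concise and probably the most efficient approach, but writing it correctly depends on careful manipulation of the data structure, so getting to working code may take longer.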

races_tidy3 <- x %>% 
    xml_find_all('//race') %>% 
    map_df(~flatten(c(xml_attrs(.x), 
                      map(xml_children(.x), 
                          ~set_names(as.list(xml_text(.x)), xml_name(.x)))))) %>%
    type_convert()

races_tidy3
#> # A tibble: 29 x 21
#>        id perf… perf… deta… race… time  date       ampm  title type  dist…
#>     <int> <chr> <chr> <int> <chr> <tim> <date>     <chr> <chr> <chr> <chr>
#>  1 692415 <NA>  <NA>      1 R     12:25 2018-01-13 pm    Adar… C     2m4f 
#>  2 692416 <NA>  <NA>      1 R     01:00 2018-01-13 pm    Tota… C     2m4f 
#>  3 692417 <NA>  <NA>      1 R     01:35 2018-01-13 pm    Conn… C     3m1f 
#>  4 692418 <NA>  <NA>      1 R     02:10 2018-01-13 pm    Sky … H     2m   
#>  5 692419 <NA>  <NA>      1 R     02:45 2018-01-13 pm    Spor… H     2m   
#>  6 692420 <NA>  <NA>      1 R     03:20 2018-01-13 pm    Lein… H     2m4f…
#>  7 692421 <NA>  <NA>      1 R     03:50 2018-01-13 pm    Davi… B     2m   
#>  8 691061 <NA>  <NA>      1 R     12:40 2018-01-13 pm    Betf… H     2m   
#>  9 691060 <NA>  <NA>      1 R     01:15 2018-01-13 pm    Betf… C     2m54y
#> 10 691058 <NA>  <NA>      1 R     01:50 2018-01-13 pm    Betf… C     3m   
#> # ... with 19 more rows, and 10 more variables: group <chr>, tipsAllowed
#> #   <int>, predictorAllowed <int>, bettingLink <int>, declaredRunners
#> #   <int>, liveCommentary <int>, liveTab <int>, raceDescription <chr>,
#> #   tvText <chr>, betOffers <chr>

All three return the same data, but races_tidy has a different column order.

all_equal(races_tidy, races_tidy2)
#> [1] TRUE

identical(races_tidy2, races_tidy3)
#> [1] TRUE
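Note that all three versions flatten only the race nodes, so the meeting name and id from the question are not in the result. A minimal sketch of one way to add them with xml2, using a trimmed inline excerpt of the question's XML so it runs on its own: xml_find_first() returns one node per input node, so the XPath ".." yields each race's parent meeting, repeated once per race.

```r
library(xml2)

# Trimmed excerpt of the question's XML: two meetings, three races
x <- read_xml('<root><cards>
  <meeting name="Punchestown (IRE)" id="195">
    <race id="692415"><time>12:25</time><title>Adare Manor Opportunity Handicap Chase</title></race>
    <race id="692416"><time>1:00</time><title>Total Event Rental (Kildare) Novice Chase (Grade 3)</title></race>
  </meeting>
  <meeting name="Warwick" id="85">
    <race id="691060"><time>1:15</time><title>Betfred Mobile Edward Courage Cup Handicap Chase</title></race>
  </meeting>
</cards></root>')

races <- xml_find_all(x, "//meeting/race")

# One parent <meeting> per <race>: xml_find_first() keeps the input length
meetings <- xml_find_first(races, "..")

race_df <- data.frame(
  course = xml_attr(meetings, "name"),               # meeting-level attributes
  cid    = xml_attr(meetings, "id"),
  raceid = xml_attr(races, "id"),                    # race-level attribute
  time   = xml_text(xml_find_first(races, "time")),  # race child elements
  title  = xml_text(xml_find_first(races, "title")),
  stringsAsFactors = FALSE
)
race_df
```

On the full file, course and cid built this way should line up row for row with races_tidy3, since both walk the race nodes in document order, so they could be cbind()-ed onto it.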

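Answer 1 (score: 0)

Consider parsing the meeting data by node index, expanding it to the number of child race elements in each meeting, and then column-binding it with the race data: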

library(XML)

doc <- xmlParse("/path/to/Source.xml")

# NUMBER OF MEETING NODES
mtg_num <- length(xpathSApply(doc, "//meeting"))

# DATAFRAME LIST OF EXPANDED MEETING ATTRS
meeting_list <- lapply(seq(mtg_num), function(i) {
  races_num <- length(xpathSApply(doc, sprintf("//meeting[%s]/race", i)))

  data.frame(
    meeting_id = rep(xpathSApply(doc, sprintf("//meeting[%s]/@id", i)), races_num),
    meeting_name = rep(xpathSApply(doc, sprintf("//meeting[%s]/@name", i)), races_num)
  )
})

# COLUMN BIND MEETING NODES, RACE NODES, AND RACE ATTRS
final_df <- cbind(do.call(rbind, meeting_list),
                  xmlToDataFrame(nodes = getNodeSet(doc, "//meeting/race")),
                  XML:::xmlAttrsToDataFrame(getNodeSet(doc, "//meeting/race")))

Output

head(final_df)

#   meeting_id      meeting_name  time       date ampm                                                       title type distance    group tipsAllowed predictorAllowed
# 1        195 Punchestown (IRE) 12:25 2018-01-13   pm                      Adare Manor Opportunity Handicap Chase    C     2m4f Handicap           1                1
# 2        195 Punchestown (IRE)  1:00 2018-01-13   pm         Total Event Rental (Kildare) Novice Chase (Grade 3)    C     2m4f  Grade 3           1                1
# 3        195 Punchestown (IRE)  1:35 2018-01-13   pm Connolly's RED MILLS Amateur National (Q.R.) Handicap Chase    C     3m1f Handicap           1                1
# 4        195 Punchestown (IRE)  2:10 2018-01-13   pm                Sky Bet Moscow Flyer Novice Hurdle (Grade 2)    H       2m  Grade 2           1                1
# 5        195 Punchestown (IRE)  2:45 2018-01-13   pm                              Sportinglife.com Maiden Hurdle    H       2m                    1                1
# 6        195 Punchestown (IRE)  3:20 2018-01-13   pm                       Leinster Leader Mares Handicap Hurdle    H  2m4f40y Handicap           1                1

#  bettingLink declaredRunners liveCommentary liveTab       raceDescription tvText betOffers     id perform_race_id perform_race_id_atr details_available race_status_code
# 1           1              10              1       1        Handicap Chase   ATR       <NA> 692415                                                     1                R
# 2           1               7              1       1  Novice Chase Grade 3   ATR       <NA> 692416                                                     1                R
# 3           1              12              1       1        Handicap Chase   ATR       <NA> 692417                                                     1                R
# 4           1               7              1       1 Novice Hurdle Grade 2   ATR       <NA> 692418                                                     1                R
# 5           1              17              1       1         Maiden Hurdle   ATR       <NA> 692419                                                     1                R
# 6           1               8              1       1       Handicap Hurdle   ATR       <NA> 692420                                                     1                R
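The index loop can also be avoided by stepping from each race up to its parent. A minimal sketch, assuming the XML package and a trimmed inline excerpt of the question's XML: xmlParent() returns a node's parent, so calling xmlGetAttr() on it inside xpathSApply() gives the meeting attribute for every race, already repeated once per race.

```r
library(XML)

# Trimmed excerpt of the question's XML: two meetings, three races
doc <- xmlParse('<root><cards>
  <meeting name="Punchestown (IRE)" id="195">
    <race id="692415"><time>12:25</time><title>Adare Manor Opportunity Handicap Chase</title></race>
    <race id="692416"><time>1:00</time><title>Total Event Rental (Kildare) Novice Chase (Grade 3)</title></race>
  </meeting>
  <meeting name="Warwick" id="85">
    <race id="691060"><time>1:15</time><title>Betfred Mobile Edward Courage Cup Handicap Chase</title></race>
  </meeting>
</cards></root>', asText = TRUE)

# For each <race>, step up to its <meeting> and read the parent's attributes
meeting_df <- data.frame(
  meeting_id   = xpathSApply(doc, "//meeting/race", function(r) xmlGetAttr(xmlParent(r), "id")),
  meeting_name = xpathSApply(doc, "//meeting/race", function(r) xmlGetAttr(xmlParent(r), "name")),
  raceid       = xpathSApply(doc, "//meeting/race", xmlGetAttr, "id"),
  stringsAsFactors = FALSE
)

# Rows line up because every part is built from the same //meeting/race node set
final_df <- cbind(meeting_df,
                  xmlToDataFrame(nodes = getNodeSet(doc, "//meeting/race"),
                                 stringsAsFactors = FALSE))
final_df
```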

Answer 2 (score: 0)

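Alternatively, consider XSLT, the special-purpose language designed to transform XML files, for example into flatter, simpler files that suit your R needs. R can run XSLT 1.0 scripts with the third-party xslt package (an extension of xml2).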

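XSLT is also portable and can be run outside R, from Java, Python, PHP, or dedicated executables such as Saxon and Xalan. Below is a system call to xsltproc as an example; a similar batch call is available on Windows. Once the XML has been flattened, pass the new XML to xmlToDataFrame from the XML package.

Specifically, the XSLT below descends to the race level and pulls the meeting data in from the parent node.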

XSLT (save as a .xsl file; it is itself a well-formed XML file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/root/cards">
     <xsl:copy>
       <xsl:apply-templates select="meeting"/>
     </xsl:copy>
    </xsl:template>

    <xsl:template match="meeting">
       <xsl:apply-templates select="race"/>
    </xsl:template>

    <xsl:template match="race">
     <xsl:copy>
       <meeting_id><xsl:value-of select="ancestor::meeting/@id"/></meeting_id>
       <meeting_name><xsl:value-of select="ancestor::meeting/@name"/></meeting_name>
       <xsl:apply-templates select="@*"/>
       <xsl:copy-of select="*"/>
     </xsl:copy>
    </xsl:template>

    <xsl:template match="race/@*">
       <xsl:element name="{name(.)}"><xsl:value-of select="."/></xsl:element>
    </xsl:template>

</xsl:stylesheet>

R

library(XML)
library(xml2)   # provides read_xml()
library(xslt)   # provides xml_xslt()

# LOAD XML AND XSL
input <- read_xml("/path/to/input.xml")
style <- read_xml("/path/to/xslt_script.xsl")

# TRANSFORM INPUT INTO OUTPUT
new_xml <- xml_xslt(input, style)
output <- as.character(new_xml)

# PARSE OUTPUT FROM STRING
doc <- xmlParse(output, asText=TRUE)

# COMMAND LINE CALL TO UNIX'S XSLTPROC (ALTERNATIVE TO xslt PACKAGE)
system("xsltproc -o /path/to/output.xml /path/to/xslt_script.xsl /path/to/input.xml")
doc <- xmlParse("/path/to/output.xml")

# BUILD DATAFRAME
df <- xmlToDataFrame(doc, nodes=getNodeSet(doc, '//race'))
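To try the stylesheet without writing any files, a minimal sketch assuming the xml2 and xslt packages are installed: the same stylesheet is supplied as an inline string and applied to a one-race excerpt of the question's XML. The flattened result could then go to xmlToDataFrame() as in the code above.

```r
library(xml2)
library(xslt)

# One-race excerpt of the question's XML
input <- read_xml('<root><cards>
  <meeting name="Warwick" id="85">
    <race id="691060" race_status_code="R"><time>1:15</time><type>C</type></race>
  </meeting>
</cards></root>')

# Same stylesheet as above, supplied as a string instead of a .xsl file
style <- read_xml('<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/root/cards">
    <xsl:copy><xsl:apply-templates select="meeting"/></xsl:copy>
  </xsl:template>
  <xsl:template match="meeting">
    <xsl:apply-templates select="race"/>
  </xsl:template>
  <xsl:template match="race">
    <xsl:copy>
      <meeting_id><xsl:value-of select="ancestor::meeting/@id"/></meeting_id>
      <meeting_name><xsl:value-of select="ancestor::meeting/@name"/></meeting_name>
      <xsl:apply-templates select="@*"/>
      <xsl:copy-of select="*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="race/@*">
    <xsl:element name="{name(.)}"><xsl:value-of select="."/></xsl:element>
  </xsl:template>
</xsl:stylesheet>')

flat <- xml_xslt(input, style)

# Each <race> now carries its meeting data as ordinary child elements
xml_text(xml_find_all(flat, "//race/meeting_name"))
```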