Getting congressional bills into an R data table

Time: 2014-10-24 18:54:33

Tags: xml r

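I have read a lot of posts, some of them on Stack Overflow, with examples of how to extract data from an XML file into a data table in R. My own attempts have failed, though, perhaps because of how my XML files are structured. I am posting a sample XML file below. It would be most helpful if someone could take a look and point me in the right direction for getting these files into a table.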

 ' <bill session="113" type="s" number="12" updated="2014-09-20T07:17:52-04:00">
  <state datetime="2013-02-26">REFERRED</state>
  <status>
    <introduced datetime="2013-02-26"/>
  </status>
  <introduced datetime="2013-02-26"/>
  <titles>
    <title type="short" as="introduced">Naval Vessel Transfer Act of 2013</title>
    <title type="official" as="introduced">A bill to provide for the transfer of naval vessels to      certain foreign recipients.</title>
  </titles>
  <sponsor id="402675"/>
  <cosponsors>
    <cosponsor id="412491" joined="2013-11-05"/>
  </cosponsors>
  <actions>
    <action datetime="2013-02-26" state="REFERRED">
      <text>Read twice and referred to the Committee on Foreign Relations.</text>
    </action>
  </actions>
  <committees>
    <committee code="SSFR" name="Senate Foreign Relations" activity="Referral, In Committee"/>
  </committees>
  <relatedbills>
    <bill relation="unknown" session="113" type="s" number="1683"/>
  </relatedbills>
  <subjects>
    <term name="International affairs"/>
    <term name="Asia"/>
    <term name="Buy American requirements"/>
    <term name="Latin America"/>
    <term name="Marine and inland water transportation"/>
    <term name="Mexico"/>
    <term name="Military assistance, sales, and agreements"/>
    <term name="Military facilities and property"/>
    <term name="Taiwan"/>
    <term name="Thailand"/>
  </subjects>
  <amendments/>
  <summary>2/26/2013--Introduced.
 Naval Vessels Transfer Act of 2013 - Authorizes the President to transfer on a grant basis to: (1)         Mexico, the OLIVER HAZARD PERRY class guided missile frigates CURTS and MCCLUSKY; and (2) Thailand, the         OLIVER HAZARD PERRY class guided missile frigates RENTZ and VANDEGRIFT.

Authorizes the President to transfer on a sale basis the OLIVER HAZARD PERRY class guided missile     frigates TAYLOR, GARY, CARR, and ELROD to the Taipei Economic and Cultural Representative Office of the     United States (which is the Taiwan instrumentality designated pursuant to the Taiwan Relations Act).

States that: (1) the value of such vessels transferred on a grant basis shall not be counted against  the aggregate value of excess defense articles transferred to countries in any fiscal year under the  Foreign Assistance Act of 1961; (2) transfer costs shall be charged to the recipient; and (3) to the  maximum extent practicable, the country to which a vessel is transferred shall have necessary vessel  repair and refurbishment carried out at U.S. shipyards (including U.S. Navy shipyards).

Terminates transfer authority three years after enactment of this Act.</summary>
</bill> '

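1 answer:

Answer 0 (score: 1)

You could try splitting the XML into individual bills (excluding the related bills nested inside each one), then use XPath queries to select whatever columns you need, either with lapply or with a loop.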

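library(XML)

# Select only top-level <bill> nodes; <bill> elements nested under <relatedbills> are skipped
doc <- xmlParse("lotsofbills.xml")
nodes <- getNodeSet(doc, "//bill[not(ancestor::bill)]")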

# Build one data frame per bill; relative paths (".//") stay inside the current node
x <- lapply(nodes, function(node) {
  data.frame(
    bill_session    = xpathSApply(node, ".", xmlGetAttr, "session"),
    short_title     = xpathSApply(node, ".//title[@type='short']", xmlValue),
    action_datetime = xpathSApply(node, ".//actions/action", xmlGetAttr, "datetime"),
    action          = xpathSApply(node, ".//actions/action/text", xmlValue),
    subjects        = paste(xpathSApply(node, ".//subjects/term", xmlGetAttr, "name"), collapse = "; ")
  )
})

do.call("rbind", x)
  bill_session                       short_title action_datetime                                                         action
1          113 Naval Vessel Transfer Act of 2013      2013-02-26 Read twice and referred to the Committee on Foreign Relations.
                                                                                                                                                                                                               subjects
1 International affairs; Asia; Buy American requirements; Latin America; Marine and inland water transportation; Mexico; Military assistance, sales, and agreements; Military facilities and property; Taiwan; Thailand
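
To see why the `not(ancestor::bill)` predicate matters, here is a small sketch that compares it with a plain `//bill` query. The `<bills>` wrapper element and the second bill (number 13) are made up for illustration; the real files may be laid out differently.

```r
library(XML)

# Hypothetical two-bill document; only the nested related bill mirrors the sample above
xml_text <- '<bills>
  <bill session="113" type="s" number="12">
    <relatedbills>
      <bill relation="unknown" session="113" type="s" number="1683"/>
    </relatedbills>
  </bill>
  <bill session="113" type="s" number="13"/>
</bills>'

demo_doc <- xmlParse(xml_text, asText = TRUE)

# "//bill" matches every <bill>, including the one nested under <relatedbills>
all_bills <- getNodeSet(demo_doc, "//bill")

# The predicate keeps only <bill> elements that have no <bill> ancestor
top_bills <- getNodeSet(demo_doc, "//bill[not(ancestor::bill)]")

length(all_bills)  # 3
length(top_bills)  # 2
top_numbers <- sapply(top_bills, xmlGetAttr, "number")
```

Without the predicate, the related bill 1683 would be treated as a bill of its own and would produce an extra row in the final table.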

For comparison, here is the same thing written as a loop, which may be easier to work with if you are not familiar with XML files.

x <- vector("list", length(nodes))

for (i in seq_along(nodes)) {
  # Copy the node into its own document so absolute paths see only this bill
  subDoc <- xmlDoc(nodes[[i]])
  bill_session    <- xpathSApply(subDoc, "/bill", xmlGetAttr, "session")
  short_title     <- xpathSApply(subDoc, "//title[@type='short']", xmlValue)
  action_datetime <- xpathSApply(subDoc, "//actions/action", xmlGetAttr, "datetime")
  action          <- xpathSApply(subDoc, "//actions/action/text", xmlValue)
  subjects        <- paste(xpathSApply(subDoc, "//subjects/term", xmlGetAttr, "name"), collapse = "; ")
  x[[i]] <- data.frame(bill_session, short_title, action_datetime, action, subjects)
  free(subDoc)  # release the temporary document
}

# Combine the per-bill data frames as before
do.call("rbind", x)
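
Both versions assume every bill has a match for each query. If some bills lack, say, a short title, the query comes back empty and the row cannot be built cleanly. One possible workaround, sketched below, is a small helper that falls back to NA. The helper name `first_or_na` and the two-bill sample document are hypothetical, not part of the XML package or the original data.

```r
library(XML)

# Hypothetical helper: first match of an XPath query on a node, or NA if nothing matches
first_or_na <- function(node, path, fun, ...) {
  res <- xpathSApply(node, path, fun, ...)
  if (length(res) == 0) NA_character_ else as.character(res[[1]])
}

# Made-up sample: the second bill has no short title
xml_text <- '<bills>
  <bill session="113"><titles><title type="short">Naval Vessel Transfer Act of 2013</title></titles></bill>
  <bill session="113"><titles><title type="official">A bill with no short title.</title></titles></bill>
</bills>'

demo_doc <- xmlParse(xml_text, asText = TRUE)
demo_nodes <- getNodeSet(demo_doc, "//bill[not(ancestor::bill)]")

rows <- lapply(demo_nodes, function(node) {
  data.frame(
    bill_session = first_or_na(node, ".", xmlGetAttr, "session"),
    short_title  = first_or_na(node, ".//title[@type='short']", xmlValue),
    stringsAsFactors = FALSE
  )
})

bills <- do.call("rbind", rows)
```

Note that this keeps only the first match per query, so it suits single-valued fields such as the session or short title. Multi-valued fields such as actions or subjects still need to be collapsed with paste or handled as separate rows.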