Getting information from a bunch of XML files

Time: 2015-12-01 17:18:56

Tags: xml r

I'm using RStudio, but I don't even know where to start.

Suppose I have a folder containing a bunch of XML files with this structure:

<?xml version="1.0" encoding="UTF-8"?>

<rootTag>
  <Award>
    <AwardTitle>Strain-Induced Ordering, Assembly, and Defects in Nanoscale Thin-Film Heterostructures</AwardTitle>
    <AwardEffectiveDate>05/01/2013</AwardEffectiveDate>
    <AwardExpirationDate>04/30/2016</AwardExpirationDate>
    <AwardAmount>268000</AwardAmount>
    <AwardInstrument>
      <Value>Standard Grant</Value>
    </AwardInstrument>
    <Organization>
      <Code>07030000</Code>
      <Directorate>
        <LongName>Directorate For Engineering</LongName>
      </Directorate>
      <Division>
        <LongName>Div Of Civil, Mechanical, &amp; Manufact Inn</LongName>
      </Division>
    </Organization>
    <ProgramOfficer>
      <SignBlockName>Kara Peters</SignBlockName>
    </ProgramOfficer>
    <AbstractNarration>The research objective of this award is to identify and investigate fundamental modeling issues in a simultaneous thin film growth and self-assembly process, in which phase separation as well as self-assembly of nanoscale heterostructure occur simultaneously during the growth of thin film. A successful fabrication process for ordered arrays of nanostructures in multi-phase thin films would find applications in a wide variety of material systems such as superconductors and multiferroics. Key hypotheses underpinning the fundamental behavior will be investigated including (1) vertical/lateral ordering and self-assembly is strain mediated; (2) control of these nanostructures relies on the competition between several kinetic processes; and (3) resulting functional properties depend on defects in thin film heterostructures. The proposed work will develop a theoretical model and phase-field simulations that consider the evolution of composition (phase separation), morphology (film growth), defects, and polarization (domain switching). The predicative capabilities will be tested by close collaborations with experimental groups. &lt;br/&gt;&lt;br/&gt;The grand challenge arising from the synthesis and applications of multi-functional materials is how to achieve patterned and ordered nanostructures with one or multiple length scales that are commensurate with the designed functional properties. As a critical step towards this direction, this program helps establish a centralized platform for focused research and education activities in the fields of novel materials processing, reliability, and multifunctional properties at small scales, which are beyond the current curricula and educational means in materials science and applied mechanics. 
The educational plan focuses on new curriculum development in nanomaterials and active participation in the UT Math and Science Center (targeting high school students from rural counties in East Tennessee) and the UT Pre-Collegiate Research Scholars program (targeting high school students in Knox County). These activities will have a particular emphasis on educating students from underrepresented groups.</AbstractNarration>
    <MinAmdLetterDate>02/12/2013</MinAmdLetterDate>
    <MaxAmdLetterDate>02/12/2013</MaxAmdLetterDate>
    <ARRAAmount/>
    <AwardID>1300223</AwardID>
    <Investigator>
      <FirstName>Yanfei</FirstName>
      <LastName>Gao</LastName>
      <EmailAddress>ygao7@utk.edu</EmailAddress>
      <StartDate>02/12/2013</StartDate>
      <EndDate/>
      <RoleCode>Principal Investigator</RoleCode>
    </Investigator>
    <Institution>
      <Name>University of Tennessee Knoxville</Name>
      <CityName>KNOXVILLE</CityName>
      <ZipCode>379960003</ZipCode>
      <PhoneNumber>8659743466</PhoneNumber>
      <StreetAddress>1 CIRCLE PARK</StreetAddress>
      <CountryName>United States</CountryName>
      <StateName>Tennessee</StateName>
      <StateCode>TN</StateCode>
    </Institution>
    <ProgramElement>
      <Code>1630</Code>
      <Text>Mechanics of Materials and Str</Text>
    </ProgramElement>
    <ProgramReference>
      <Code>022E</Code>
      <Text>SOLID MECHANICS</Text>
    </ProgramReference>
    <ProgramReference>
      <Code>024E</Code>
      <Text>MATERIALS DESIGN</Text>
    </ProgramReference>
    <ProgramReference>
      <Code>9150</Code>
      <Text>EXP PROG TO STIM COMP RES</Text>
    </ProgramReference>
    <ProgramReference>
      <Code>9161</Code>
      <Text>SINGLE DIVISION/UNIVERSITY</Text>
    </ProgramReference>
    <ProgramReference>
      <Code>AMPP</Code>
      <Text>ADVANCED MATERIALS &amp; PROCESSING PROGRAM</Text>
    </ProgramReference>
  </Award>
</rootTag>

How can I find the top 10 universities with the most funding? Would that be the award amount and the organization code (I think)?

1 answer:

Answer 0 (score: 0)

Do something like this for each file:

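library("xml2")
# Parse one award file (replace the path with one of your own files)
xml <- xml2::read_xml("~/file.xml")
# Element names to extract
toget <- c('AwardID', 'AwardAmount')
out <- lapply(toget, function(z) {
  # xml_find_first() replaces the older xml_find_one()
  xml_text(xml_find_first(xml, paste0("//", z)))
})
setNames(out, toget)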

$AwardID
[1] "1300223"

$AwardAmount
[1] "268000"

Then combine the per-file results and carry on with the calculation.
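
A minimal sketch of that last step, under some assumptions: "university" is taken to mean the `Institution/Name` element, funding is the sum of `AwardAmount` per institution, and every file holds a single `Award`. The helper names `read_award` and `top_funded` are made up for illustration, and the folder is assumed to contain at least one `.xml` file.

```r
library(xml2)

# Hypothetical helper: pull the institution name and award amount from one file.
read_award <- function(path) {
  doc <- read_xml(path)
  data.frame(
    institution = xml_text(xml_find_first(doc, "//Institution/Name")),
    amount = as.numeric(xml_text(xml_find_first(doc, "//AwardAmount"))),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: total funding per institution, largest first.
top_funded <- function(dir, n = 10) {
  # Collect every .xml file in the folder
  files <- list.files(dir, pattern = "\\.xml$", full.names = TRUE)
  # One row per file, stacked into a single data frame
  awards <- do.call(rbind, lapply(files, read_award))
  # Sum the amounts per institution (rows with a missing amount are dropped)
  totals <- aggregate(amount ~ institution, data = awards, FUN = sum)
  # Sort by total amount, descending, and keep the first n rows
  head(totals[order(-totals$amount), ], n)
}
```

Calling `top_funded("path/to/folder")` with your own folder path would return a data frame of up to 10 rows. If the grouping should be by organization code instead, the XPath in `read_award` could be changed to `//Organization/Code`.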