I am trying to parse some XML documents into a data frame in R. What I want to do is flatten the XML tree so that I get one row in the data frame per child. I also want each row to contain the data from its parent.
Example:
<xml>
<eventlist>
<event>
<ProcessIndex>1063</ProcessIndex>
<Time_of_Day>2:54:20.2959537 PM</Time_of_Day>
<Process_Name>chrome.exe</Process_Name>
<PID>12164</PID>
<Operation>ReadFile</Operation>
<Result>SUCCESS</Result>
<Detail>Offset: 1,684,224, Length: 256</Detail>
<stack>
<frame>
<depth>0</depth>
<address>0xfffff8038683667c</address>
<path>C:\WINDOWS\System32\drivers\FLTMGR.SYS</path>
<location>FltDecodeParameters + 0x1a6c</location>
</frame>
<frame>
<depth>1</depth>
<address>0xfffff80386834e13</address>
<path>C:\WINDOWS\System32\drivers\FLTMGR.SYS</path>
<location>FltDecodeParameters + 0x203</location>
</frame>
<frame>
<depth>3</depth>
<address>0x7ffea54ffac1</address>
<path>C:\WINDOWS\SYSTEM32\ntdll.dll</path>
<location>RtlUserThreadStart + 0x21</location>
</frame>
</stack>
</event>
<event>
<ProcessIndex>1063</ProcessIndex>
<Time_of_Day>2:54:20.2960270 PM</Time_of_Day>
<Process_Name>chrome.exe</Process_Name>
<PID>12164</PID>
<Operation>WriteFile</Operation>
<Result>SUCCESS</Result>
<Detail>Offset: 103,016, Length: 36</Detail>
<stack>
<frame>
<depth>0</depth>
<address>0xfffff8038683667c</address>
<path>C:\WINDOWS\System32\drivers\FLTMGR.SYS</path>
<location>FltDecodeParameters + 0x1a6c</location>
</frame>
<frame>
<depth>1</depth>
<address>0xfffff80386834e13</address>
<path>C:\WINDOWS\System32\drivers\FLTMGR.SYS</path>
<location>FltDecodeParameters + 0x203</location>
</frame>
<frame>
<depth>26</depth>
<address>0x7ffea54ffac1</address>
<path>C:\WINDOWS\SYSTEM32\ntdll.dll</path>
<location>RtlUserThreadStart + 0x21</location>
</frame>
</stack>
</event>
</eventlist>
</xml>
And the result that I would like to get is:
ProcessIndex Time_of_Day Process_Name PID Operation Result depth address path location
1063 2:54:20 chrome.exe 12164 ReadFile SUCCESS 0 0xfffff.. C:\WINDOWS\System32\driv... FltDecodeParameters + 0x1a6c
1063 2:54:20 chrome.exe 12164 ReadFile SUCCESS 1 0xfffff.. C:\WINDOWS\System32\driv... FltDecodeParameters + 0x203
1063 2:54:20 chrome.exe 12164 ReadFile SUCCESS 2 0xfffff.. C:\WINDOWS\System32\driv... RtlUserThreadStart + 0x21
1063 2:54:20 chrome.exe 12164 WriteFile SUCCESS 0 0xfffff.. C:\WINDOWS\System32\driv... FltDecodeParameters + 0x1a6c
1063 2:54:20 chrome.exe 12164 WriteFile SUCCESS 1 0xfffff.. C:\WINDOWS\System32\driv... FltDecodeParameters + 0x203
1063 2:54:20 chrome.exe 12164 WriteFile SUCCESS 2 0xfffff.. C:\WINDOWS\System32\driv... RtlUserThreadStart + 0x21
I tried using the XML package and xmlToDataFrame:
xmldf_events_stack <- xmlToDataFrame(nodes=getNodeSet(data_xml_2,"//eventlist/event/stack/frame"))
but that only gives me the flattened frames, without the parent data. Also, if I try to parse the event data into a data frame, all XML tags are stripped from the stack field, so there is no way for me to parse it later.
Any help or guidance in the right direction will be appreciated.
Answer 0 (score: 2)
I solved the problem. I am sure there are more elegant ways to do this, but this is what I did. I hope it helps someone in the future.
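library(XML)   # xpathSApply, xmlName, xmlValue
library(plyr)  # rbind.fill
# data_xml_2 is assumed to be the document parsed with xmlParse()
df <- do.call(rbind.fill, lapply(data_xml_2['//eventlist/event'], function(x) {
  # Element names that directly precede a text node, i.e. the leaf tags
  names <- xpathSApply(x, './/.', xmlName)
  names <- names[which(names == "text") - 1]
  # All leaf values of this event, in document order
  values <- xpathSApply(x, ".//text()", xmlValue)
  # Values 1-7 belong to the event; the rest are frames with 4 fields each
  framevalues <- values[8:length(values)]
  framevalues <- matrix(framevalues, ncol = 4, byrow = TRUE)
  retvalues <- framevalues
  # Prepend the 7 event values to every frame row
  for (i in 7:1) {
    retvalues <- cbind(values[i], retvalues)
  }
  # 7 event columns + 4 frame columns
  colnames(retvalues) <- names[seq_len(ncol(retvalues))]
  return(as.data.frame(retvalues))
}))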
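The hard-coded positions above (7 event fields, 4 fields per frame) tie the code to this exact layout. As a hedged alternative, here is a minimal sketch that starts from each frame and walks up to its enclosing event with the XPath `ancestor::` axis, so fields are picked up by name rather than by position. The sample document is a shortened stand-in for the one in the question, and `leaf_values` is a hypothetical helper name, not part of the XML package.

```r
library(XML)

# Shortened stand-in for the question's document (assumption)
xml_text <- '<xml><eventlist>
  <event>
    <PID>12164</PID><Operation>ReadFile</Operation>
    <stack>
      <frame><depth>0</depth><location>FltDecodeParameters + 0x1a6c</location></frame>
      <frame><depth>1</depth><location>FltDecodeParameters + 0x203</location></frame>
    </stack>
  </event>
  <event>
    <PID>12164</PID><Operation>WriteFile</Operation>
    <stack>
      <frame><depth>0</depth><location>FltDecodeParameters + 0x1a6c</location></frame>
    </stack>
  </event>
</eventlist></xml>'
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: named character vector (tag name -> text) for a node list
leaf_values <- function(nodes) {
  setNames(vapply(nodes, xmlValue, character(1)),
           vapply(nodes, xmlName, character(1)))
}

frames <- getNodeSet(doc, "//eventlist/event/stack/frame")

rows <- lapply(frames, function(frame) {
  # Fields of the enclosing <event>, leaving out the <stack> subtree
  event_fields <- leaf_values(getNodeSet(frame, "ancestor::event/*[not(self::stack)]"))
  # Fields of the frame itself
  frame_fields <- leaf_values(getNodeSet(frame, "*"))
  as.data.frame(as.list(c(event_fields, frame_fields)), stringsAsFactors = FALSE)
})

# One row per frame, each carrying its parent event's fields
flat_df <- do.call(rbind, rows)
```

All columns come back as character, so numeric fields such as depth would still need converting afterwards.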
Answer 1 (score: 0)
Consider parsing by node index, then merging each parent with its children inside lapply, so that the resulting list of data frames can be row-bound at the end.
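The code that accompanied this answer is not preserved here. What follows is only a minimal sketch of the described idea, not the answerer's original: select the i-th event by position in XPath, build a one-row data frame from its own fields, cross-join it with that event's frames, and row-bind the pieces. The sample document is a shortened stand-in for the question's XML, and the variable names are assumptions.

```r
library(XML)

# Shortened stand-in for the question's document (assumption)
xml_text <- '<xml><eventlist>
  <event>
    <PID>12164</PID><Operation>ReadFile</Operation>
    <stack>
      <frame><depth>0</depth><location>FltDecodeParameters + 0x1a6c</location></frame>
      <frame><depth>1</depth><location>FltDecodeParameters + 0x203</location></frame>
    </stack>
  </event>
  <event>
    <PID>12164</PID><Operation>WriteFile</Operation>
    <stack>
      <frame><depth>0</depth><location>FltDecodeParameters + 0x1a6c</location></frame>
    </stack>
  </event>
</eventlist></xml>'
doc <- xmlParse(xml_text, asText = TRUE)

n_events <- length(getNodeSet(doc, "//eventlist/event"))

df_list <- lapply(seq_len(n_events), function(i) {
  # Parent: every child of the i-th event except <stack>, as a one-row data frame
  parent_nodes <- getNodeSet(
    doc, sprintf("//eventlist/event[%d]/*[not(self::stack)]", i))
  parent <- as.data.frame(
    as.list(setNames(sapply(parent_nodes, xmlValue),
                     sapply(parent_nodes, xmlName))),
    stringsAsFactors = FALSE)

  # Children: one row per <frame> of the same event
  frames <- xmlToDataFrame(
    nodes = getNodeSet(doc, sprintf("//eventlist/event[%d]/stack/frame", i)),
    stringsAsFactors = FALSE)

  # by = NULL gives the Cartesian product, repeating the parent row per frame
  merge(parent, frames, by = NULL)
})

# Row-bind the per-event data frames into the flattened result
flat_df <- do.call(rbind, df_list)
```

Because every event in the sample has the same fields, base `rbind` is enough; if events can differ in their columns, something like `plyr::rbind.fill` (used in the other answer) would be needed instead.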