@r2evans
[1] Tembec Inc. Tembec Inc. Tembec Inc.
197 Levels: National Bank of Canada Foundation for the Barreau du Québec ... Irving Resources Inc.
> dest
[1] "C:\\Sedar_data\\2016\\2016_02\\balance-sheets"
> myfiles
[1] "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02427367-00000007-00038001-i@#SLH#Sedar2#Western#FINALPROSPECTUS-PDF-REGFILE.xml"
[2] "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02439236-00000002-00026641-C@#SEDAR#2016#First_Quarter_Report-PDF-REGFILE.xml"
C@#SEDAR#FILINGS#151231Ironside_Q2FS_cm-PDF-REGFILE.xml"
[9] "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02440535-00000001-00002159-s@#SEDAR#Firan#Annual#AnnualReport-2015-FTG-PDF-REGFILE.xml"
[10] "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02440536-00000001-00002159-s@#SEDAR#Firan#Annual#MD-A-2015-FTG-PDF-REGFILE.xml"
Suppose these are the input files. If I want the output for Firan, the output would be [11] "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02440538-00000001-00002159-s@#SEDAR#Firan#Annual#AFS-2015-FTG-PDF-REGFILE.xml"
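If the company token in the file name can be trusted (an assumption on my side; the paths listed above happen to contain `#Firan#`), the matching paths could be picked straight from `myfiles` without opening the XML at all. A minimal sketch, using three of the paths shown above as sample input:

```r
# Sample input: three of the paths from the listing above
myfiles <- c(
  "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02439236-00000002-00026641-C@#SEDAR#2016#First_Quarter_Report-PDF-REGFILE.xml",
  "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02440535-00000001-00002159-s@#SEDAR#Firan#Annual#AnnualReport-2015-FTG-PDF-REGFILE.xml",
  "C:\\Sedar_data\\2016\\2016_02\\balance-sheets/02440536-00000001-00002159-s@#SEDAR#Firan#Annual#MD-A-2015-FTG-PDF-REGFILE.xml"
)

# fixed = TRUE matches "#Firan#" as a literal string, not as a regular expression
firan_files <- myfiles[grepl("#Firan#", myfiles, fixed = TRUE)]
print(firan_files)
```

This only works when the company name appears in the file name in a predictable form, which may not hold for every filing.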
I am reading multiple XML files from my computer using
dest <- "C:\\my_data\\2016\\2016_02"
From each XML file I extract a tag, and I filter the files by company name.
look.for <- c("Firan Technology Group Corporation")
name_filter <- filesList_df[filesList_df$`names_try[1, 1]` %in% look.for, ]
name_filter
The output is
> name_filter
[1] Firan Technology Group Corporation Firan Technology Group Corporation Firan Technology Group Corporation
[4] Firan Technology Group Corporation Firan Technology Group Corporation Firan Technology Group Corporation
[7] Firan Technology Group Corporation Firan Technology Group Corporation
197 Levels: Bank ... Resources Inc.
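However, what I actually want to output is the paths of these files. Could you please help me solve this? Thanks in advance.
The complete code is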
library(XML)
library(methods)
library(plyr)
library(stringr)

# Parse a single file first to inspect its structure:
# d1 <- "C:\\Users\\DSLGuest\\Desktop\\Data\\2016/2016_03/2016-03-16/02455279-00000001-00001297-C@#Temp#BORALEX#2016#aNNUALfILINGS#MDA#MDAeng-PDF-REGFILE.xml"
# doc1 <- xmlParse(d1)
# doc1

dest <- "C:\\Sedar_data\\2016\\2016_02"
# Collect every XML file under dest, with full paths
myfiles <- list.files(path = dest, recursive = TRUE, pattern = "xml", full.names = TRUE)
filesList_df <- data.frame(File = character(), stringsAsFactors = FALSE)

for (i in myfiles) {
  result <- xmlParse(i)
  # print(result)
  rootnode <- xmlRoot(result)
  # print(rootnode)
  rootsize <- xmlSize(rootnode)
  # print(rootnode[[15]][[1]][[2]])  # gives the name of the company
  names_try <- ldply(xmlToList(rootnode[[15]][[1]][[2]]), data.frame)
  # Append this file's company name (the file path itself is not stored)
  filesList_df <- rbind(filesList_df, as.data.frame(names_try[1, 1]))
}
filesList_df

# Keep only the entries for the company of interest
look.for <- c("Firan Technology Group Corporation")
name_filter <- filesList_df[filesList_df$`names_try[1, 1]` %in% look.for, ]
name_filter
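One possible way to get the paths instead of the names would be to store each path next to the name extracted from it, so that the filter can return the `File` column. The sketch below is only an illustration under assumptions: `collect_names`, `fake_get_name` and `demo_names` are hypothetical names I made up, and the XML lookup is replaced by a stand-in so the example runs without any files.

```r
# Build one row per file so that the path travels with the company name.
# get_name is any function that takes a path and returns one character string.
collect_names <- function(paths, get_name) {
  data.frame(
    File = paths,
    Company = vapply(paths, get_name, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Stand-in for the XML lookup (assumption, for demonstration only).
# With real files, get_name could wrap the extraction used in the code above:
#   function(p) as.character(
#     ldply(xmlToList(xmlRoot(xmlParse(p))[[15]][[1]][[2]]), data.frame)[1, 1])
demo_names <- c(
  "a.xml" = "Tembec Inc.",
  "b.xml" = "Firan Technology Group Corporation",
  "c.xml" = "Firan Technology Group Corporation"
)
fake_get_name <- function(p) demo_names[[p]]

filesList_df <- collect_names(names(demo_names), fake_get_name)

# Filter on the company name, but return the matching file paths
look.for <- c("Firan Technology Group Corporation")
matching_paths <- filesList_df$File[filesList_df$Company %in% look.for]
print(matching_paths)
```

Building the data frame in one step also avoids growing it with `rbind` inside the loop, and `stringsAsFactors = FALSE` keeps the names as plain strings rather than a factor with 197 levels.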