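TL;DR: How do I use Sys.glob() within unzip()?
I have multiple .zip files, and I want to extract only one file from each archive.
For example, one of the archives contains the following files: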
[1] "cmc-20150531.xml" "cmc-20150531.xsd" "cmc-20150531_cal.xml" "cmc-20150531_def.xml" "cmc-20150531_lab.xml"
[6] "cmc-20150531_pre.xml"
I want to extract the first file, because it is the one that matches the pattern. To do that, I use the following command:
unzip("zip-archive.zip", files=Sys.glob("[a-z][a-z][a-z][-][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][.][x][m][l]"))
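However, this command does not work, and I don't know why. R simply extracts all the files in the archive.
On the other hand, the following command works: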
unzip("zip-archive.zip", files="cmc-20150531.xml")
How do I use Sys.glob() within unzip()?
Answer 0 (score: 1)
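Sys.glob expands files that already exist, so the argument to your unzip call will depend on the files in your working directory, not on the contents of the archive.
Perhaps you want to call unzip with list=TRUE to return the list of files in the zip, and then use some pattern matching to select the files you want.
See ?grep for information on matching strings against patterns. Those patterns are "regular expressions" rather than "glob" expansions, but you should be able to work with that.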
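To see why the original call extracted everything, it may help to check what Sys.glob() actually returns. The following is a minimal sketch; the scratch directory is a hypothetical stand-in created only for illustration. It shows that the glob yields an empty vector until a matching file exists on disk:

```r
# Sys.glob() looks at the file system, not inside any zip archive.
pattern <- "[a-z][a-z][a-z][-][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][.][x][m][l]"

# Hypothetical scratch directory, so the demo does not depend on your working directory.
scratch <- tempfile("glob-demo")
dir.create(scratch)

# Nothing on disk matches yet, so the result is an empty character vector.
before <- Sys.glob(file.path(scratch, pattern))

# Once a matching file exists on disk, the glob finds it.
file.create(file.path(scratch, "cmc-20150531.xml"))
after <- basename(Sys.glob(file.path(scratch, pattern)))

print(length(before))
print(after)
```

An empty files argument gives unzip() nothing to restrict the extraction to, which is consistent with the behaviour described in the question.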
Here is a concrete example:
# what's in the zip?
files = unzip("c.zip", list=TRUE)$Name
files
[1] "l_spatial.dbf" "l_spatial.shp" "l_spatial.shx" "ls_polys_bin.dbf"
[5] "ls_polys_bin.shp" "ls_polys_bin.shx" "rast_jan90.tif"
# what files have "dbf" in them:
files[grepl("dbf",files)]
[1] "l_spatial.dbf" "ls_polys_bin.dbf"
# extract just those:
unzip("c.zip", files=files[grepl("dbf",files)])
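The regular expression for your glob
"[a-z][a-z][a-z][-][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][.][x][m][l]"
would be
"^[a-z]{3}-[0-9]{8}\\.xml$"
That matches the start of the string ("^"), three characters from a to z (lowercase only), a dash, eight digits, a literal dot, "xml", and the end of the string ("$"). The dot needs two backslashes: one because a bare dot means "any one character" in a regular expression, and another because R needs a backslash to escape the backslash inside a string.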
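As a quick check, the regular expression can be tried against the file names listed in the question before touching any archive. This is only a sketch; the variable names members, regex and wanted are chosen here for illustration:

```r
# File names taken from the archive listing in the question.
members <- c("cmc-20150531.xml", "cmc-20150531.xsd", "cmc-20150531_cal.xml",
             "cmc-20150531_def.xml", "cmc-20150531_lab.xml", "cmc-20150531_pre.xml")

regex <- "^[a-z]{3}-[0-9]{8}\\.xml$"

# grepl() is vectorised, so it returns one TRUE/FALSE per file name.
wanted <- members[grepl(regex, members)]
print(wanted)
# [1] "cmc-20150531.xml"
```

The resulting vector could then be passed on, for example as unzip("zip-archive.zip", files = wanted).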
Answer 1 (score: -1)
Just as with any other collection, loop over the results from Sys.glob and pass the loop variable to unzip. This is done with a for-loop.
unzip() takes the path of the archive as its first argument, and files is the argument that names which files inside that zip file to extract.
Mind you, I'm more of a full-stack programmer and not so much an R programmer, but the concepts are the same, so the code should look something like:
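zips <- Sys.glob("*.zip")
for (zip_file in zips) {
  # unzip() does not expand wildcards in `files`, so list the members and filter them
  members <- unzip(zip_file, list = TRUE)$Name
  wanted <- members[grepl("\\.xml$", members)]
  # skip archives that contain no matching file
  if (length(wanted) > 0) results <- unzip(zip_file, files = wanted)
}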
As for using a regex with unzip(), the documentation is the place to check. I can only advise using another for-loop to compare the contents of the zip file against your regex and then performing the extraction. A sketch follows:
zips <- Sys.glob("*.zip")
regex <- "[a-z]{3}-[0-9]{8}\\.xml"
for (zip_file in zips) {
  # list the archive members without extracting them
  members <- unzip(zip_file, list = TRUE)$Name
  for (member in members) {
    if (grepl(regex, member)) {
      # do something with the found file, e.g. extract only that one
      unzip(zip_file, files = member)
    }
  }
}
Basically you're just looping through your list of zip files, then through the list of files within each zip file, comparing each filename inside the archive against the regex and extracting or performing operations on only that file.
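The looping idea described above can also be written without the inner loop, since grepl() works on a whole vector of names at once. The following is a rough sketch, not a definitive implementation; pick_members and extract_matching are hypothetical helper names introduced here, and the example call assumes the zip files sit in the working directory:

```r
# Hypothetical helper: keep the archive members whose names match `regex`.
pick_members <- function(members, regex) {
  members[grepl(regex, members)]
}

# Hypothetical helper: extract only the matching members from each zip file.
# Returns a list, keyed by zip path, of the paths that unzip() reports.
extract_matching <- function(zips, regex, exdir = ".") {
  extracted <- list()
  for (zip_path in zips) {
    # List the archive contents without extracting anything.
    members <- unzip(zip_path, list = TRUE)$Name
    wanted <- pick_members(members, regex)
    # Skip archives without a match rather than calling unzip() with no files.
    if (length(wanted) > 0) {
      extracted[[zip_path]] <- unzip(zip_path, files = wanted, exdir = exdir)
    }
  }
  extracted
}

# Example call (assumes the zip files sit in the working directory):
# extract_matching(Sys.glob("*.zip"), "^[a-z]{3}-[0-9]{8}\\.xml$")
```

Keeping the name filter in its own small function makes it possible to try the pattern on a plain character vector before running it against real archives.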