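We export "records" to an XML file; one of our customers has complained that the file is too large for their other system to process. So I need to split the file, repeating the "header section" in each new file.
I am therefore looking for something that lets me define some XPaths for the parts that should always be output, plus another XPath for the "rows", with parameters that say how many rows to put in each file and how to name the files.
Before I start writing custom .NET code: is there a standard command-line tool that runs on Windows?
(Since I know how to program in C#, I am more inclined to write code than to wrestle with complex XSL and the like, but an off-the-shelf solution would be better than custom code.)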
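Answer 0 (score: 3)
There is no general-purpose solution, because the source XML can be structured in so many different ways.
Building an XSLT transform that outputs a fragment of an XML document is fairly simple. For example, given this XML: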
<header>
<data rec="1"/>
<data rec="2"/>
<data rec="3"/>
<data rec="4"/>
<data rec="5"/>
<data rec="6"/>
</header>
you can use this XSLT to output a copy of the file that contains only the data elements within a given range:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:param name="startPosition"/>
<xsl:param name="endPosition"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="header">
<xsl:copy>
<xsl:apply-templates select="data"/>
</xsl:copy>
</xsl:template>
<xsl:template match="data">
<xsl:if test="position() >= $startPosition and position() &lt;= $endPosition">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
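(Incidentally, note that because this is based on the identity transform, it still works even if header is not the top-level element.)
You still need to count the data elements in the source XML and run the transform repeatedly, each time with the $startPosition and $endPosition values appropriate to that chunk.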
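The counting and range arithmetic described above could be sketched in Python roughly as follows. This is only an illustration: the function name `chunk_ranges` and the chunk size of 4 are assumptions, and invoking the XSLT processor itself is left to whatever tool you use.

```python
import xml.etree.ElementTree as ET

# Same sample document as in the answer above.
SAMPLE = """<header>
<data rec="1"/><data rec="2"/><data rec="3"/>
<data rec="4"/><data rec="5"/><data rec="6"/>
</header>"""


def chunk_ranges(total, per_file):
    """Yield 1-based, inclusive (startPosition, endPosition) pairs."""
    for start in range(1, total + 1, per_file):
        yield start, min(start + per_file - 1, total)


# Count the data elements once, then derive one parameter pair per output file.
root = ET.fromstring(SAMPLE)
total = len(root.findall("data"))
ranges = list(chunk_ranges(total, per_file=4))
for start, end in ranges:
    print(f"startPosition={start} endPosition={end}")
```

Each printed pair corresponds to one run of the transform, with the result saved under a different file name.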
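Answer 1 (score: 3)
First download the foxe XML editor from http://www.firstobject.com/foxe242.zip
Then watch the video at http://www.firstobject.com/xml-splitter-script-video.htm, which explains how the splitting code works.
That page contains a script (it begins with split()). Copy it, create a "New Program" under "File" in the XML editor, paste the code in, and save it. The code is: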
split()
{
CMarkup xmlInput, xmlOutput;
xmlInput.Open( "**50MB.xml**", MDF_READFILE );
int nObjectCount = 0, nFileCount = 0;
while ( xmlInput.FindElem("//**ACT**") )
{
if ( nObjectCount == 0 )
{
++nFileCount;
xmlOutput.Open( "**piece**" + nFileCount + ".xml", MDF_WRITEFILE );
xmlOutput.AddElem( "**root**" );
xmlOutput.IntoElem();
}
xmlOutput.AddSubDoc( xmlInput.GetSubDoc() );
++nObjectCount;
if ( nObjectCount == **5** )
{
xmlOutput.Close();
nObjectCount = 0;
}
}
if ( nObjectCount )
xmlOutput.Close();
xmlInput.Close();
return nFileCount;
}
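Change the fields marked in bold (the ones wrapped in ** **) as needed; this is also explained on the video page.
In the XML editor window, right-click and choose RUN (or simply press F9). An output bar in the window shows the number of files generated.
Notes:
The input file name can be a full path such as "C:\\Users\\AUser\\Desktop\\a_xml_file.xml" (note the doubled backslashes),
and the output file can be "C:\\Users\\AUser\\Desktop\\anoutputfolder\\piece" + nFileCount + ".xml"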
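For readers without foxe, the same counting loop can be sketched with Python's standard library. This is not the foxe script, only an approximation of its logic: the function name `split_by_count`, its parameters, and the sample `ACTS`/`ACT` document are assumptions made for illustration, and it assumes the matched elements are not nested inside each other.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET


def split_by_count(source, tag, per_file, out_dir, root_name="root", prefix="piece"):
    """Write every `per_file` elements named `tag` into its own file; return the file count."""
    file_count = 0
    batch = []

    def flush():
        nonlocal file_count
        file_count += 1
        wrapper = ET.Element(root_name)  # plays the role of AddElem("root")
        wrapper.extend(batch)
        path = os.path.join(out_dir, f"{prefix}{file_count}.xml")
        ET.ElementTree(wrapper).write(path, encoding="utf-8", xml_declaration=True)
        for elem in batch:
            elem.clear()  # drop the written content to limit memory use
        batch.clear()

    # iterparse yields each element once its end tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            batch.append(elem)
            if len(batch) == per_file:
                flush()
    if batch:  # write the final, partially filled file
        flush()
    return file_count


# Hypothetical input: seven ACT elements, five per output file.
xml_text = "<ACTS>" + "".join(f'<ACT id="{i}"/>' for i in range(1, 8)) + "</ACTS>"
out_dir = tempfile.mkdtemp()
count = split_by_count(io.BytesIO(xml_text.encode("utf-8")), "ACT", 5, out_dir)
print(count, sorted(os.listdir(out_dir)))
```

Like the script above, this wraps each batch in a bare root element and does not repeat any header section from the source.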
Answer 2 (score: 2)
xml_split - split huge XML documents into smaller chunks
Answer 3 (score: 2)
As already mentioned, xml_split from the Perl package XML::Twig does a very good job.
xml_split < bigFile.xml
#or if compressed e.g.
bzcat bigFile.xml.bz2 | xml_split
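Without any arguments, xml_split creates one file per top-level child node.
There are parameters to specify the number of elements wanted per file (-g) or an approximate size (-s <Kb|Mb|Gb>).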
On Debian or Ubuntu it can be installed with:
sudo apt-get install xml-twig-tools
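Answer 4 (score: 1)
Nothing built in handles this situation easily.
Your approach sounds reasonable, but I would probably start with a "skeleton" document containing the elements that need to be repeated, and generate multiple documents from it with the "records".
Update
After some digging, I found this article describing a way to split files using XSLT.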
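One way to read the "skeleton" idea is sketched below in Python. It is a minimal illustration under stated assumptions, not a finished tool: the document layout (`export`, `header`, `rows`, `row`), the function name `split_with_skeleton`, and its parameters are all hypothetical, and it loads the whole file into memory.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source layout: a header to repeat plus a list of rows to split.
SOURCE = """<export>
  <header><customer>ACME</customer></header>
  <rows>
    <row id="1"/><row id="2"/><row id="3"/><row id="4"/><row id="5"/>
  </rows>
</export>"""


def split_with_skeleton(root, container_path, row_tag, per_file):
    """Return new root elements, each a copy of the skeleton plus one batch of rows."""
    rows = root.find(container_path).findall(row_tag)

    # The skeleton is the whole document with the rows removed.
    skeleton = copy.deepcopy(root)
    skel_container = skeleton.find(container_path)
    for row in skel_container.findall(row_tag):
        skel_container.remove(row)

    documents = []
    for i in range(0, len(rows), per_file):
        doc = copy.deepcopy(skeleton)
        doc.find(container_path).extend(rows[i:i + per_file])
        documents.append(doc)
    return documents


root = ET.fromstring(SOURCE)
docs = split_with_skeleton(root, "rows", "row", per_file=2)
for n, doc in enumerate(docs, start=1):
    ids = [r.get("id") for r in doc.find("rows")]
    print(f"file {n}: rows {ids}, customer {doc.findtext('header/customer')}")
```

Each returned element can then be written out with `ET.ElementTree(doc).write(...)` under whatever naming scheme is needed.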
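Answer 5 (score: 0)
Using UltraEdit, based on https://www.ultraedit.com/forums/viewtopic.php?f=52&t=6704
All I added were some XML header and footer bits. The first and last files need to be fixed manually (or remove the root element from the source beforehand).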
// from https://www.ultraedit.com/forums/viewtopic.php?f=52&t=6704
var FoundsPerFile = 200; // Global setting for number of found split strings per file.
var SplitString = "</letter>"; // String where to split. The split occurs after next character.
var xmlHead = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>';
var xmlRootStart = '<letters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" letterCode="OA01" >';
var xmlRootEnd = '</letters>';
/* Find the tab index of the active document */
// Copied from http://www.ultraedit.com/forums/viewtopic.php?t=4571
function getActiveDocumentIndex () {
var tabindex = -1; /* start value */
for (var i = 0; i < UltraEdit.document.length; i++)
{
if (UltraEdit.activeDocument.path==UltraEdit.document[i].path) {
tabindex = i;
break;
}
}
return tabindex;
}
if (UltraEdit.document.length) { // Is any file open?
// Set working environment required for this job.
UltraEdit.insertMode();
UltraEdit.columnModeOff();
UltraEdit.activeDocument.hexOff();
UltraEdit.ueReOn();
// Move cursor to top of active file and run the initial search.
UltraEdit.activeDocument.top();
UltraEdit.activeDocument.findReplace.searchDown=true;
UltraEdit.activeDocument.findReplace.matchCase=true;
UltraEdit.activeDocument.findReplace.matchWord=false;
UltraEdit.activeDocument.findReplace.regExp=false;
// If the string to split is not found in this file, do nothing.
if (UltraEdit.activeDocument.findReplace.find(SplitString)) {
// This file is probably the correct file for this script.
var FileNumber = 1; // Counts the number of saved files.
var StringsFound = 1; // Counts the number of found split strings.
var NewFileIndex = UltraEdit.document.length;
/* Get the path of the current file to save the new
files in the same directory as the current file. */
var SavePath = "";
var LastBackSlash = UltraEdit.activeDocument.path.lastIndexOf("\\");
if (LastBackSlash >= 0) {
LastBackSlash++;
SavePath = UltraEdit.activeDocument.path.substring(0,LastBackSlash);
}
/* Get active file index in case of more than 1 file is open and the
current file does not get back the focus after closing the new files. */
var FileToSplit = getActiveDocumentIndex();
// Always use clipboard 9 for this script and not the Windows clipboard.
UltraEdit.selectClipboard(9);
// Split the file after every x found split strings until source file is empty.
while (1) {
while (StringsFound < FoundsPerFile) {
if (UltraEdit.document[FileToSplit].findReplace.find(SplitString)) StringsFound++;
else {
UltraEdit.document[FileToSplit].bottom();
break;
}
}
// End the selection of the find command.
UltraEdit.document[FileToSplit].endSelect();
// Move the cursor right to include the next character and unselect the found string.
UltraEdit.document[FileToSplit].key("RIGHT ARROW");
// Select from this cursor position everything to top of the file.
UltraEdit.document[FileToSplit].selectToTop();
// Is the file not already empty?
if (UltraEdit.document[FileToSplit].isSel()) {
// Cut the selection and paste it into a new file.
UltraEdit.document[FileToSplit].cut();
UltraEdit.newFile();
UltraEdit.document[NewFileIndex].setActive();
UltraEdit.activeDocument.paste();
/* Add line termination on the last line and remove automatically added indent
spaces/tabs if auto-indent is enabled if the last line is not already terminated. */
if (UltraEdit.activeDocument.isColNumGt(1)) {
UltraEdit.activeDocument.insertLine();
if (UltraEdit.activeDocument.isColNumGt(1)) {
UltraEdit.activeDocument.deleteToStartOfLine();
}
}
// add headers and footers
UltraEdit.activeDocument.top();
UltraEdit.activeDocument.write(xmlHead);
UltraEdit.activeDocument.write(xmlRootStart);
UltraEdit.activeDocument.bottom();
UltraEdit.activeDocument.write(xmlRootEnd);
// Build the file name for this new file.
var SaveFileName = SavePath + "LETTER";
if (FileNumber < 10) SaveFileName += "0";
SaveFileName += String(FileNumber) + ".raw.xml";
// Save the new file and close it.
UltraEdit.saveAs(SaveFileName);
UltraEdit.closeFile(SaveFileName,2);
FileNumber++;
StringsFound = 0;
/* Delete the line termination in the source file
if last found split string was at end of a line. */
UltraEdit.document[FileToSplit].endSelect();
UltraEdit.document[FileToSplit].key("END");
if (UltraEdit.document[FileToSplit].isColNumGt(1)) {
UltraEdit.document[FileToSplit].top();
} else {
UltraEdit.document[FileToSplit].deleteLine();
}
} else break;
UltraEdit.outputWindow.write("Progress " + SaveFileName);
} // Loop executed until source file is empty!
// Close source file without saving and re-open it.
var NameOfFileToSplit = UltraEdit.document[FileToSplit].path;
UltraEdit.closeFile(NameOfFileToSplit,2);
/* The following code line could be commented if the source
file is not needed anymore for further actions. */
UltraEdit.open(NameOfFileToSplit);
// Free memory and switch back to Windows clipboard.
UltraEdit.clearClipboard();
UltraEdit.selectClipboard(0);
}
}
Answer 6 (score: -2)
"Is there a standard command-line tool that runs on Windows?"