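Before reading anything else, please take some time to read the original thread.
Overview: a .xfdl file is a gzipped .xml file that has then been encoded in base64. I want to decode the .xfdl into XML, which I can then modify, and re-encode it back into a .xfdl file.
xfdl > xml.gz > xml > xml.gz > xfdl
I have been able to take a .xfdl file and decode it from base64 using uudeview: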
uudeview -i yourform.xfdl
Then I decompressed it using gunzip:
gunzip -S "" < UNKNOWN.001 > yourform-unpacked.xml
The resulting XML is 100% readable and looks great. Then, without modifying the XML, I should be able to re-compress it using gzip:
gzip yourform-unpacked.xml
Then re-encode it in base64:
base64 -e yourform-unpacked.xml.gz yourform_reencoded.xfdl
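If my thinking is correct, the original file and the re-encoded file should be identical. However, if I put yourform.xfdl and yourform_reencoded.xfdl into Beyond Compare, they do not match. Also, the original file can be viewed in an .xfdl viewer (http://www.grants.gov/help/download_software.jsp#pureedge). The viewer says that the re-encoded xfdl is unreadable.
I have also tried uuenview to re-encode in base64, and it produces the same result. Any help would be appreciated.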
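Answer 0 (score: 2)
As far as I know, you cannot find the compression level of an already compressed file. When you compress a file you can specify the compression level with -#, where # is a number from 1 to 9 (1 is the fastest compression, 9 produces the smallest file). In practice you should never compare a compressed file with one that has been extracted and recompressed; slight variations can easily crop up. In your case I would compare the base64-encoded versions instead of the gzip'd versions.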
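As a rough illustration of why a byte-for-byte comparison of recompressed files is unreliable, here is a minimal Python sketch that compresses the same payload at two levels and compares the payloads after decompression instead. The sample payload is made up for the example; only the standard `gzip` module is used.

```python
import gzip

# Hypothetical stand-in for the unpacked form XML.
xml = b"<XFDL><page sid='PAGE1'>" + b"<field>value</field>" * 200 + b"</page></XFDL>"

# Same input, two compression levels (the -1 and -9 of the gzip command line).
fast = gzip.compress(xml, compresslevel=1)
best = gzip.compress(xml, compresslevel=9)

# The compressed bytes will usually differ between the two runs...
print("identical compressed bytes:", fast == best)
# ...but both streams decompress to exactly the same XML.
print("identical payloads:", gzip.decompress(fast) == gzip.decompress(best) == xml)
```

Comparing the decompressed XML is the check that tells you whether the round trip preserved the content.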
Answer 1 (score: 1)
You need to put the following line at the beginning of the XFDL file:
application/vnd.xfdl; content-encoding="base64-gzip"
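After you have generated the base64-encoded file, open it in a text editor and paste the line above as the first line. Make sure that the base64'ed block starts at the beginning of the second line.
Save it and try it in the Viewer! If it still does not work, the changes made to the XML may have made it non-compliant in some way. In that case, after modifying the XML but before gzipping and base64-encoding it, save it with an .xfdl file extension and try opening it with the Viewer tool. The Viewer should be able to parse and display the uncompressed, unencoded file if it is valid XFDL.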
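The header-plus-body layout described above can be sketched in Python as follows. This is only an illustration under some assumptions: the function names `encode_xfdl` and `decode_xfdl` and the sample document are hypothetical, and wrapping the base64 body at 76 characters per line is a guess that appears to match the samples quoted elsewhere in this thread, not something the Viewer is known to require.

```python
import base64
import gzip
import io

XFDL_HEADER = 'application/vnd.xfdl; content-encoding="base64-gzip"\n'

def encode_xfdl(xml_bytes: bytes) -> str:
    """Gzip and base64-encode XML, then prepend the XFDL MIME header line."""
    buf = io.BytesIO()
    # filename="" and mtime=0 keep the gzip header free of a name and timestamp.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(xml_bytes)
    # encodebytes wraps its output at 76 characters per line (assumed layout).
    body = base64.encodebytes(buf.getvalue()).decode("ascii")
    return XFDL_HEADER + body

def decode_xfdl(text: str) -> bytes:
    """Inverse of encode_xfdl, used here only to check the round trip."""
    header, _, body = text.partition("\n")
    if header + "\n" != XFDL_HEADER:
        raise ValueError("unexpected XFDL header: %r" % header)
    return gzip.decompress(base64.b64decode(body))

sample = b"<?xml version='1.0'?><XFDL></XFDL>"  # hypothetical minimal document
encoded = encode_xfdl(sample)
print(encoded.splitlines()[0])
print("round trip ok:", decode_xfdl(encoded) == sample)
```

The base64 block starts on the second line, directly after the header, which is the layout this answer asks for.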
Answer 2 (score: 1)
Check these out:
http://www.ourada.org/blog/archives/375
http://www.ourada.org/blog/archives/390
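They are in Python, not Ruby, but that should get you fairly close.
The algorithm there is actually for files with the header 'application/x-xfdl; content-encoding="asc-gzip"' rather than 'application/vnd.xfdl; content-encoding="base64-gzip"'. The good news is that PureEdge (aka IBM Lotus Forms) will open that format without a problem.
Then, to top it off, here is a base64-gzip decode (in Python), so you can make the full round trip: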
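import base64
import zlib

# filename is the path to the .xfdl file to decode
with open(filename, 'r') as f:
    header = f.readline()
    if header == 'application/vnd.xfdl; content-encoding="base64-gzip"\n':
        decoded = b''
        # Decoding line by line works because each line holds whole base64 groups
        for line in f:
            decoded += base64.b64decode(line.encode("ISO-8859-1"))
        # MAX_WBITS + 16 tells zlib to expect a gzip header and trailer
        xml = zlib.decompress(decoded, zlib.MAX_WBITS + 16)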
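Answer 3 (score: 1)
I did this in Java with the help of the Base64 class from http://iharder.net/base64.
I have been working on an application that manipulates forms in Java. I decode the file, create a DOM document from the XML, and then write it back to a file.
My Java code to read the file looks like this: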
public XFDLDocument(String inputFile)
        throws IOException,
               ParserConfigurationException,
               SAXException
{
    fileLocation = inputFile;
    try {
        // Create file object
        File f = new File(inputFile);
        if (!f.exists()) {
            throw new IOException("Specified File could not be found!");
        }
        // Open file stream from file
        FileInputStream fis = new FileInputStream(inputFile);
        // Skip past the MIME header
        fis.skip(FILE_HEADER_BLOCK.length());
        // Decode from base 64
        Base64.InputStream bis = new Base64.InputStream(fis,
                Base64.DECODE);
        // Un-gzip the resulting stream
        GZIPInputStream gis = new GZIPInputStream(bis);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        doc = db.parse(gis);
        gis.close();
        bis.close();
        fis.close();
    }
    catch (ParserConfigurationException pce) {
        throw new ParserConfigurationException("Error parsing XFDL from file.");
    }
    catch (SAXException saxe) {
        // Keep the original exception as the cause
        throw new SAXException("Error parsing XFDL into XML Document.", saxe);
    }
}
My Java code to write the file to disk looks like this:
/**
 * Saves the current document to the specified location
 * @param destination Desired destination for the file.
 * @param asXML True if the output should be un-encoded XML rather than Base64/GZIP
 * @throws IOException File cannot be created at specified location
 * @throws TransformerConfigurationException
 * @throws TransformerException
 */
public void saveFile(String destination, boolean asXML)
        throws IOException,
               TransformerConfigurationException,
               TransformerException
{
    // Write the MIME header line first; the file is reopened in append mode below
    BufferedWriter bf = new BufferedWriter(new FileWriter(destination));
    bf.write(FILE_HEADER_BLOCK);
    bf.newLine();
    bf.flush();
    bf.close();
    OutputStream outStream;
    if (!asXML) {
        // XML -> gzip -> base64 -> file
        outStream = new GZIPOutputStream(
                new Base64.OutputStream(
                        new FileOutputStream(destination, true)));
    } else {
        outStream = new FileOutputStream(destination, true);
    }
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.transform(new DOMSource(doc), new StreamResult(outStream));
    outStream.flush();
    outStream.close();
}
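Hope that helps.
Answer 4 (score: 1)
I have been working on something similar, and this should work in PHP. You must have a writable tmp folder, and the PHP file must be named example.php!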
<?php
// PHP 5.4+ has a built-in gzdecode(); define this fallback only if it is missing.
if (!function_exists('gzdecode')) {
    function gzdecode($data) {
        $len = strlen($data);
        if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
            echo "FILE NOT GZIP FORMAT";
            return null; // Not GZIP format (See RFC 1952)
        }
        $method = ord(substr($data, 2, 1)); // Compression method
        $flags = ord(substr($data, 3, 1)); // Flags
        // Parentheses are required: != binds tighter than & in PHP
        if (($flags & 31) != $flags) {
            // Reserved bits are set -- NOT ALLOWED by RFC 1952
            echo "RESERVED BITS ARE SET. VERY BAD";
            return null;
        }
        // NOTE: $mtime may be negative (PHP integer limitations)
        $mtime = unpack("V", substr($data, 4, 4));
        $mtime = $mtime[1];
        $xfl = substr($data, 8, 1);
        $os = substr($data, 9, 1);
        $headerlen = 10;
        $extralen = 0;
        $extra = "";
        if ($flags & 4) {
            // 2-byte length prefixed EXTRA data in header
            if ($len - $headerlen - 2 < 8) {
                echo "INVALID FORMAT";
                return false; // Invalid format
            }
            $extralen = unpack("v", substr($data, $headerlen, 2));
            $extralen = $extralen[1];
            if ($len - $headerlen - 2 - $extralen < 8) {
                echo "INVALID FORMAT";
                return false; // Invalid format
            }
            $extra = substr($data, $headerlen + 2, $extralen);
            $headerlen += 2 + $extralen;
        }
        $filenamelen = 0;
        $filename = "";
        if ($flags & 8) {
            // C-style string file NAME data in header
            if ($len - $headerlen - 1 < 8) {
                echo "INVALID FORMAT";
                return false; // Invalid format
            }
            $filenamelen = strpos(substr($data, $headerlen), chr(0));
            if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
                echo "INVALID FORMAT";
                return false; // Invalid format
            }
            $filename = substr($data, $headerlen, $filenamelen);
            $headerlen += $filenamelen + 1;
        }
        $commentlen = 0;
        $comment = "";
        if ($flags & 16) {
            // C-style string COMMENT data in header
            if ($len - $headerlen - 1 < 8) {
                echo "INVALID FORMAT";
                return false; // Invalid format
            }
            $commentlen = strpos(substr($data, $headerlen), chr(0));
            if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
                echo "INVALID FORMAT";
                return false; // Invalid header format
            }
            $comment = substr($data, $headerlen, $commentlen);
            $headerlen += $commentlen + 1;
        }
        $headercrc = "";
        if ($flags & 1) {
            // 2-bytes (lowest order) of CRC32 on header present
            if ($len - $headerlen - 2 < 8) {
                echo "INVALID FORMAT";
                return false; // Invalid format
            }
            $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff;
            $headercrc = unpack("v", substr($data, $headerlen, 2));
            $headercrc = $headercrc[1];
            if ($headercrc != $calccrc) {
                echo "BAD CRC";
                return false; // Bad header CRC
            }
            $headerlen += 2;
        }
        // GZIP FOOTER - These may be negative due to PHP's limitations
        $datacrc = unpack("V", substr($data, -8, 4));
        $datacrc = $datacrc[1];
        $isize = unpack("V", substr($data, -4));
        $isize = $isize[1];
        // Perform the decompression:
        $bodylen = $len - $headerlen - 8;
        if ($bodylen < 1) {
            // This should never happen - IMPLEMENTATION BUG!
            echo "BIG OOPS";
            return null;
        }
        $body = substr($data, $headerlen, $bodylen);
        $data = "";
        if ($bodylen > 0) {
            switch ($method) {
                case 8:
                    // Currently the only supported compression method:
                    $data = gzinflate($body);
                    break;
                default:
                    // Unknown compression method
                    echo "UNKNOWN COMPRESSION METHOD";
                    return false;
            }
        } else {
            // I'm not sure if zero-byte body content is allowed.
            // Allow it for now... Do nothing...
            echo "ITS EMPTY";
        }
        // Verify decompressed size and CRC32:
        // NOTE: This may fail with large data sizes depending on how
        // PHP's integer limitations affect strlen() since $isize
        // may be negative for large sizes.
        if ($isize != strlen($data) || crc32($data) != $datacrc) {
            // Bad format! Length or CRC doesn't match!
            echo "LENGTH OR CRC DO NOT MATCH";
            return false;
        }
        return $data;
    }
}
echo "<html><head></head><body>";
if (empty($_REQUEST['upload'])) {
    echo <<<_END
<form enctype="multipart/form-data" action="example.php" method="POST">
<input type="hidden" name="MAX_FILE_SIZE" value="100000" />
<table>
<tr>
<th>
<input name="uploadedfile" type="file" />
</th>
</tr>
<tr>
<td><input type="submit" name="upload" value="Convert File" /></td>
</tr>
</table>
</form>
_END;
}
if (!empty($_REQUEST['upload'])) {
    // basename() strips any directory components from the client-supplied name
    $orgfile = basename($_FILES['uploadedfile']['name']);
    $file = "tmp/" . $orgfile;
    $name = str_replace(".xfdl", "", $orgfile);
    $convertedfile = "tmp/" . $name . ".xml";
    $compressedfile = "tmp/" . $name . ".gz";
    $finalfile = "tmp/" . $name . "new.xfdl";
    $target_path = "tmp/" . $orgfile;
    if (!move_uploaded_file($_FILES['uploadedfile']['tmp_name'], $target_path)) {
        echo "There was an error uploading the file, please try again!";
        echo "</body></html>";
        exit;
    }
    $firstline = "application/vnd.xfdl; content-encoding=\"base64-gzip\"\n";
    // Drop the MIME header line, then decode and decompress the rest
    $data = file($file);
    $data = array_slice($data, 1);
    $raw = implode($data);
    $decoded = base64_decode($raw);
    $decompressed = gzdecode($decoded);
    // Re-compress and re-encode, then decode again to check the round trip
    $compressed = gzencode($decompressed);
    $encoded = base64_encode($compressed);
    $decoded2 = base64_decode($encoded);
    $decompressed2 = gzdecode($decoded2);
    // First 10 bytes are the gzip header, last 8 are the CRC32 and size trailer
    $header = bin2hex(substr($decoded, 0, 10));
    $tail = bin2hex(substr($decoded, -8));
    $header2 = bin2hex(substr($compressed, 0, 10));
    $tail2 = bin2hex(substr($compressed, -8));
    $header3 = bin2hex(substr($decoded2, 0, 10));
    $tail3 = bin2hex(substr($decoded2, -8));
    // Note: this stores the original gzip stream ($decoded), not $compressed
    $filehandle = fopen($compressedfile, 'w');
    fwrite($filehandle, $decoded);
    fclose($filehandle);
    $filehandle = fopen($convertedfile, 'w');
    fwrite($filehandle, $decompressed);
    fclose($filehandle);
    $filehandle = fopen($finalfile, 'w');
    fwrite($filehandle, $firstline);
    fwrite($filehandle, $encoded);
    fclose($filehandle);
    echo "<center>";
    echo "<table style='text-align:center' >";
    echo "<tr><th>Stage 1</th>";
    echo "<th>Stage 2</th>";
    echo "<th>Stage 3</th></tr>";
    echo "<tr><td>RAW DATA -></td><td>DECODED DATA -></td><td>UNCOMPRESSED DATA -></td></tr>";
    echo "<tr><td>LENGTH: ".strlen($raw)."</td>";
    echo "<td>LENGTH: ".strlen($decoded)."</td>";
    echo "<td>LENGTH: ".strlen($decompressed)."</td></tr>";
    echo "<tr><td><a href='tmp/".$orgfile."'>ORIGINAL</a></td><td>GZIP HEADER:".$header."</td><td><a href='".$convertedfile."'>XML CONVERTED</a></td></tr>";
    echo "<tr><td></td><td>GZIP TAIL:".$tail."</td><td></td></tr>";
    echo "<tr><td><textarea cols='30' rows='20'>" . $raw . "</textarea></td>";
    echo "<td><textarea cols='30' rows='20'>" . $decoded . "</textarea></td>";
    echo "<td><textarea cols='30' rows='20'>" . $decompressed . "</textarea></td></tr>";
    echo "<tr><th>Stage 6</th>";
    echo "<th>Stage 5</th>";
    echo "<th>Stage 4</th></tr>";
    echo "<tr><td>ENCODED DATA <-</td><td>COMPRESSED DATA <-</td><td>UNCOMPRESSED DATA <-</td></tr>";
    echo "<tr><td>LENGTH: ".strlen($encoded)."</td>";
    echo "<td>LENGTH: ".strlen($compressed)."</td>";
    echo "<td>LENGTH: ".strlen($decompressed)."</td></tr>";
    echo "<tr><td></td><td>GZIP HEADER:".$header2."</td><td></td></tr>";
    echo "<tr><td></td><td>GZIP TAIL:".$tail2."</td><td></td></tr>";
    echo "<tr><td><a href='".$finalfile."'>FINAL FILE</a></td><td><a href='".$compressedfile."'>RE-COMPRESSED FILE</a></td><td></td></tr>";
    echo "<tr><td><textarea cols='30' rows='20'>" . $encoded . "</textarea></td>";
    echo "<td><textarea cols='30' rows='20'>" . $compressed . "</textarea></td>";
    echo "<td><textarea cols='30' rows='20'>" . $decompressed . "</textarea></td></tr>";
    echo "</table>";
    echo "</center>";
}
echo "</body></html>";
?>
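Answer 5 (score: 0)
Different implementations of the gzip algorithm can produce slightly different but still correct files, and the compression level of the original file may differ from the one you are running with.
Answer 6 (score: 0)
Interesting, I'll give it a shot. The variations aren't slight, however. The newly encoded file is longer, and when comparing the binary before and after, the data hardly matches at all.
Before (first three lines):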
H4sIAAAAAAAAC+19eZOiyNb3/34K3r4RT/WEU40ssvTtrhuIuKK44Bo3YoJdFAFZ3D79C6hVVhUq
dsnUVN/qmIkSOLlwlt/JPCfJ/PGf9dwAlorj6pb58wv0LfcFUEzJknVT+/ml2uXuCSJP3kNf/vOQ
+TEsFVkgoDfdn18mnmd/B8HVavWt5TsKI2vKN8magyENiH3Lf9kRfpd817PmF+jpiOhQRFZcXTMV
After (first three lines):
H4sICJ/YnEgAAzEyNDQ2LTExNjk2NzUueGZkbC54bWwA7D1pU+JK19/9FV2+H5wpByEhJMRH
uRUgCMom4DBYt2oqkAZyDQlmQZ1f/3YSNqGzKT3oDH6RdE4vOXuf08vFP88TFcygYSq6dnlM
naWOAdQGuqxoo8vjSruRyGYzfII6/id3dPGjVKwCBK+Zl8djy5qeJ5NPT09nTduAojyCZwN9
As you can see, the H4sI matches, and after that it's pandemonium.
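To see where the two samples start to diverge, the first few base64 characters of each can be decoded and read as a gzip header (RFC 1952). The sketch below is only an illustration: the helper name `gzip_header_info` is made up, it assumes the FEXTRA flag is not set, and it only looks at the leading characters copied from the two samples above.

```python
import base64
import struct

def gzip_header_info(b64_text: str) -> dict:
    """Decode the start of a base64-gzip body and report RFC 1952 header fields."""
    raw = base64.b64decode(b64_text)
    # Fixed 10-byte header: magic, method, flags, mtime, extra flags, OS id.
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<HBBIBB", raw[:10])
    if magic != 0x8B1F:
        raise ValueError("not a gzip stream")
    info = {"flags": flags, "mtime": mtime, "os": os_id, "filename": None}
    if flags & 0x08:  # FNAME: a zero-terminated original file name follows
        end = raw.index(b"\x00", 10)
        info["filename"] = raw[10:end].decode("latin-1")
    return info

# Leading characters of the "before" and "after" samples quoted above.
before = gzip_header_info("H4sIAAAAAAAAC+19")
after = gzip_header_info("H4sICJ/YnEgAAzEyNDQ2LTExNjk2NzUueGZkbC54bWwA7D1p")
print(before)
print(after)
```

Read this way, the original stream has no flags set and a zero timestamp, while the recompressed one has the FNAME flag set and stores a timestamp and the file name 12446-1169675.xfdl.xml. That accounts for the extra length and for the mismatch right after the shared H4sI prefix.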
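Answer 7 (score: 0)
gzip puts the filename in the header of the file, so the length of a gzipped file varies with the filename of the uncompressed file.
If gzip acts on a stream, the filename is omitted and the file is a bit shorter, so the following should work: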
gzip < yourform-unpacked.xml > yourform-unpacked.xml.gz
Then re-encode in base64:
base64 -e yourform-unpacked.xml.gz yourform_reencoded.xfdl
Maybe this will produce a file of the same length.
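The effect of the stored filename can be checked with a small Python sketch. The helper `gzip_bytes` and the payload are hypothetical; `gzip.GzipFile` is the standard-library class, and passing an empty `filename` is how this sketch imitates gzip reading from a stream.

```python
import gzip
import io

def gzip_bytes(data: bytes, name: str) -> bytes:
    """Compress data in memory; name is what GzipFile records in the header."""
    buf = io.BytesIO()
    # mtime=0 keeps the timestamp field identical between the two runs.
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()

xml = b"<XFDL>example</XFDL>"  # hypothetical payload
named = gzip_bytes(xml, "yourform-unpacked.xml")
anonymous = gzip_bytes(xml, "")

# Byte 3 of a gzip stream is the flag field; bit 0x08 (FNAME) marks a stored name.
print("named flags:", named[3], "length:", len(named))
print("anonymous flags:", anonymous[3], "length:", len(anonymous))
```

The named stream is longer by the length of the name plus its terminating zero byte, and both decompress to the same data.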