Question

I have an XML file.

<key>457</key>
    <dict>
        <key>Track ID</key><integer>457</integer>
        <key>Name</key><string>Love me do</string>
        <key>Artist</key><string>The Beatles</string>
        <key>Album Artist</key><string>The Beatles</string>
        <key>Composer</key><string>John Lennon/Paul McCartney</string>
        <key>Album</key><string>The Beatles No.1</string>
        <key>Genre</key><string>Varies</string>
        <key>Kind</key><string>AAC audio file</string>
</dict>

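I have removed a lot from the file for the purposes of this question (this is one song, and each song has roughly 20-30 more lines of XML). What I want to do is extract the "Artist" string from each song, remove all duplicate strings, and then write the result to an HTML file; preferably in a way that refreshes automatically when a new version of the .xml is found, so the HTML stays up to date, but if that is overly complicated, it's fine to leave it out.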

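I have looked into ways of doing this with jQuery, and PHP has been suggested to me as well, but I'm not sure which is better/cleaner, and I'm not sure how I would do it in either.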

Many thanks,

Henry.

Answer 1

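What exactly are you trying to achieve? If you need an HTML file that is regenerated periodically from the XML file, then you probably want to write a program (for example, the BeautifulSoup Python library lets you parse XML/HTML files very easily) and run it every time the HTML file needs updating (you could also set up a cron job for it).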
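As a rough sketch of the "regenerate only when needed" idea, the check below compares file modification times, so a cron job can call it as often as you like without rewriting the HTML unnecessarily. The names `needs_rebuild`, `rebuild_if_stale` and the `build` callable are hypothetical, made up for this illustration; `build` stands for whatever conversion program you end up writing.

```python
import os


def needs_rebuild(xml_path, html_path):
    """Return True if html_path is missing or older than xml_path."""
    if not os.path.exists(html_path):
        return True
    return os.path.getmtime(xml_path) > os.path.getmtime(html_path)


def rebuild_if_stale(xml_path, html_path, build):
    """Call build(xml_path, html_path) only when the HTML is out of date.

    `build` is a hypothetical callable that performs the actual
    XML-to-HTML conversion. Returns True if a rebuild happened.
    """
    if needs_rebuild(xml_path, html_path):
        build(xml_path, html_path)
        return True
    return False
```

Comparing timestamps is only one way to detect a new version of the file; it assumes the XML file's modification time changes whenever its content does.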

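If you need to be able to fetch the data from the XML dynamically, you could use a JavaScript library to load the XML from the xml file and then add it to the page dynamically.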

For example, this Python program will parse the XML file (file.xml) and create an HTML file (song_information.html) containing the data from the XML file.

from BeautifulSoup import BeautifulStoneSoup

f = open("file.xml")
soup = BeautifulStoneSoup(f.read())
f.close()

html = """<!DOCTYPE html>
<html>
<head>
<title>Song information</title>
</head>
<body>
"""

for key in soup.dict.findAll('key'):
    html += "<h1>%s</h1>\n" % key.contents[0]
    html += "<p>%s</p>\n" % key.nextSibling.contents[0]

html += """</body>
</html>
"""

f = open("song_information.html", "w")
f.write(html)
f.close()

It will write the following HTML to the song_information.html file:

<!DOCTYPE html>
<html>
<head>
<title>Song information</title>
</head>
<body>
<h1>Track ID</h1>
<p>457</p>
<h1>Name</h1>
<p>Love me do</p>
<h1>Artist</h1>
<p>The Beatles</p>
<h1>Album Artist</h1>
<p>The Beatles</p>
<h1>Composer</h1>
<p>John Lennon/Paul McCartney</p>
<h1>Album</h1>
<p>The Beatles No.1</p>
<h1>Genre</h1>
<p>Varies</p>
<h1>Kind</h1>
<p>AAC audio file</p>
</body>
</html>

Of course, this is simplified. If you need Unicode support, you will need to edit it like this:

from BeautifulSoup import BeautifulStoneSoup
import codecs

f = codecs.open("file.xml", "r", "utf-8")
soup = BeautifulStoneSoup(f.read())
f.close()

html = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Song information</title>
</head>
<body>
"""

for key in soup.dict.findAll('key'):
    html += "<h1>%s</h1>\n" % key.contents[0]
    html += "<p>%s</p>\n" % key.nextSibling.contents[0]

html += """</body>
</html>
"""

f = codecs.open("song_information.html", "w", "utf-8")
f.write(html)
f.close()

Also, you will probably need to generate more complex HTML, so you may want to try a templating system such as Jinja2.
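Since the question asks only for the de-duplicated Artist values, here is a minimal sketch of that step using Python's standard-library `xml.etree.ElementTree` instead of BeautifulSoup. It assumes the complete file is a well-formed XML document in which every song is a `<dict>` of alternating `<key>` and value elements, as in the excerpt from the question; the function names `unique_artists` and `artists_to_html` are hypothetical, chosen for this illustration.

```python
import html
import xml.etree.ElementTree as ET


def unique_artists(xml_text):
    """Return the distinct Artist values, in order of first appearance."""
    root = ET.fromstring(xml_text)
    seen = {}  # dict keys keep insertion order and drop duplicates
    # iter("dict") visits every <dict> element, nested ones included.
    for song in root.iter("dict"):
        children = list(song)
        # Each <key> element is immediately followed by its value element.
        for key, value in zip(children, children[1:]):
            if key.tag == "key" and key.text == "Artist":
                artist = (value.text or "").strip()
                if artist:
                    seen.setdefault(artist, None)
    return list(seen)


def artists_to_html(artists):
    """Build a small HTML page listing the artists, escaping special characters."""
    items = "".join("<li>%s</li>\n" % html.escape(a) for a in artists)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<title>Artists</title>\n</head>\n"
        "<body>\n<ul>\n%s</ul>\n</body>\n</html>\n" % items
    )
```

To use it, read file.xml into a string, pass it to `unique_artists`, and write the result of `artists_to_html` to an HTML file. The exact match on "Artist" deliberately skips the "Album Artist" key.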

Answer 2

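I would do this in PHP: put your XML into a string, then (since only you will be using this) encode it as JSON, decode that into an associative array, run a foreach loop to extract the artists, remove the duplicates, and finally save the result as HTML. You can then add a cron job to run it periodically and regenerate the HTML. Run this code, then post a link to the result it gives.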

$contents = '<key>Blah.... lots of XML';

$xml = simplexml_load_string($contents);
$json = json_encode($xml);
$array = json_decode($json, true);

print_r($array);

Once I know the structure of the resulting array, I can finish the code. But it will look something like this:

$artists = array();
foreach($array['dict']['artist'] as $artist) {
    $artists[] = $artist;
}

// Now $artists holds an array of the artists

$artists = array_unique($artists);

// Now there are no duplicates

$output = '';
foreach($artists as $artist) {
    // Escape characters such as & and < so they display correctly in HTML
    $output .= '<p>' . htmlspecialchars($artist) . '</p>';
}

// Now each artist is put in its own paragraph.

// Either output the output
echo $output;

// Or save it to a file (in this case, 'artists.html')

$fh = fopen('artists.html', 'w') or die("Can't open file");
fwrite($fh, $output);
fclose($fh);

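This won't work as is, because the line in the first foreach loop needs to be tweaked slightly to match the real array structure, but it's a starting point.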

How do I extract the values of a specific tag from an XML file into an HTML page?