How to convert this XML to CSV

Asked: 2012-09-25 14:46:13

Tags: csv xml-parsing

For the past 4 days I have been trying to convert this XML file to CSV using the field mapping below.

Part of the XML file:

<!-- language: lang-xml -->

<ponudba podjetje="SO d.o.o." velja_od="23.09.2012 @ 12:30:48">
    <artikel koda="LS593EAR" naziv="HP ENVY 17-2199e" kategorija="Prenosniki" podkategorija="Hewlett Packard (HP)" v_akciji="ne" kosovnost="več">
    <opis>
    HP ENVY 17-2199el, Intel Core i7-2630QM (2.0 GHz), 17.3'' FHD AG LED 3D, 8 GB DDR3 (2x 4 GB), 1 TB, BluRay, ATI Radeon HD6850 1024 MB, WiFi, Bluetooth, Webcam, 3D glasses, Microsoft Windows 7 Home Premium (64 bit)
    </opis>
    <opis_detail>
    HP ENVY 17-2199el, Intel Core i7-2630QM (2.0 GHz), 17.3'' FHD AG LED 3D, 8 GB DDR3 (2x 4 GB), 1 TB, BluRay, ATI Radeon HD6850 1024 MB, WiFi, Bluetooth, Webcam, 3D glasses, Microsoft Windows 7 Home Premium (64 bit)<br/><table> <col width="25%" /> <col /> <tbody> <tr> <th>Procesor</th> <td>Intel® Core™ i7-2630QM / 2.00 GHz / Quad-Core</td> </tr> <tr> <th>Delovni pomnilnik</th> <td>8 GB DDR3</td> </tr> <tr> <th>Trdi disk</th> <td>1 TB (1000 GB) / 5400 / SATA</td> </tr> <tr> <th>LCD zaslon</th> <td>43,9 cm (17,3'') Full HD HP Ultra BrightView Infinity Display (1920x1080)</td> </tr> <tr> <th>Grafična kartica</th> <td>AMD Radeon™ HD 6850 Graphics</td> </tr> <tr> <th>Optična enota</th> <td>SuperMulti DVD-RW Double Layer</td> </tr> <tr> <th>USB 2.0</th> <td>2x</td> </tr> <tr> <th>USB 3.0</th> <td>1x</td> </tr>    <tr> <th>eSATA</th> <td>da</td> </tr> <tr> <th>HDMI</th> <td>da</td> </tr> <tr> <th>WiFi</th> <td>da</td> </tr> <tr> <th>Bluetooth</th> <td>da</td> </tr> <tr> <th>WWAN</th> <td>ne</td> </tr> <tr> <th>Spletna kamera</th> <td>da</td> </tr> <tr> <th>Card Reader</th> <td>da</td> </tr> <tr> <th>Express Card</th> <td>ne</td> </tr> <tr> <th>TV kartica</th> <td>ne</td> </tr> <tr> <th>Finger Print</th> <td>ne</td> </tr> <tr> <th>Vhodne naprave</th> <td>brez</td> </tr>     <tr> <th>Operacijski sistem</th> <td>Microsoft Windows 7 Home Premium (64 bit)</td> </tr> <tr> <th>Država uvoza</th> <td>Italijanska tipkovnica (priložene SLO nalepke)</td> </tr>  <tr> <th>Stanje modela</th> <td>HP Renew</td> </tr>     </tbody> </table>
    </opis_detail>
    <garancija_v_mesecih>12</garancija_v_mesecih>
    <cena_v_EUR>1.049,00</cena_v_EUR>
    <proizvajalec>HP</proizvajalec>
    <stanje>na zalogi</stanje>
    <url_foto_artikla>
    http://www.so-doo.si/media/catalog/product/cache/1/image/265x/9df78eab33525d08d6e5fb8d27136e95/c/0/c02034964.jpg.hri_4.jpg
    </url_foto_artikla>
    <vec_fotk_artikla>
    <slika href="http://www.so-doo.si/media/catalog/product/c/0/c02034982.jpg.hri_4.jpg"/>
    <slika href="http://www.so-doo.si/media/catalog/product/c/0/c02034991.jpg.hri_4.jpg"/>
    </vec_fotk_artikla>
    <teza_artikla_v_kg>2.9000</teza_artikla_v_kg>
    </artikel>

This is the CSV file I want, with all the data from every field in the XML, not just some of it :(

<!-- language: lang-csv -->

koda    naziv   kategorija  podkategorija   v_akciji    kosovnost   opis    opis_detail garancija_v_mesecih cena_v_EUR  proizvajalec    stanje  password    url_foto_artikla    vec_fotk_artikla

I tried:

// The order here determines the order in the output CSV file
$columns = array(
    'koda',
    'naziv',
    'kategorija',
    'podkategorija',
    'v_akciji',
    'kosovnost'
);

// This will be used later on to correctly sort in the attribute values
// Note: the third parameter of "array_fill" determines what value to use
// in case a node lacks an attribute
$csv_blueprint = array_combine(
    $columns,
    array_fill(0, count($columns), '')
);

$data = array($columns);
$filexml = 'so_feed.xml';

if ( !file_exists($filexml) ) {
    // Do some error routine
} else {
    $xml = simplexml_load_file($filexml);
    $artikel = $xml->artikel;

    if ( !count($artikel) ) {
        // Stop processing 'cause there's nothing to do
    } else {
        foreach ( $artikel as $item ) {
            // Clone the row blueprint to leave the original unspoiled
            $row = $csv_blueprint;

I also tried this:

$xml = simplexml_load_file($filexml);
//$artikel = $xml->artikel;
$ponudbas = $xml->ponudba;
...
    foreach ( $ponudbas as $ponudba ) {
        // Clone the row blueprint to leave the original unspoiled
        $row = $csv_blueprint;

But neither approach parses all the data in the XML. I don't know what to do :(

1 answer:

Answer 0 (score: 0)

If your XML is exactly as you pasted it, it is not a well-formed XML document: the closing </ponudba> tag is missing at the end.

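Another thing to consider is how the data inside the elements is written. In your case, two elements contain '' (two apostrophes standing in for a double quote, as in 17''). Quote characters are legal in element text, so on their own they should not break a parser; the characters that must be escaped there are < and &. If you want to keep such special characters in the data, wrapping it in a CDATA block is probably the simplest way to escape them.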

Edit: I just noticed that your XML contains HTML elements inside XML elements. I would encourage you to use CDATA blocks for that kind of element.
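The well-formedness point can be checked directly in PHP. The following is a minimal sketch, not the asker's code: `xmlErrors` is a hypothetical helper, and the two strings are shortened stand-ins for the feed, one without the closing </ponudba> and one with it.

```php
<?php
// Hypothetical helper: returns the libxml error messages for $text,
// or an empty array when the text is well-formed XML.
function xmlErrors($text)
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    $xml = simplexml_load_string($text);
    $messages = array();
    if ($xml === false) {
        foreach (libxml_get_errors() as $error) {
            $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the previous setting
    return $messages;
}

// Shortened stand-ins for the feed: the first lacks </ponudba>, the second has it.
$truncated = '<ponudba><artikel koda="LS593EAR"><opis>17.3\'\' FHD</opis></artikel>';
$complete  = $truncated . '</ponudba>';

foreach (array('truncated' => $truncated, 'complete' => $complete) as $label => $text) {
    $errors = xmlErrors($text);
    echo $label, ': ', $errors ? implode('; ', $errors) : 'well-formed', "\n";
}
```

Running the same helper on the real file contents (for example via file_get_contents) should show whether the parser is rejecting the feed before any CSV logic runs.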

If it is easier for you, just convert the XML to JSON and decode it directly into a PHP array:

$json = json_encode($xml);
$data = json_decode($json, TRUE);
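To see what that array looks like, here is a small sketch under stated assumptions: the XML string is a shortened stand-in for the feed, and the second article code (DEMO0002) is made up for illustration. Attributes end up under an `@attributes` key, and repeated `<artikel>` elements become a list.

```php
<?php
// Shortened, hypothetical stand-in for the feed with two <artikel> entries.
// DEMO0002 is an invented code used only for this example.
$xml = simplexml_load_string(
    '<ponudba podjetje="SO d.o.o.">'
    . '<artikel koda="LS593EAR"><proizvajalec>HP</proizvajalec><stanje>na zalogi</stanje></artikel>'
    . '<artikel koda="DEMO0002"><proizvajalec>HP</proizvajalec><stanje>na zalogi</stanje></artikel>'
    . '</ponudba>'
);

// Round-trip through JSON to get plain nested arrays.
$data = json_decode(json_encode($xml), true);

foreach ($data['artikel'] as $artikel) {
    // Attributes live under '@attributes'; child elements are ordinary keys.
    echo $artikel['@attributes']['koda'], ' / ', $artikel['proizvajalec'], ' / ', $artikel['stanje'], "\n";
}
```

One caveat with this approach: when the feed contains only a single `<artikel>`, `$data['artikel']` holds that item directly instead of a one-element list, so real code would need to handle both shapes.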

If you want to write the result back out as a CSV file, consider using fputcsv (http://php.net/manual/fr/function.fputcsv.php)
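Putting the pieces together, a conversion might look roughly like the sketch below. It is an illustration rather than a drop-in solution: the XML is a shortened stand-in for so_feed.xml, only a few of the requested columns are included, the output goes to an in-memory stream, and joining the photo URLs with `|` is an arbitrary choice.

```php
<?php
// Shortened, hypothetical stand-in for so_feed.xml (note the closing </ponudba>).
$xmlText = <<<'XML'
<ponudba podjetje="SO d.o.o.">
  <artikel koda="LS593EAR" naziv="HP ENVY 17-2199e" kategorija="Prenosniki">
    <opis>HP ENVY 17-2199el, 17.3'' FHD</opis>
    <cena_v_EUR>1.049,00</cena_v_EUR>
    <vec_fotk_artikla>
      <slika href="http://www.so-doo.si/media/catalog/product/c/0/c02034982.jpg.hri_4.jpg"/>
      <slika href="http://www.so-doo.si/media/catalog/product/c/0/c02034991.jpg.hri_4.jpg"/>
    </vec_fotk_artikla>
  </artikel>
</ponudba>
XML;

$attributeColumns = array('koda', 'naziv', 'kategorija'); // read from <artikel> attributes
$elementColumns   = array('opis', 'cena_v_EUR');          // read from child elements

$xml = simplexml_load_string($xmlText);
$out = fopen('php://memory', 'w+'); // swap for fopen('so_feed.csv', 'w') to write a real file

// Header row: attribute columns, element columns, then the photo list.
$header = array_merge($attributeColumns, $elementColumns, array('vec_fotk_artikla'));
fputcsv($out, $header, ',', '"', '\\');

foreach ($xml->artikel as $item) {
    $row = array();
    foreach ($attributeColumns as $name) {
        $row[] = (string) $item[$name];       // attribute value
    }
    foreach ($elementColumns as $name) {
        $row[] = trim((string) $item->$name); // element text without surrounding whitespace
    }
    $photos = array();
    foreach ($item->vec_fotk_artikla->slika as $slika) {
        $photos[] = (string) $slika['href'];
    }
    $row[] = implode('|', $photos);           // all photo URLs in one cell
    fputcsv($out, $row, ',', '"', '\\');
}

rewind($out);
$csv = stream_get_contents($out);
fclose($out);
echo $csv;
```

The key difference from the attempts in the question is that attributes are read with array syntax (`$item['koda']`) while child elements are read with property syntax (`$item->opis`), so both kinds of field land in the same row.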

Edit 2: try a simple test.

Using:

$file='file.xml';
$xml = simplexml_load_file($file);

foreach ($xml->artikel as $art)
{    
    echo $art->opis_detail;
}

This outputs only:

HP ENVY 17-2199el, Intel Core i7-2630QM (2.0 GHz), 17.3'' FHD AG LED 3D, 8 GB DDR3 (2x 4 GB), 1 TB, BluRay, ATI Radeon HD6850 1024 MB, WiFi, Bluetooth, Webcam, 3D glasses, Microsoft Windows 7 Home Premium (64 bit)

Now, if you wrap that node's content in a CDATA block in the XML:

<opis_detail><![CDATA[HP ENVY 17-2199el, Intel Core i7-2630QM (2.0 GHz), 17.3'' FHD AG LED 3D, 8 GB DDR3 (2x 4 GB), 1 TB, BluRay, ATI Radeon HD6850 1024 MB, WiFi, Bluetooth, Webcam, 3D glasses, Microsoft Windows 7 Home Premium (64 bit)<br/><table> <col width="25%" /> <col /> <tbody> <tr> <th>Procesor</th> <td>Intel® Core™ i7-2630QM / 2.00 GHz / Quad-Core</td> </tr> <tr> <th>Delovni pomnilnik</th> <td>8 GB DDR3</td> </tr> <tr> <th>Trdi disk</th> <td>1 TB (1000 GB) / 5400 / SATA</td> </tr> <tr> <th>LCD zaslon</th> <td>43,9 cm (17,3'') Full HD HP Ultra BrightView Infinity Display (1920x1080)</td> </tr> <tr> <th>Grafična kartica</th> <td>AMD Radeon™ HD 6850 Graphics</td> </tr> <tr> <th>Optična enota</th> <td>SuperMulti DVD-RW Double Layer</td> </tr> <tr> <th>USB 2.0</th> <td>2x</td> </tr> <tr> <th>USB 3.0</th> <td>1x</td> </tr>    <tr> <th>eSATA</th> <td>da</td> </tr> <tr> <th>HDMI</th> <td>da</td> </tr> <tr> <th>WiFi</th> <td>da</td> </tr> <tr> <th>Bluetooth</th> <td>da</td> </tr> <tr> <th>WWAN</th> <td>ne</td> </tr> <tr> <th>Spletna kamera</th> <td>da</td> </tr> <tr> <th>Card Reader</th> <td>da</td> </tr> <tr> <th>Express Card</th> <td>ne</td> </tr> <tr> <th>TV kartica</th> <td>ne</td> </tr> <tr> <th>Finger Print</th> <td>ne</td> </tr> <tr> <th>Vhodne naprave</th> <td>brez</td> </tr>     <tr> <th>Operacijski sistem</th> <td>Microsoft Windows 7 Home Premium (64 bit)</td> </tr> <tr> <th>Država uvoza</th> <td>Italijanska tipkovnica (priložene SLO nalepke)</td> </tr>  <tr> <th>Stanje modela</th> <td>HP Renew</td> </tr>     </tbody> </table>]]>
    </opis_detail>

The output is now:

HP ENVY 17-2199el, Intel Core i7-2630QM (2.0 GHz), 17.3'' FHD AG LED 3D, 8 GB DDR3 (2x 4 GB), 1 TB, BluRay, ATI Radeon HD6850 1024 MB, WiFi, Bluetooth, Webcam, 3D glasses, Microsoft Windows 7 Home Premium (64 bit)
Procesor    Intel® Core™ i7-2630QM / 2.00 GHz / Quad-Core
Delovni pomnilnik   8 GB DDR3
Trdi disk   1 TB (1000 GB) / 5400 / SATA
LCD zaslon  43,9 cm (17,3'') Full HD HP Ultra BrightView Infinity Display (1920x1080)
Grafična kartica    AMD Radeon™ HD 6850 Graphics
Optična enota   SuperMulti DVD-RW Double Layer
USB 2.0 2x
USB 3.0 1x
eSATA   da
HDMI    da
WiFi    da
Bluetooth   da
WWAN    ne
Spletna kamera  da
Card Reader da
Express Card    ne
TV kartica  ne
Finger Print    ne
Vhodne naprave  brez
Operacijski sistem  Microsoft Windows 7 Home Premium (64 bit)
Država uvoza   Italijanska tipkovnica (priložene SLO nalepke)
Stanje modela   HP Renew

I think this is the data you were missing?
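If the feed cannot be edited to add CDATA, a possible alternative (a different technique from the one above, sketched here under assumptions) is to serialize the child nodes of the element yourself. Casting a SimpleXMLElement to a string returns only the text directly inside it, which is why the table disappeared; going through DOM keeps the nested markup. The XML string below is a shortened stand-in for one `<opis_detail>` element.

```php
<?php
// Shortened, hypothetical stand-in for an <artikel> with HTML inside <opis_detail>.
$xml = simplexml_load_string(
    '<artikel><opis_detail>HP ENVY<br/>'
    . '<table><tr><th>Procesor</th><td>Intel</td></tr></table>'
    . '</opis_detail></artikel>'
);

$detail = $xml->opis_detail;

// String cast: only the text that sits directly inside <opis_detail>.
$text = (string) $detail;

// DOM route: serialize every child node (text and elements) to keep the markup.
$node  = dom_import_simplexml($detail);
$inner = '';
foreach ($node->childNodes as $child) {
    $inner .= $node->ownerDocument->saveXML($child);
}

echo "Text only:   ", $text, "\n";
echo "With markup: ", $inner, "\n";
```

The `$inner` string can then go into the opis_detail CSV column; fputcsv takes care of quoting any commas or quotes it contains.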