Parsing an entire XML file from the web into an SQLite database

Time: 2018-05-15 11:00:32

Tags: xml database sqlite parsing download

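I'm currently working on a small project. I want to create a self-updating database with SQLite. The data I'm using is the Daily Treasury Yield Curve rates (https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yieldAll).

Right now I'm struggling to parse the whole XML file and save it into the SQLite database.

Any recommendations?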

1 answer:

Answer 0: (score: 0)

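There are a couple of ways to go about this, and you can use many languages and technologies to do it. Node.js is one option: it has good support for downloading XML data, parsing it, and writing to an SQLite database.

Here is a sample program (written in Node.js):

index.js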

"use strict";

var sqlite3 = require('sqlite3').verbose();
var db = new sqlite3.Database('testData2.sqlite3');
var request = require('request');
var parseString = require('xml2js').parseString;

console.log('Downloading bond data..');

var options = {
    url: "http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData",
    method: "get"
};

request(options, function (error, response, body) {

    if (error) {
        console.error('error:', error);
    } else {
        console.log('Response: StatusCode:', response && response.statusCode);
        console.log('Response: Body: Length: %d.', body.length);

        let xml = body;
        console.log('Parsing XML..');
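        parseString(xml, {explicitArray: false, mergeAttrs: true }, function (err, result) {
            // Stop here if the response body was not valid XML.
            if (err) {
                console.error('XML parse error:', err);
                return;
            }
            console.log('Writing bond data to database..');
            writeBondDataToDB(result.feed.entry);
        });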
    }
});

function writeBondDataToDB(entries)
{
    // Write at most 100 entries, and never more than the feed returned.
    var count = Math.min(100, entries.length);

    db.serialize(function() {
      // IF NOT EXISTS lets the script run again against the same database file.
      db.run("CREATE TABLE IF NOT EXISTS bondData (Date TEXT, BC_1YEAR TEXT, BC_5YEAR TEXT)");

      var stmt = db.prepare("INSERT INTO bondData VALUES (?,?,?)");
      for (var i = 0; i < count; i++) {
          // Each value is the element's text content, stored under '_' by xml2js.
          var entry = entries[i].content['m:properties'];
          console.log('Writing entry to database for time: ', entry['d:NEW_DATE']['_']);
          stmt.run(entry['d:NEW_DATE']['_'], entry['d:BC_1YEAR']['_'], entry['d:BC_5YEAR']['_']);
      }
      stmt.finalize();
    });
    db.close();
}

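This program downloads the bond yield data, parses it, and writes the first 100 entries to the database; the limit is easy to change to include more. It creates one column each for the date, the 1-year yield, and the 5-year yield.

The package.json looks like this:

package.json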
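The indexing into the parsed result can also be pulled out into a small helper, which makes it easier to change the number of entries or add columns. The following is only a sketch: it assumes the object shape that xml2js produces with the options used in index.js, and entriesToRows, textOf, and the hand-built sample object (including its m:type attribute values) are hypothetical names and illustration data, not part of any library.

```javascript
"use strict";

// Hypothetical helper: turn parsed feed entries into [date, 1y, 5y] rows.
// Assumes the shape produced by xml2js with
// {explicitArray: false, mergeAttrs: true}, as in index.js.
function entriesToRows(entries, limit) {
    if (entries === undefined || entries === null) {
        return [];
    }
    // With explicitArray: false a feed holding a single entry is parsed
    // as a plain object rather than an array, so normalise to an array.
    var list = Array.isArray(entries) ? entries : [entries];
    if (limit !== undefined) {
        list = list.slice(0, limit);
    }
    return list.map(function (item) {
        var props = item.content['m:properties'];
        return [
            textOf(props['d:NEW_DATE']),
            textOf(props['d:BC_1YEAR']),
            textOf(props['d:BC_5YEAR'])
        ];
    });
}

// Return the text content of a parsed element, or null when it is missing.
function textOf(node) {
    if (node === undefined || node === null) {
        return null;
    }
    if (typeof node === 'string') {
        return node;
    }
    return node['_'] === undefined ? null : node['_'];
}

// Hand-built sample mimicking one parsed entry. The values are taken from
// the output listed in this answer; the m:type attributes are assumed.
var sampleEntry = {
    content: {
        'm:properties': {
            'd:NEW_DATE': { 'm:type': 'Edm.DateTime', '_': '1997-01-02T00:00:00' },
            'd:BC_1YEAR': { 'm:type': 'Edm.Double', '_': '5.630000114440918' },
            'd:BC_5YEAR': { 'm:type': 'Edm.Double', '_': '6.3000001907348633' }
        }
    }
};

console.log(entriesToRows(sampleEntry));
```

Each returned row can be passed to stmt.run in place of the three separate lookups, and calling entriesToRows(entries) without a limit returns every entry in the feed.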

{
  "name": "sqllite-bondcurve",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "dependencies": {
    "request": "^2.86.0",
    "sqlite3": "^4.0.0",
    "xml2js": "^0.4.19"
  },
  "devDependencies": {},
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC"
}

This lets you install the dependencies with npm install.

The resulting data looks like this:

Date,BC_1YEAR,BC_5YEAR
1997-01-02T00:00:00,5.630000114440918,6.3000001907348633
1996-12-31T00:00:00,5.5100002288818359,6.2100000381469727
1997-01-03T00:00:00,5.5999999046325684,6.28000020980835
1997-01-07T00:00:00,5.6100001335144043,6.320000171661377
1997-01-06T00:00:00,5.6100001335144043,6.3000001907348633
1996-12-24T00:00:00,5.5100002288818359,6.130000114440918
1996-12-23T00:00:00,5.5199999809265137,6.119999885559082
1996-12-26T00:00:00,5.5,6.130000114440918
1996-12-30T00:00:00,5.46999979019165,6.0999999046325684
1996-12-27T00:00:00,5.46999979019165,6.0900001525878906
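The values above are stored as text exactly as they appear in the feed, so the yields carry long floating-point tails and the dates carry a midnight time. If cleaner values are wanted, each row could be tidied before it is inserted. This is a minimal sketch: normaliseRow and roundYield are hypothetical helpers, and rounding to two decimal places is an assumption suggested by the values above, not something the feed documents.

```javascript
"use strict";

// Hypothetical helper: tidy one [date, 1y, 5y] row of strings before storing it.
function normaliseRow(row) {
    return [
        row[0].slice(0, 10),   // keep only the YYYY-MM-DD part of the timestamp
        roundYield(row[1]),
        roundYield(row[2])
    ];
}

// Convert a yield string to a number rounded to two decimals (assumed precision).
// Returns null for missing or non-numeric values.
function roundYield(text) {
    if (text === null || text === undefined || text === '') {
        return null;
    }
    var value = Number(text);
    if (Number.isNaN(value)) {
        return null;
    }
    return Math.round(value * 100) / 100;
}

console.log(normaliseRow(['1997-01-02T00:00:00', '5.630000114440918', '6.3000001907348633']));
```

If numbers are stored this way, declaring the yield columns as REAL instead of TEXT would be the natural companion change.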

You could also use C#, for example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Xml.Linq;

namespace DownloadBondData
{
    class Program
    {
        static void Main(string[] args)
        {
            var client = new WebClient();
            Console.WriteLine("Downloading XML..");
            var xml = client.DownloadString("http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData");
            Console.WriteLine("Parsing XML..");
            var xmlDoc = XDocument.Parse(xml);
            var elementList = xmlDoc.Document.Elements().First().Elements().ToList();
        }
    }
}

Once you have the list of elements, you can parse each one and write it to the SQLite database.