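Note: this is a question about errors encountered when trying to carry out the suggestions in import-xml-files-to-postgresql.
I am trying to import a one-row XML file in order to test the code needed to import all of the rows, of which there should be more than 600,000. My XML looks like this: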
<response>
<row>
<row _id="1" _uuid="7A68A6C8-3E73-4976-A4BD-9995F97A580F" _position="1" _address="https://data.kcmo.org/resource/vrys-qgrz/1">
<objectid>471537</objectid>
<parcelid>2960</parcelid>
<kivapin>100064</kivapin>
<subdivision></subdivision>
<landusecode>1111 - Single Family (Non-Mobile Home Park)</landusecode>
<apn>CL1330600060270001</apn>
<parceltype>Parcels</parceltype>
<status>2 - Existing</status>
<condo>No</condo>
<prefix>N</prefix>
<own_name>Smith John</own_name>
<own_addr>123 Main Street</own_addr>
<own_city>Kansas City</own_city>
<own_zip>64114-1234</own_zip>
<shape_length>410.3620269</shape_length>
<shape_area>9314.662882</shape_area>
<latitude>39.2636</latitude>
<longitude>-94.5698</longitude>
<location_1 human_address="{&quot;address&quot;:&quot;123 Main Street&quot;,&quot;city&quot;:&quot;Kansas City&quot;,&quot;state&quot;:&quot;MO&quot;,&quot;zip&quot;:&quot;64114-1234&quot;}" latitude="39.2636" longitude="-94.5698" needs_recoding="false"/>
</row>
</row>
</response>
The code I am using to insert this into a database table is as follows:
SELECT
(xpath('//objectid/text()', myTempTable.myXmlColumn))[1]::text AS objectid,
(xpath('//parcelid/text()', myTempTable.myXmlColumn))[1]::text AS parcelid,
(xpath('//kivapin/text()', myTempTable.myXmlColumn))[1]::text AS kivapin,
(xpath('//subdivision/text()', myTempTable.myXmlColumn))[1]::text AS subdivision,
(xpath('//block/text()', myTempTable.myXmlColumn))[1]::text AS block,
(xpath('//lot/text()', myTempTable.myXmlColumn))[1]::text AS lot,
(xpath('//datecreated/text()', myTempTable.myXmlColumn))[1]::text AS datecreated,
(xpath('//landusecode/text()', myTempTable.myXmlColumn))[1]::text AS landusecode,
(xpath('//apn/text()', myTempTable.myXmlColumn))[1]::text AS apn,
(xpath('//parceltype/text()', myTempTable.myXmlColumn))[1]::text AS parceltype,
(xpath('//status/text()', myTempTable.myXmlColumn))[1]::text AS status,
(xpath('//condo/text()', myTempTable.myXmlColumn))[1]::text AS condo,
(xpath('//platname/text()', myTempTable.myXmlColumn))[1]::text AS platname,
(xpath('//fraction/text()', myTempTable.myXmlColumn))[1]::text AS fraction,
(xpath('//prefix/text()', myTempTable.myXmlColumn))[1]::text AS prefix,
(xpath('//suite/text()', myTempTable.myXmlColumn))[1]::text AS suite,
(xpath('//own_name/text()', myTempTable.myXmlColumn))[1]::text AS own_name,
(xpath('//own_addr/text()', myTempTable.myXmlColumn))[1]::text AS own_addr,
(xpath('//own_city/text()', myTempTable.myXmlColumn))[1]::text AS own_city,
(xpath('//own_zip/text()', myTempTable.myXmlColumn))[1]::text AS own_zip,
(xpath('//blvdfront/text()', myTempTable.myXmlColumn))[1]::text AS blvdfront,
(xpath('//lastupdate/text()', myTempTable.myXmlColumn))[1]::text AS lastupdate,
(xpath('//shape_length/text()', myTempTable.myXmlColumn))[1]::text AS shape_length,
(xpath('//shape_area/text()', myTempTable.myXmlColumn))[1]::text AS shape_area,
(xpath('//latitude/text()', myTempTable.myXmlColumn))[1]::text AS latitude,
(xpath('//longitude/text()', myTempTable.myXmlColumn))[1]::text AS longitude,
(xpath('//location1/text()' myTempTable.myXmlColumn))[1]::text AS location1,
myTempTable.myXmlColumn as myXmlElement
FROM unnest(
'//row'
,XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('parcel_data_first_row.xml'), 'UTF8'))
) AS myTempTable(myXmlColumn);
Attempting to execute this statement gives this error:
[2018-03-26 19:42:50] Using batch mode (1000 insert/update/delete statements max)
SELECT
(xpath('//objectid/text()', myTempTable.myXmlColumn))[1]::text AS objectid,
(xpath('//parcelid/text()', myTempTable.myXmlColumn))[1]::text AS parcelid,
(xpath('//kivapin/text()', myTempTable.myXmlColumn))[1]::text AS kivapin,
...
[2018-03-26 19:42:50] [42601] ERROR: syntax error at or near "myTempTable"
[2018-03-26 19:42:50] Position: 2058
[2018-03-26 19:42:50] Summary: 1 of 1 statements executed, 1 failed in 380ms (2293 symbols in file)
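I thought the problem might be a syntax error somewhere in the body of the code, so I ran only the first xpath statement, but that produces this error: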
[2018-03-26 19:46:17] Using batch mode (1000 insert/update/delete statements max)
SELECT
(xpath('//objectid/text()', myTempTable.myXmlColumn))[1]::text AS objectid,
myTempTable.myXmlColumn as myXmlElement
FROM unnest(
'//row'
,XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('parcel_data_first_row.xml'), 'UTF8...
[2018-03-26 19:46:17] [42804] ERROR: could not determine polymorphic type because input has type "unknown"
[2018-03-26 19:46:17] Summary: 1 of 1 statements executed, 1 failed in 385ms (273 symbols in file)
I'm not quite sure where to go from here.
Answer 0 (score: 1)
Once you have the XML document in a table, you can parse it with the following:
WITH j AS (SELECT UNNEST(XPATH('//row',myXmlColumn)) AS myXmlColumn
FROM myTempTable)
SELECT
(xpath('//objectid/text()', j.myXmlColumn))[1]::text AS objectid,
(xpath('//parcelid/text()', j.myXmlColumn))[1]::text AS parcelid,
(xpath('//kivapin/text()', j.myXmlColumn))[1]::text AS kivapin,
(xpath('//subdivision/text()', j.myXmlColumn))[1]::text AS subdivision,
(xpath('//block/text()', j.myXmlColumn))[1]::text AS block,
(xpath('//lot/text()', j.myXmlColumn))[1]::text AS lot,
(xpath('//datecreated/text()', j.myXmlColumn))[1]::text AS datecreated,
(xpath('//landusecode/text()', j.myXmlColumn))[1]::text AS landusecode,
(xpath('//apn/text()', j.myXmlColumn))[1]::text AS apn,
(xpath('//parceltype/text()', j.myXmlColumn))[1]::text AS parceltype,
(xpath('//status/text()', j.myXmlColumn))[1]::text AS status,
(xpath('//condo/text()', j.myXmlColumn))[1]::text AS condo,
(xpath('//platname/text()', j.myXmlColumn))[1]::text AS platname,
(xpath('//fraction/text()', j.myXmlColumn))[1]::text AS fraction,
(xpath('//prefix/text()', j.myXmlColumn))[1]::text AS prefix,
(xpath('//suite/text()', j.myXmlColumn))[1]::text AS suite,
(xpath('//own_name/text()', j.myXmlColumn))[1]::text AS own_name,
(xpath('//own_addr/text()', j.myXmlColumn))[1]::text AS own_addr,
(xpath('//own_city/text()', j.myXmlColumn))[1]::text AS own_city,
(xpath('//own_zip/text()', j.myXmlColumn))[1]::text AS own_zip,
(xpath('//blvdfront/text()', j.myXmlColumn))[1]::text AS blvdfront,
(xpath('//lastupdate/text()', j.myXmlColumn))[1]::text AS lastupdate,
(xpath('//shape_length/text()', j.myXmlColumn))[1]::text AS shape_length,
(xpath('//shape_area/text()', j.myXmlColumn))[1]::text AS shape_area,
(xpath('//latitude/text()', j.myXmlColumn))[1]::text AS latitude,
(xpath('//longitude/text()', j.myXmlColumn))[1]::text AS longitude,
(xpath('//location1/text()', j.myXmlColumn))[1]::text AS location1,
j.myXmlColumn as myXmlElement
FROM j
CTEs are not always my first choice when dealing with large amounts of data, but they do make the code more readable and are worth considering for data imports.
As for importing XML files into PostgreSQL, I have always gotten around the problem by using COPY with an intermediate table that stores the XML document and is dropped afterwards:
$ psql db -c "CREATE TABLE tmp (doc XML);"
$ cat xmlfile.xml | psql db -c "COPY tmp FROM STDIN"
If PostgreSQL complains that your data contains newlines (\n), you can escape them with tools such as sed, tr, or even perl -pe:
$ cat xmlfile.xml | perl -pe 's/\n/\\n/g' | psql db -c "COPY tmp FROM STDIN"
By the way: your query is missing a comma in this xpath expression, between '//location1/text()' and myTempTable.myXmlColumn:
(xpath('//location1/text()' myTempTable.myXmlColumn))[1]::text AS location1,
EDIT: If you can put files directly on the database server's file system (most of us cannot), you can keep using the combination of pg_read_binary_file and convert_from through UNNEST. Note, however, that the expression '//row' results in an unknown type, which can get quite tricky when it is used as a function argument. Instead, use a plain XPATH expression to do the job, as in the CTE above.
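The UNNEST(XPATH(...)) pattern from this answer can be tried without any file access or temporary table. The following is only a minimal sketch under stated assumptions: the XML literal is a trimmed copy of the question's sample, the second inner row (objectid 471538) is invented so that more than one row comes back, and the names doc, x and r are hypothetical. It also uses the full path /response/row/row, because in the sample the data rows are nested inside a wrapper <row>, which '//row' would match as well.

```sql
-- Sample data: a trimmed copy of the question's XML. The second inner <row>
-- (objectid 471538) is made up purely for illustration.
WITH doc(x) AS (
    SELECT '<response>
              <row>
                <row _id="1">
                  <objectid>471537</objectid>
                  <parcelid>2960</parcelid>
                  <location_1 latitude="39.2636" longitude="-94.5698"/>
                </row>
                <row _id="2">
                  <objectid>471538</objectid>
                  <parcelid>2961</parcelid>
                  <location_1 latitude="39.2636" longitude="-94.5698"/>
                </row>
              </row>
            </response>'::xml
),
j AS (
    -- xpath() returns xml[]; unnest() turns that array into one row per element.
    -- The explicit path selects only the inner <row> elements, not the wrapper.
    SELECT unnest(xpath('/response/row/row', x)) AS r
    FROM doc
)
SELECT
    (xpath('//objectid/text()', r))[1]::text      AS objectid,
    (xpath('//parcelid/text()', r))[1]::text      AS parcelid,
    -- location_1 is an empty element, so its values are read as attributes.
    (xpath('//location_1/@latitude', r))[1]::text AS latitude
FROM j;
```

This should return one result row per inner <row>. Note that the element in the sample data is named location_1 and carries its values as attributes, so an expression such as '//location1/text()' would be expected to yield NULL for it.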
Answer 1 (score: 0)
t=# select pg_get_function_arguments(oid),oid::regprocedure from pg_proc where proname = 'pg_read_binary_file';
pg_get_function_arguments | oid
-------------------------------+-------------------------------------------------
text, bigint, bigint | pg_read_binary_file(text,bigint,bigint)
text, bigint, bigint, boolean | pg_read_binary_file(text,bigint,bigint,boolean)
text | pg_read_binary_file(text)
(3 rows)
Try casting the file name argument explicitly, changing pg_read_binary_file('parcel_data_first_row.xml') to:
pg_read_binary_file('parcel_data_first_row.xml'::text)
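For context, here is one way the cast might look inside a complete query. This is a sketch, not a verified fix: it assumes the file sits in the server's data directory and that the connecting role is allowed to read server files, and the aliases t and r are hypothetical. Since the error message in the question refers to a polymorphic input of type "unknown", the sketch also wraps the '//row' literal in xpath(), so that unnest() receives an xml[] array rather than an untyped string.

```sql
-- Assumption: parcel_data_first_row.xml is readable by the server process
-- (a relative path is resolved against the cluster's data directory).
SELECT
    (xpath('//objectid/text()', r))[1]::text AS objectid,
    r AS myXmlElement
FROM unnest(
    -- xpath() yields xml[], a concrete array type that unnest() can resolve.
    xpath(
        '//row',
        XMLPARSE(DOCUMENT convert_from(
            pg_read_binary_file('parcel_data_first_row.xml'::text), 'UTF8'))
    )
) AS t(r);
```

Running SELECT pg_typeof('//row'); is a quick way to see that a bare string literal is reported as unknown until it is cast or passed to a function with a concrete parameter type.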