I have a directory of XML files. Each XML file looks similar to this:
<songs>
<song>
<title>Some Title</title>
<artist>Artist</artist>
</song>
<song>
<title>Other Title</title>
<artist>Artist 1</artist>
</song>
<song>
<title>Some Title</title>
<artist>Artist</artist>
</song>
</songs>
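My PostgreSQL table has three columns (title, artist, insertion date). I iterate over all the XML files with RecursiveDirectoryIterator.
Question:
What is the best way to implement a bulk upsert, that is, insert a row if it does not exist yet and update its insertion date if it does? A row is identified by its (artist, title) pair.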
Is there a way to pass the XML directly to PostgreSQL, starting from:
simplexml_load_file
or could I read the file as a string with:
file_get_contents
and then parse it back into XML inside PostgreSQL?
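To illustrate the second option, here is a minimal sketch of PostgreSQL parsing a plain string back into XML and splitting it into rows. The literal stands in for whatever file_get_contents would return. The absolute paths (/song/...) are my own choice, made because relative paths are resolved differently before and after PostgreSQL 11.

```sql
-- Parse a text value as an XML document, then turn each <song> into one row.
SELECT (xpath('/song/artist/text()', s))[1]::text AS artist,
       (xpath('/song/title/text()',  s))[1]::text AS title
FROM unnest(
       xpath('/songs/song',
             XMLPARSE(DOCUMENT
               '<songs><song><title>Some Title</title><artist>Artist</artist></song></songs>'))
     ) AS s;  -- s is one <song> element of type xml
```

From PHP the literal would presumably be sent as a bound text parameter rather than concatenated into the query.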
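I have already tried inserting the rows from the XML one at a time, but with 100 XML files of 100 songs each that would be very inefficient. So I am trying to work out a way to do the bulk upsert at the file level, but I am stuck.
Any help would be highly appreciated.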
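For reference, one file-level shape this could take is a single INSERT ... ON CONFLICT statement per file. This is only a sketch under assumptions not stated above: PostgreSQL 9.5 or later, a unique constraint on (Artist, Title), and a date column named DateAdded. The constraint name and the prepared-statement name are hypothetical.

```sql
-- Assumption: run once; ON CONFLICT needs a unique index or constraint on the pair.
ALTER TABLE Songs ADD CONSTRAINT songs_artist_title_key UNIQUE (Artist, Title);

-- One statement per file; $1 is the whole file as text.
PREPARE upsert_songs(text) AS
INSERT INTO Songs (Artist, Title, DateAdded)
SELECT DISTINCT                       -- a pair may appear only once per statement
       (xpath('/song/artist/text()', s))[1]::text,
       (xpath('/song/title/text()',  s))[1]::text,
       now() at time zone 'utc'
FROM unnest(xpath('/songs/song', $1::xml)) AS s
ON CONFLICT (Artist, Title)
DO UPDATE SET DateAdded = EXCLUDED.DateAdded;

EXECUTE upsert_songs('<songs><song><artist>A</artist><title>A</title></song><song><artist>B</artist><title>B</title></song></songs>');
```

The DISTINCT matters for files like the sample above, where the same pair occurs twice: ON CONFLICT DO UPDATE refuses to touch the same row twice within one command.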
Edit 1: This is roughly what I would like to accomplish:
INSERT INTO Songs(Artist, Title, DateOfInsertion)
SELECT x.Artist
,x.Title
FROM xpath('//song', xml('<songs><song><title>Some Title</title><artist>Artist</artist></song><song><title>Other Title</title><artist>Artist 1</artist></song><song><title>Some Title</title><artist>Artist</artist></song></songs>')) x;
EXCEPTION WHEN unique_violation THEN
UPDATE Songs
SET DateAdded = (now() at time zone 'utc')
WHERE (?????)
Edit 2: This is what I came up with:
CREATE OR REPLACE FUNCTION insertSongs(xml TEXT) RETURNS VOID AS
$$
BEGIN
LOOP
-- first try to update the key
UPDATE Songs
SET DateAdded = (now() at time zone 'utc')
FROM unnest(xpath('//song', xml(xml))) x
WHERE Artist = extract_value('//artist', x)
AND Title = extract_value('//title', x);
IF found THEN
RETURN;
END IF;
-- not there, so try to insert the key
-- if someone else inserts the same key concurrently,
-- we could get a unique-key failure
BEGIN
INSERT INTO Songs(Artist, Title)
SELECT extract_value('//artist', x)
,extract_value('//title', x)
FROM unnest(xpath('//song', xml(xml))) x;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Do nothing, and loop to try the UPDATE again.
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;
select insertSongs('<songs><song><artist>A</artist><title>A</title></song><song><artist>B</artist><title>B</title></song></songs>')
The extract_value function looks like this:
CREATE OR REPLACE FUNCTION extract_value(
VARCHAR,
XML
) RETURNS TEXT AS
$$
SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$'
THEN (xpath($1, $2))[1]
WHEN $1 ~* '/text\(\)$'
THEN (xpath($1, $2))[1]
WHEN $1 LIKE '%/'
THEN (xpath($1 || 'text()', $2))[1]
ELSE (xpath($1 || '/text()', $2))[1]
END::text;
$$ LANGUAGE 'sql' IMMUTABLE;
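But now only one of the two statements gets executed. For example, if a song with artist A and title A is already in the database but artist B and title B is not, the function only updates the insertion date of A/A and skips the insert of B/B. If neither of them is in the table yet, both get inserted.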
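This behaviour matches how FOUND works in PL/pgSQL: it is true as soon as the UPDATE touches at least one row, so the IF found THEN RETURN branch exits before the INSERT is tried for the remaining songs. Below is a sketch of a variant that always runs both steps and limits the INSERT to pairs that are still missing. The function and parameter names are hypothetical. It leaves out the retry loop, so a concurrent insert of the same pair could still raise unique_violation.

```sql
CREATE OR REPLACE FUNCTION upsert_songs(doc text) RETURNS void AS
$$
BEGIN
    -- Step 1: refresh the date of every pair that is already stored.
    UPDATE Songs s
    SET DateAdded = (now() at time zone 'utc')
    FROM unnest(xpath('/songs/song', doc::xml)) AS x
    WHERE s.Artist = (xpath('/song/artist/text()', x))[1]::text
      AND s.Title  = (xpath('/song/title/text()',  x))[1]::text;

    -- Step 2: no early RETURN; insert only the pairs not present yet.
    INSERT INTO Songs (Artist, Title, DateAdded)
    SELECT DISTINCT n.artist, n.title, (now() at time zone 'utc')
    FROM (
        SELECT (xpath('/song/artist/text()', x))[1]::text AS artist,
               (xpath('/song/title/text()',  x))[1]::text AS title
        FROM unnest(xpath('/songs/song', doc::xml)) AS x
    ) n
    WHERE NOT EXISTS (
        SELECT 1 FROM Songs s
        WHERE s.Artist = n.artist AND s.Title = n.title
    );
END;
$$ LANGUAGE plpgsql;
```

It would be called the same way as insertSongs, with the XML passed as one text argument.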