Question

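I have a list of objects scraped from HTML pages. Some of the values are HTML code containing special characters such as double and single quotes (like <a href="http://..">), which need to be escaped when preparing the CSV string. However, the values are inserted in their escaped form, so the HTML code ends up broken.

My code: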

import csv
import subprocess

insert_process = subprocess.Popen([
        'psql', 'dbname', '-U', 'user',
        '-c', r'\COPY products(name,description,image) FROM STDIN',
        '--set=ON_ERROR_STOP=true'
        ], stdin=subprocess.PIPE)


# Loop over the objects, convert each one to a CSV row with the
# 'serialize_post_to_out_stream' function, and append it to stdin



# function to remove the new line and tab chars
def clean_vars(var):
    return var.replace('\t', '').replace('\r', '').replace('\n', '')

# function to convert the product object to a CSV row
def serialize_post_to_out_stream(product, out):
    writer = csv.writer(out, delimiter="\t", quotechar='\'', quoting=csv.QUOTE_MINIMAL)
    writer.writerow([product.name, clean_vars(product.description), product.image])

# Close stdin so that psql sees end of input and the COPY executes
insert_process.stdin.close()

Everything works, but the 'description' column ends up containing broken HTML, such as:

'<li id="description" class="level-1">(DX-format): 5°20''</li>      </li>'
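The behaviour can be reproduced without a database. The following is a minimal sketch, assuming Python 3 and an in-memory buffer in place of the psql pipe; `to_csv_row` and the sample values are hypothetical and only illustrate what `csv.writer` emits with the settings from the question.

```python
import csv
import io

def to_csv_row(fields, **writer_kwargs):
    """Serialize one row with csv.writer and return it as a string (illustrative helper)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", quoting=csv.QUOTE_MINIMAL, **writer_kwargs)
    writer.writerow(fields)
    return out.getvalue()

# Sample description containing both double quotes and a single quote.
html = '<li id="description" class="level-1">(DX-format): 5°20\'</li>'

# Same writer settings as in the question: the field contains the quote
# character, so csv.writer wraps it in single quotes and doubles the inner one.
row = to_csv_row(["name", html, "img.png"], quotechar="'")
print(repr(row))
```

The wrapping quotes and the doubled `''` are CSV quoting. COPY's default text format does not interpret CSV quoting, so these characters are stored literally.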

There are some unwanted characters, such as the single quotes at the beginning and end of the value. If I remove:

quotechar='\''

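then every double quote gets escaped with another double quote. How can I escape the values for the CSV but have them inserted unescaped into the DB?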
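One possible direction, sketched here under assumptions rather than as a confirmed fix: since `\COPY ... FROM STDIN` without options uses PostgreSQL's text format, the rows could be written with that format's backslash escapes instead of CSV quoting. The helpers `escape_copy_text` and `to_copy_line` are hypothetical names, and the sketch assumes the default tab delimiter and the default `\N` NULL marker.

```python
def escape_copy_text(value):
    """Escape one field for COPY's default text format (hypothetical helper)."""
    if value is None:
        return r"\N"  # default NULL marker in text format
    return (
        value.replace("\\", "\\\\")  # escape backslashes first
        .replace("\t", "\\t")        # tab is the column delimiter
        .replace("\n", "\\n")        # newline is the row terminator
        .replace("\r", "\\r")
    )

def to_copy_line(fields):
    """Join escaped fields into one tab-separated COPY line."""
    return "\t".join(escape_copy_text(f) for f in fields) + "\n"

# Quotes need no escaping in text format, so the HTML passes through unchanged.
html = '<a href="http://..">5°20\'</a>'
line = to_copy_line(["name", html, None])
print(repr(line))
```

Each line would then be written to `insert_process.stdin`, encoded to bytes if the pipe was opened in binary mode. An alternative, also untested against the asker's setup, is to keep `csv.writer` and tell COPY to parse CSV by adding a `WITH (FORMAT csv, ...)` clause whose delimiter and quote character match the writer's.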

Preparing data for PostgreSQL COPY

0 answers: