Python: Unicode to HTML entities

Time: 2015-04-26 19:18:44

Tags: python unicode encoding character-encoding

I'm having trouble converting unicode to HTML entities.

Here is my current code:

>>> name = u'\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1'
>>> entities = name.encode('ascii', 'xmlcharrefreplace')
>>> print str(entities)
&#195;&#161;&#195;&#161;&#195;&#161;&#195;&#161;

Each á = \xc3\xa1 (a multi-byte character), but when I convert it to entities I get 2 entities for a single character.

1 answer:

Answer 0: (score: 6)

\xc3\xa1 is á encoded in UTF-8 (two bytes), not á as a Unicode code point.

(In Unicode, áááá is u'\xe1\xe1\xe1\xe1'.)
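To see the difference concretely, here is a small sketch. It assumes Python 3, where every str literal is already a unicode string; the variable names are only illustrative.

```python
# Python 3 sketch: compare the mistaken literal with the intended character.
wrong = '\xc3\xa1'   # two code points, U+00C3 (Ã) and U+00A1 (¡)
right = '\xe1'       # one code point, U+00E1 (á)

# Code points contained in the mistaken literal.
wrong_points = [hex(ord(ch)) for ch in wrong]

# UTF-8 bytes of the intended character: the same two values, but as bytes.
right_bytes = right.encode('utf-8')

print(wrong_points)  # ['0xc3', '0xa1']
print(right_bytes)   # b'\xc3\xa1'
```

The unicode literal in the question therefore holds two characters per á, which is why each one produces two entities.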

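So you need to define it with a byte-string literal rather than a unicode literal ('' vs u''). Once you have the UTF-8 bytes, decode them to Unicode, then encode the result again as ASCII with XML character references: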

>>> name = '\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1'.decode('utf-8')
>>> name.encode('ascii', 'xmlcharrefreplace')
'&#225;&#225;&#225;&#225;'
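
The session above is Python 2. A rough Python 3 equivalent might look like the sketch below; the helper name to_entities is hypothetical, and the latin-1 round trip is an assumption about how to recover text that has already been mis-decoded, not part of the original answer.

```python
def to_entities(text):
    """Return text as an ASCII str, with non-ASCII characters as numeric references."""
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')

# Start from UTF-8 bytes (a bytes literal in Python 3) and decode them first.
utf8_bytes = b'\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1'
name = utf8_bytes.decode('utf-8')
print(to_entities(name))      # &#225;&#225;&#225;&#225;

# If the text was already mis-decoded (each byte became its own code point),
# latin-1 maps every code point below 256 back to the same byte value.
mojibake = '\xc3\xa1\xc3\xa1'
repaired = mojibake.encode('latin-1').decode('utf-8')
print(to_entities(repaired))  # &#225;&#225;
```

The round trip only works when every character in the damaged string is below U+0100; anything else raises UnicodeEncodeError.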