I'm having trouble converting Unicode to HTML entities.
Here is my current code:
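>>> name = u'\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1'
>>> entities = name.encode('ascii', 'xmlcharrefreplace')
>>> print str(entities)
&#195;&#161;&#195;&#161;&#195;&#161;&#195;&#161;
Each \xc3\xa1 is á (a multibyte character), but when I convert it to entities I get 2 entities for a single character.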
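A minimal Python 3 sketch of the symptom, assuming Python 3 semantics (there a plain "..." literal is already Unicode, so it plays the role of the question's Python 2 u'...'). The literal holds two code points per visible character, U+00C3 and U+00A1, and xmlcharrefreplace emits one entity per code point:

```python
# Python 3: str is Unicode, so this mirrors the question's u'\xc3\xa1...' literal.
# Each "\xc3\xa1" pair is TWO code points (U+00C3, U+00A1), not one character.
name = "\xc3\xa1" * 4

# xmlcharrefreplace turns every non-ASCII code point into its own &#NNN; reference.
entities = name.encode("ascii", "xmlcharrefreplace")

print(len(name))                 # 8 code points, not 4
print(entities.decode("ascii"))  # &#195;&#161;&#195;&#161;&#195;&#161;&#195;&#161;
```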
Answer (score: 6)
\xc3\xa1 is á in UTF-8, not in Unicode. (In Unicode, áááá is u'\xe1\xe1\xe1\xe1'.)
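To make the distinction concrete, here is a small Python 3 check (an illustration, not part of the original answer): the character á is the single code point U+00E1, and only its UTF-8 encoding is the two-byte sequence C3 A1:

```python
# "á" is one code point: U+00E1 LATIN SMALL LETTER A WITH ACUTE.
ch = "\N{LATIN SMALL LETTER A WITH ACUTE}"

# Its UTF-8 encoding is two bytes, 0xC3 0xA1.
utf8_bytes = ch.encode("utf-8")

print(hex(ord(ch)))  # 0xe1
print(utf8_bytes)    # b'\xc3\xa1'
```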
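So you need to define it with a plain (byte) string literal rather than a Unicode literal ('' vs u''). Once you have the UTF-8 bytes, decode them to Unicode, then encode the result as ASCII with XML character references: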
>>> name = '\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1'.decode('utf-8')
>>> name.encode('ascii', 'xmlcharrefreplace')
'&#225;&#225;&#225;&#225;'
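For Python 3, where the session above would need bytes literals, the same decode-then-encode approach might be sketched as follows. Both function names are hypothetical. The second helper is an alternative not mentioned in the answer: if the text was already stored with one code point per UTF-8 byte (as in the question's u'\xc3\xa1...'), encoding it as Latin-1 recovers the original bytes, because Latin-1 maps U+0000..U+00FF one-to-one onto byte values.

```python
def utf8_bytes_to_entities(raw: bytes) -> str:
    """Decode UTF-8 bytes to text, then replace each non-ASCII code point
    with a decimal XML character reference (&#NNN;)."""
    text = raw.decode("utf-8")
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def repair_latin1_mojibake(text: str) -> str:
    """Hypothetical helper for text that holds one code point per UTF-8 byte."""
    # Latin-1 maps each code point U+0000..U+00FF to the byte with the same value,
    # so this recovers the original UTF-8 bytes, which are then decoded properly.
    return text.encode("latin-1").decode("utf-8")


print(utf8_bytes_to_entities(b"\xc3\xa1" * 4))
# &#225;&#225;&#225;&#225;

fixed = repair_latin1_mojibake("\xc3\xa1" * 4)
print(fixed.encode("ascii", "xmlcharrefreplace").decode("ascii"))
# &#225;&#225;&#225;&#225;
```

The Latin-1 round trip only works if every code point in the string is below U+0100; otherwise encode("latin-1") raises UnicodeEncodeError, so fixing the literal at its source, as the answer suggests, is the cleaner option.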