Question

I'm having trouble converting Unicode to HTML entities.

Here is my current code:

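>>> name = u'\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1'
>>> entities = name.encode('ascii', 'xmlcharrefreplace')
>>> print str(entities)
&#195;&#161;&#195;&#161;&#195;&#161;&#195;&#161;

Each á = \xc3\xa1 (a multi-byte character), but when I convert it to entities I get 2 entities for a single character.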

Answer 1

\xc3\xa1 is á when read as UTF-8 bytes, but when read as Unicode code points it is Ã¡, not á.

(áááá in Unicode is u'\xe1\xe1\xe1\xe1')
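To make the difference concrete, here is a small sketch. It assumes Python 3 (the question uses Python 2), and the variable names are only illustrative. It shows the same character once as a code point and once as UTF-8 bytes, and what happens when each byte is read as its own code point.

```python
# Python 3 sketch (assumption: the original session is Python 2).
ch = "\u00e1"  # á, a single code point U+00E1

utf8_bytes = ch.encode("utf-8")
print(utf8_bytes)        # b'\xc3\xa1'
print(hex(ord(ch)))      # 0xe1

# Reading each UTF-8 byte as its own code point (Latin-1) yields two
# characters, U+00C3 and U+00A1, which is why two entities appear.
misread = utf8_bytes.decode("latin-1")
print(len(ch), len(misread))             # 1 2
print([hex(ord(c)) for c in misread])    # ['0xc3', '0xa1']
```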

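So you need to define it with a byte string literal rather than a unicode literal ('' vs u''). Once you have the UTF-8 bytes, decode them to Unicode, then encode the result again as ASCII with XML entities: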

>>> name = '\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1'.decode('utf-8')
>>> name.encode('ascii', 'xmlcharrefreplace')
'&#225;&#225;&#225;&#225;'
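For readers on Python 3, a rough equivalent of the session above might look like the sketch below. The Latin-1 round trip at the end is my own addition, not part of the answer. It assumes the text was already decoded incorrectly, so that each UTF-8 byte became its own code point, as in the question.

```python
# Python 3 sketch (assumption: the session above is Python 2).
raw = b"\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1"   # UTF-8 bytes for "áááá"

name = raw.decode("utf-8")                   # str with four code points
entities = name.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(entities)   # &#225;&#225;&#225;&#225;

# Assumed scenario: the string was already mis-decoded, one code point
# per byte. Encoding to Latin-1 recovers the original bytes, which can
# then be decoded properly as UTF-8.
broken = "\xc3\xa1\xc3\xa1\xc3\xa1\xc3\xa1"
repaired = broken.encode("latin-1").decode("utf-8")
print(repaired == name)   # True
```

The repair only works when every code point in the broken string is below U+0100. Otherwise encode("latin-1") raises UnicodeEncodeError.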

Python: Unicode to HTML entities
