使用Python中的ReportLab创建PDF时出错

时间:2013-04-11 14:52:26

标签: python reportlab

数据:

['<p>Work! please work.img:0\xc3\x82\xc2\xa0Will you?img:1</p>img:2img:3\xc3\x82\xc2\xa0ascasdacasdadasdaca HAHAHAHAHA! BAND!\n', '\n', "<p>Random test.</p><p><br />If you want to start a flame war, mention lines of code per day or hour in a developer\xc3\xa2&euro;&trade;s public forum. At least that is what I found when I started investigating how many lines of code are written per day per programmer. Lines of code, or loc for short, are supposedly a terrible metric for measuring programmer productivity and empirically I agree with this. There are too many variables involved starting with the definition of a line of code and going all the way up to the complexity of the requirements. There are single lines that take a long time to get right and there many lines which are mindless boilerplate code. All the same this measurement does have information encoded in it; the hard part is extracting that information and drawing the correct conclusions. Unfortunately I don\xc3\xa2&euro;&trade;t have access to enough data about software projects to provide a statistically sound analysis but I got a very interesting result from measuring two very different projects that I would like to share.</p><p>The first project is a traditional client server data mining tool for a vertical market mostly built in VB.NET and WinForms. This project started in 2003 and has been through several releases and an upgrade from .NET 1.1 to .NET 2.0. It has server components but most of the half a million lines of code lives in the client side. The team has always had around four developers although not always the same people. The average lines of code for this project came in at around ninety lines of code per day per developer. I wasn\xc3\xa2&euro;&trade;t able to measure the SQL in the stored procedures so this number is slightly inflated.</p><p><em>The second project is much smaller adding up to ten thousand lines of C# plus seven thousand lines of XAML c</em>reated by a team of four that also worked on the first project. This project lasted three months and it is a WPF point of sale application thus very different in scope from the first project. <strong>It was built around a number of web services in SOA fashion and does not have a database per se. Its average came up around seventy lines of code per developer per day.</strong></p><p>I am very surprised with the closeness of these numbers, especially given the difference in size and scope of the products. The commonality between them are the .NET framework and the team and one of them may be the key. Of these two, I am leaning to the .NET framework being the unifier because although the developers worked on both projects, three of elements on the team of the second project have spent less than a year on the first project and did not belong to the core team that wrote the vast majority of that first product. Or maybe there is something more general at work here?</p><p>The first step in using the WP_Filesystem is requesting credentials from the user. The normal way this is accomplished is at the time when you're saving the results of a form input, or you have otherwise determined that you need to write to a file.</p><p>The credentials form can be displayed onto an admin page by using the following code:</p><pre>$url = wp_nonce_url('themes.php?page=example','example-theme-options');\n</pre>", "if (false === ($creds = request_filesystem_credentials($url, '', false, false, null) ) ) {\n", '\treturn; // stop processing here\n', '}\n', '<p>The request_filesystem_credentials() call takes five arguments.</p><ul><li>The URL to which the form should be submitted (a nonced URL to a theme page was used in the example above)</li><li>A method override (normally you should leave this as the empty string: "")</li><li>An error flag (normally false unless an error is detected, see below)</li><li>A context directory (false, or a specific directory path that you want to test for access)</li><li>Form fields (an array of form field names from your previous form that you wish to "pass-through" the resulting credentials form, or null if there are none)</li></ul><p>The request_filesystem_credentials call will test to see if it is capable of writing to the local filesystem directly without credentials first. If this is the case, then it will return true and not do anything. Your code can then proceed to use the WP_Filesystem class.</p><p>The request_filesystem_credentials call also takes into account hardcoded information, such as hostname or username or password, which has been inserted into the wp-config.php file using defines. If these are pre-defined in that file, then this call will return that information instead of displaying a form, bypassing the form for the user.</p><p>If it does need credentials from the user, then it will output the FTP information form and return false. In this case, you should stop processing further, in order to allow the user to input credentials. Any form fields names you specified will be included in the resulting form as hidden inputs, and will be returned when the user resubmits the form, this time with FTP credentials.</p><p>Note: Do not use the reserved names of hostname, username, password, public_key, or private_key for your own inputs. These are used by the credentials form itself. Alternatively, if you do use them, the request_filesystem_credentials function will assume that they are the incoming FTP credentials.</p><p>When the credentials form is submitted, it will look in the incoming POST data for these fields, and if found, it will return them in an array suitable for passing to WP_Filesystem, which is the next step.</p><p><a id="Initializing_WP_Filesystem_Base" name="Initializing_WP_Filesystem_Base"></a>']

我使用ReportLab将其转换为pdf但它失败了。 这是我的ReportLab代码:

for page in self.pagelist:
            self.image_parser(page)
            print page.content
            for i in range(0,len(page.content)):
                bogustext = page.content[i]
                while (len(re.findall(r'img:?',bogustext)) > 0):
                    for m in re.finditer( r'img:?', bogustext ):
                        image_tag = bogustext[m.start():m.end()+1]
                        print (image_tag.split(':')[1])
                        im = Image(page.images[int(image_tag.split(':')[1])],width=2*inch, height=2*inch)
                        Story.append(Paragraph(bogustext[0:m.start()], style))
                        bogustext = bogustext.replace(bogustext[0:m.start()],'') 
                        Story.append(im)
                        bogustext = bogustext.replace(image_tag,'')
                        break
                p = Paragraph(bogustext,style)
                Story.append(p)
                Story.append(Spacer(1,0.2*inch))

页面是其中page.content包含上面提到的数据的类。

self.image(page)是一个删除page.content(Data)中所有图片网址的功能。

错误:

xml parser error (invalid attribute name id) in paragraph beginning
'<p>The request_filesystem_cred'

如果我为列表的每个元素生成一个PDF,但是如果我尝试从中生成完整的PDF,那么我会得到一个错误。我哪里错了?

0 个答案:

没有答案