Processing multithreaded output sequentially

Time: 2015-09-16 15:32:29

Tags: python multithreading

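I have some code that generates thousands of invoices as PDFs, writes them to a zip file, and then streams that out through a StreamingHttpResponse (Django). PDF generation is really slow and is currently single-threaded.

I can generate the source HTML for the PDFs very quickly. I'd like to:

  • Generate all the HTML (single-threaded; the database can't handle concurrent lookups)
  • Convert these to PDFs, split across 8 threads
  • Process the PDF output synchronously so that I can add it to my zipstream file.

I have dabbled with multiprocessing.Pool before, but I'm not sure how to do this properly. Here is some very approximate code.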


def generate_statements(request):
    htmls = [generate_html(customer) for customer in customers]
    pdfs = [generate_pdf(html) for html in htmls]
    # create zip file
    for pdf in pdfs:
        zip.writestr(...)
    # output this to browser

def generate_html(customer):
    # do something that returns a string of HTML

def generate_pdf(html):
    # do something that creates a single pdf

It would be even better if there were an option to start converting the HTML before all of htmls has been generated, but I need to process the PDF output in a linear fashion; I can't write to the zip concurrently.

(PS: I realise some of this may sound like homework, but please check my network profile before you decide I'm a lazy student... I'm a lazy professional programmer instead.)

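2 answers:

Answer 0: (score: 1)

This is easier than it looks. The answer below reminded me of the multiprocessing Pool map method. It is a parallel version of the built-in map: the work is spread across the pool's workers, but the call itself blocks (that is, it does not return until all of the work has finished), and you get back a list of all the results in the same order as the inputs.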

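from multiprocessing import Pool

htmls = [generate_html(customer) for customer in customers]
print("Htmls is:", repr(htmls))
print("Starting map")
pdf_pool = Pool(5)  # 5 worker processes
# The third argument is the chunksize (items handed to a worker at a time), not the worker count
pdfs = pdf_pool.map(generate_pdf, htmls, 8)

print("Done map")
# zip stuff
for pdf in pdfs:
    print(pdf)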
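
If you would rather start writing to the zip before every PDF is finished, the pool's imap method may be a better fit than map: it yields results one at a time, still in input order, so the loop that consumes it is the only code that ever touches the archive. The following is a minimal sketch, not a drop-in implementation. It assumes a thread pool (multiprocessing.pool.ThreadPool, which exposes the same map/imap methods as multiprocessing.Pool) and that the real converter spends its time outside the interpreter; for a pure-Python, CPU-bound converter a process pool would likely be the better choice. The bodies of generate_html and generate_pdf, the build_zip helper, and the invoice_NNNNN.pdf naming are hypothetical stand-ins.

```python
import io
import zipfile
from multiprocessing.pool import ThreadPool


def generate_html(customer):
    # Hypothetical stand-in for the real, database-backed HTML step.
    return "<h1>Invoice for {}</h1>".format(customer)


def generate_pdf(html):
    # Hypothetical stand-in for the slow converter; returns the PDF as bytes.
    return html.encode("utf-8")


def build_zip(customers, workers=8):
    """Return the zip as bytes, with PDFs stored in the same order as customers."""
    # A generator, so HTML is produced one item at a time while the pool
    # is already converting earlier items.
    htmls = (generate_html(customer) for customer in customers)
    buffer = io.BytesIO()
    with ThreadPool(workers) as pool, zipfile.ZipFile(buffer, "w") as archive:
        # imap yields results in input order, each as soon as it is ready,
        # so only this loop writes to the archive.
        for index, pdf in enumerate(pool.imap(generate_pdf, htmls)):
            archive.writestr("invoice_{:05d}.pdf".format(index), pdf)
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_zip(["alice", "bob", "carol"])
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        print(archive.namelist())
```

One thing to watch: as far as I can tell, the pool consumes the generator from one of its own helper threads rather than from the calling thread. The HTML is still generated one item at a time, but if generate_html relies on per-thread state such as a database connection, building the list up front as in the code above is the safer option.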

Answer 1: (score: 0)

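from multiprocessing.pool import ThreadPool
from threading import Lock
from zipfile import ZipFile

# HTML documents that have to be converted to PDF (placeholder data)
jobs = ['<p>invoice {}</p>'.format(i) for i in range(100)]
zip_lock = Lock()  # guards the zip so only one thread writes at a time

def convert_html_to_pdf(html):
    # placeholder for the real converter; returns the PDF as bytes
    return html.encode('utf-8')

def create_pdf(job, zipstream):
    index, html = job
    print('starting job {}'.format(index))
    pdf = convert_html_to_pdf(html)  # finished converting
    with zip_lock:  # wait until no other thread is writing to the zip
        zipstream.writestr('invoice_{}.pdf'.format(index), pdf)
    print('ended job {}'.format(index))
    # do whatever else is required for this html

# Threads rather than processes: an open ZipFile cannot be shared between processes.
# Entries land in the zip in completion order, not input order.
pool = ThreadPool(processes=8)
with ZipFile('pdfs.zip', 'w') as myzip:
    pool.map(lambda job: create_pdf(job, myzip), list(enumerate(jobs)))
pool.close()
pool.join()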