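Updated/rewritten!
This is my first attempt at Azure Functions. The end goal is to use Durable Functions to fan out and fan back in, to speed up converting a PDF into single-page images.
For now I'm testing performance and how well it scales.
Processing one PDF from multiple functions isn't very efficient, so I split the PDF into single-page PDFs and then run a second function on each page. My code is below: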
    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Azure.WebJobs.Extensions.Http;
    using Microsoft.Azure.WebJobs.Host;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    namespace PoC
    {
        // Splits the source PDF into one single-page PDF per page under page_pdf/.
        public static class SplitPDF
        {
            static string storageAccountConnectionString = ""; // redacted

            [FunctionName("SplitPDF")]
            public static async Task<HttpResponseMessage> Run(
                [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequestMessage req,
                TraceWriter log)
            {
                log.Info("SplitPDF HTTP trigger function received a request.");
                Stopwatch stopwatch = Stopwatch.StartNew();

                // Without a license, Extract only returns the first 4 pages.
                Aspose.Pdf.License pdfLicense = new Aspose.Pdf.License();
                pdfLicense.SetLicense("Aspose.Total.lic");

                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(storageAccountConnectionString);
                CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
                CloudBlobContainer container = blobClient.GetContainerReference("functest");
                CloudBlockBlob pdfBlob = container.GetBlockBlobReference("Azure Machine Learning Essentials.pdf");
                Aspose.Pdf.Facades.PdfFileEditor pdfEditor = new Aspose.Pdf.Facades.PdfFileEditor();

                const int pageCount = 237; // page count of the test PDF (hard-coded)
                using (MemoryStream inStream = new MemoryStream())
                {
                    await pdfBlob.DownloadToStreamAsync(inStream);
                    for (int index = 1; index <= pageCount; index++)
                    {
                        using (MemoryStream outStream = new MemoryStream())
                        {
                            // Write page `index` as a standalone PDF, rewind, then upload it.
                            int[] pagesToExtract = new int[] { index };
                            pdfEditor.Extract(inStream, pagesToExtract, outStream);
                            outStream.Seek(0, SeekOrigin.Begin);
                            CloudBlockBlob outBlob = container.GetBlockBlobReference($"page_pdf/{index}.pdf");
                            await outBlob.UploadFromStreamAsync(outStream);
                        }
                    }
                }

                stopwatch.Stop();
                log.Info($"SplitPDF Took {stopwatch.Elapsed}.");
                return req.CreateResponse(HttpStatusCode.OK);
            }
        }

        // Renders page_pdf/{page}.pdf to page_png/{page}.png at 96 DPI.
        public static class GenPageImage
        {
            static string storageAccountConnectionString = ""; // redacted

            [FunctionName("GenPageImage")]
            public static async Task<HttpResponseMessage> Run(
                [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequestMessage req,
                TraceWriter log)
            {
                log.Info("GenPageImage HTTP trigger function received a request.");
                Stopwatch stopwatch = Stopwatch.StartNew();

                // ?page=N selects the single-page PDF to render; reject missing or invalid values.
                string pageParam = req.GetQueryNameValuePairs()
                    .FirstOrDefault(q => string.Compare(q.Key, "page", true) == 0)
                    .Value;
                if (!int.TryParse(pageParam, out int page) || page < 1)
                {
                    return req.CreateResponse(HttpStatusCode.BadRequest, "Pass a positive integer ?page= value.");
                }

                CloudStorageAccount storageAccount = CloudStorageAccount.Parse(storageAccountConnectionString);
                CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
                CloudBlobContainer container = blobClient.GetContainerReference("functest");
                CloudBlockBlob pdfBlob = container.GetBlockBlobReference($"page_pdf/{page}.pdf");
                CloudBlockBlob imageBlob = container.GetBlockBlobReference($"page_png/{page}.png");
                Aspose.Pdf.Devices.Resolution resolution = new Aspose.Pdf.Devices.Resolution(96);
                Aspose.Pdf.Devices.PngDevice device = new Aspose.Pdf.Devices.PngDevice(resolution);

                using (MemoryStream inStream = new MemoryStream())
                {
                    await pdfBlob.DownloadToStreamAsync(inStream);
                    using (var doc = new Aspose.Pdf.Document(inStream))
                    using (MemoryStream outStream = new MemoryStream())
                    {
                        // Each blob holds a single page, and Aspose pages are 1-based.
                        device.Process(doc.Pages[1], outStream);
                        outStream.Seek(0, SeekOrigin.Begin);
                        await imageBlob.UploadFromStreamAsync(outStream);
                    }
                }

                stopwatch.Stop();
                log.Info($"GenPageImage Took {stopwatch.Elapsed}.");
                return req.CreateResponse(HttpStatusCode.OK);
            }
        }
    }
The random Azure PDF I'm using has 237 pages. Splitting it takes 18 seconds, and then I run:
    for (( i=1; i<=237; i++)); do wget -b -q "http://url/api/GenPageImage?page=$i"; done
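For comparison, here is a minimal sketch of the same load generated from C#. `LoadGenerator`, `BuildPageUrls` and `SendAllAsync` are hypothetical names, and `http://url` is the placeholder from the loop above. Unlike `wget -b`, which backgrounds each request and discards the outcome, this awaits every response, so failed or throttled requests surface as status codes or exceptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoadGenerator
{
    // One GenPageImage URL per page (1..pageCount), mirroring the wget loop.
    public static List<string> BuildPageUrls(string baseUrl, int pageCount) =>
        Enumerable.Range(1, pageCount)
                  .Select(i => $"{baseUrl}/api/GenPageImage?page={i}")
                  .ToList();

    // Starts every request at once, then waits for all of them and returns the
    // status codes in the same order as the URLs.
    public static async Task<int[]> SendAllAsync(HttpClient client, IEnumerable<string> urls)
    {
        IEnumerable<Task<int>> requests = urls.Select(async url =>
        {
            using (HttpResponseMessage response = await client.GetAsync(url))
            {
                return (int)response.StatusCode;
            }
        });
        return await Task.WhenAll(requests);
    }
}
```

Two client-side caveats worth checking before blaming the server: on .NET Framework, `ServicePointManager.DefaultConnectionLimit` defaults to 2 connections per host in client apps, which would serialize most of these requests, and `HttpClient.Timeout` defaults to 100 seconds, which is shorter than some of the runs described below.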
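I have App Insights with Live Metrics Stream, and the whole process of converting the 237 single-page PDFs takes 2 minutes, even though each conversion should take no more than a few seconds. Throughout the process the CPU sits at 100%, but the app doesn't scale out.
The documentation says the Consumption plan should scale out based on the number of incoming requests. Shouldn't 200+ requests be enough to trigger that?
I added a third function that simply generates all the images within a single function, and it also takes about 2 minutes :-(
Note: image generation works without an Aspose license, but extraction will only give you the first 4 pages, so if you want to try this, you could just duplicate the first few pages to get some volume.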
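Update after adding the suggested scale settings:
Hopefully this information helps.
After adding the settings, the function app started up again with 3 instances. I let them die off until there were 0 and then sent the split-file request; the cold start took about 10 s before any information came through.
With 1 instance running, I submitted the 237 requests. This is what I saw:
Started sending HTTP requests - 09:01:23
Images appeared in the output folder (based on file modified dates):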
09:01:57 to 09:02:01 - 60 pngs
09:02:34 to 09:04:38 - 119 pngs
09:04:50 to 09:04:51 - 58 pngs
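During the 20-30 s gaps nothing was produced. I saw the instances scale almost immediately from 1 to 3, followed by a long period of inactivity; then a fourth instance appeared and seemed to start processing, while the others had already been idle for several seconds. Start to finish, the whole process took longer this time, at 3 min 28 s.
With 4 instances running, I tried again:
Started sending HTTP requests - 09:09:44
Images appeared in the output folder (based on file modified dates):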
09:09:45 to 09:10:17 - 178 pngs
09:10:55 to 09:11:09 - 59 pngs
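This time all 4 instances were busy, then there was a short pause before a 5th instance started and seemed to finish things off, which I think explains the gap of nearly 40 seconds? Start to finish was better this time at 1 min 25 s, but that is still not much faster than doing everything on a single instance.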
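As a sanity check on the figures above, this small sketch recomputes them from the reported file-modified timestamps. `RunTimings` and its members are hypothetical names, and it assumes all timestamps fall on the same day. Both runs account for all 237 PNGs, the end-to-end times come out at 3 min 28 s and 1 min 25 s, and the idle gap in the second run is 38 s.

```csharp
using System;
using System.Linq;

public static class RunTimings
{
    // One reported bucket: first and last file-modified time, and how many PNGs appeared.
    public static (TimeSpan From, TimeSpan To, int Pngs) Bucket(string from, string to, int pngs) =>
        (TimeSpan.Parse(from), TimeSpan.Parse(to), pngs);

    // Total PNGs across buckets; should equal the 237 pages requested.
    public static int TotalPngs((TimeSpan From, TimeSpan To, int Pngs)[] buckets) =>
        buckets.Sum(b => b.Pngs);

    // Wall-clock time from sending the first request to the last PNG being written.
    public static TimeSpan EndToEnd(TimeSpan start, (TimeSpan From, TimeSpan To, int Pngs)[] buckets) =>
        buckets.Max(b => b.To) - start;

    // Idle time between consecutive buckets, when no PNGs were produced.
    public static TimeSpan[] Gaps((TimeSpan From, TimeSpan To, int Pngs)[] buckets) =>
        buckets.Zip(buckets.Skip(1), (a, b) => b.From - a.To).ToArray();
}
```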
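At this point I'm wondering whether I should just go straight to the Durable Functions fan-out approach, on the assumption that it might scale better?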
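For reference, the fan-out/fan-in shape being considered can be sketched with plain tasks. This is a local, BCL-only illustration, not Durable Functions code: `FanOutFanIn.RunAsync` is a hypothetical helper, and `processPage` stands in for whatever converts one page. In a Durable Functions orchestrator, the same shape would typically be expressed by calling `context.CallActivityAsync<T>(...)` once per page and awaiting `Task.WhenAll` over the results; how the platform then spreads those activity calls across instances is a separate question from the orchestration code itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FanOutFanIn
{
    // Fan out: start one task per page (1..pageCount) without awaiting them individually.
    // Fan in: await them all together; results come back in page order.
    public static async Task<TResult[]> RunAsync<TResult>(int pageCount, Func<int, Task<TResult>> processPage)
    {
        IEnumerable<Task<TResult>> tasks = Enumerable.Range(1, pageCount).Select(processPage);
        return await Task.WhenAll(tasks);
    }
}
```

For example, `processPage` could return the blob name each page was written to (such as `page_png/{page}.png`), giving the orchestrator one list to act on once every page is done.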