Question

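I've created an HttpModule in ASP.NET to let users upload large files. I found some sample code online that I was able to adapt to my needs. If the request is a multipart message, I grab the file, read the bytes in chunks, and write them to disk.

The problem is that the file is always corrupted. After some research, it turns out that for some reason HTTP headers or message-body tags are included in the first chunk of bytes I receive. I can't figure out how to parse those bytes out so that I'm left with only the file.

Extra data/junk is prepended to the top of the file, for example: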

-----------------------8cbb435d6837a3f
Content-Disposition: form-data; name="file"; filename="test.txt"
Content-Type: application/octet-stream

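This header information of course corrupts the file I receive, so I need to strip it before writing the bytes.

Here is the code I wrote to handle the upload: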

public class FileUploadManager : IHttpModule
{
    public int BUFFER_SIZE = 1024;

    protected void app_BeginRequest(object sender, EventArgs e)
    {
        // get the context we are working under
        HttpContext context = ((HttpApplication)sender).Context;

        // make sure this is multi-part data
        if (context.Request.ContentType.IndexOf("multipart/form-data") == -1)
        {
            return;
        }

        IServiceProvider provider = (IServiceProvider)context;
        HttpWorkerRequest wr =
            (HttpWorkerRequest)provider.GetService(typeof(HttpWorkerRequest));

        // only process this file if it has a body and is not already preloaded
        if (wr.HasEntityBody() && !wr.IsEntireEntityBodyIsPreloaded())
        {
            // get the total length of the body
            int iRequestLength = wr.GetTotalEntityBodyLength();

            // get the initial bytes loaded
            int iReceivedBytes = wr.GetPreloadedEntityBodyLength();

            // open file stream to write bytes to
            using (System.IO.FileStream fs =
                new System.IO.FileStream(
                    @"C:\tempfiles\test.txt",
                    System.IO.FileMode.CreateNew))
            {
                // *** NOTE: This is where I think I need to filter the bytes 
                // received to get rid of the junk data but I am unsure how to 
                // do this?

                int bytesRead = BUFFER_SIZE;
                // Create an input buffer to store the incoming data
                byte[] byteBuffer = new byte[BUFFER_SIZE];
                while ((iRequestLength - iReceivedBytes) >= bytesRead)
                {
                    // read the next chunk of the file
                    bytesRead = wr.ReadEntityBody(byteBuffer, byteBuffer.Length);
                    fs.Write(byteBuffer, 0, bytesRead);
                    iReceivedBytes += bytesRead;

                    // write bytes so far of file to disk
                    fs.Flush();
                }
            }
        }
    }
}

How can I detect and parse out this header junk so that I isolate only the file's bytes?

Answer 1

Use the InputStreamEntity class as follows:

 InputStreamEntity reqEntity = new InputStreamEntity(new FileInputStream(filePath), -1);
 reqEntity.setContentType("binary/octet-stream");
 httppost.setEntity(reqEntity);
 HttpResponse response = httpclient.execute(httppost);

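If you use the above, the boundary tags in the header and trailer, along with the Content-Disposition and Content-Type lines, will not be added to what the server receives: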

-----------------------8cbb435d6837a3f
Content-Disposition: form-data; name="file"; filename="test.txt"
Content-Type: application/octet-stream

-----------------------8cbb435d6837a3f

Answer 2

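What you're running into is the boundary used to separate the parts of the HTTP request. At the start of the request there should be a header named Content-Type, and within that header there is a boundary parameter, like this: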

Content-Type: multipart/mixed;boundary=gc0p4Jq0M2Yt08jU534c0p

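Once you have found this boundary, simply split the request on the boundary with two hyphens (--) prepended to it. In other words, split your content on: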

"--"+Headers.Get("Content-Type").Split("boundary=")[1]

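That's sort of pseudocode, but it should get the point across. This should divide the multipart form data into the appropriate sections.

For more information, see RFC 1341.

It's worth noting that the final boundary apparently also has two hyphens appended to its end.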
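
As a rough illustration of the boundary lookup described above, here is a minimal sketch. The class and method names (MultipartBoundary, GetBoundary) are made up for this example, and it assumes the boundary value does not itself contain a semicolon:

```csharp
using System;

static class MultipartBoundary
{
    // Hypothetical helper: returns the boundary parameter of a Content-Type
    // header value, or null if there is none.
    // Assumption: the boundary value contains no ';' character.
    public static string GetBoundary(string contentType)
    {
        foreach (string piece in contentType.Split(';'))
        {
            string p = piece.Trim();
            if (p.StartsWith("boundary=", StringComparison.OrdinalIgnoreCase))
            {
                // The value may be wrapped in double quotes
                return p.Substring("boundary=".Length).Trim('"');
            }
        }
        return null;
    }
}
```

With the header shown earlier, GetBoundary returns gc0p4Jq0M2Yt08jU534c0p. Each part is then introduced by "--" followed by that value, and the last delimiter is the same string with a further "--" appended.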

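Edit: OK, so the problem you're running into is that you aren't breaking the form data into its necessary components. Each part of a multipart/form-data request can be treated as a separate request of its own (meaning it can contain headers). What you should do is read the bytes into a string: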

string formData = Encoding.ASCII.GetString(byteBuffer);

Split it into multiple strings based on the boundary:

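// On the wire each delimiter is CRLF + "--" + the boundary parameter from Content-Type
string boundaryParam = context.Request.ContentType.Split( new[] { "boundary=" }, StringSplitOptions.None )[1].Trim( '"' );
string boundary = "\r\n--" + boundaryParam;
string[] parts = Regex.Split( formData, Regex.Escape( boundary ) );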

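Loop through each string, separating the headers from the content. Since you actually want the byte values of the content, keep track of the data offset, because converting from ASCII back to bytes may not work correctly (I could be wrong, but I'm paranoid):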

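int searchFrom = 0;
foreach( string part in parts ){
    // ASCII decoding yields one char per byte, so string offsets equal byte offsets
    int partOffset = formData.IndexOf( part, searchFrom, StringComparison.Ordinal );
    searchFrom = partOffset + part.Length;

    int headerEnd = part.IndexOf( "\r\n\r\n", StringComparison.Ordinal );
    if( headerEnd < 0 ) continue; // no header block, e.g. the "--" after the closing boundary

    // for the first part, this also contains the opening boundary line
    string header = part.Substring( 0, headerEnd );
    int dataOffset = partOffset + headerEnd + 4;
    byte[] body = new byte[ part.Length - headerEnd - 4 ];

    // copy the raw bytes rather than converting the ASCII string back
    Array.Copy( byteBuffer, dataOffset, body, 0, body.Length );

    // body now contains your binary data
}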

Note: this is untested, so it may need some tweaking.
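
If you would rather avoid the string round trip entirely, the same idea can be sketched directly on the bytes. This is only an illustration under assumptions: the names MultipartBytes, IndexOf and FirstPartBody are invented here, the whole request body is assumed to already sit in one byte array, and only the first part is extracted. A streaming version for very large uploads would also have to handle a delimiter that straddles two chunks.

```csharp
using System;
using System.Text;

static class MultipartBytes
{
    // Index of the first occurrence of pattern in data at or after start, or -1.
    public static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        for (int i = start; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Hypothetical helper: returns the raw body of the first part,
    // or null if the expected markers are not found.
    public static byte[] FirstPartBody(byte[] data, string boundaryParam)
    {
        byte[] headerEnd = Encoding.ASCII.GetBytes("\r\n\r\n");
        byte[] delimiter = Encoding.ASCII.GetBytes("\r\n--" + boundaryParam);

        // The part's headers end at the first blank line
        int bodyStart = IndexOf(data, headerEnd, 0);
        if (bodyStart < 0) return null;
        bodyStart += headerEnd.Length;

        // The body runs up to the CRLF that precedes the next delimiter
        int bodyEnd = IndexOf(data, delimiter, bodyStart);
        if (bodyEnd < 0) return null;

        byte[] body = new byte[bodyEnd - bodyStart];
        Array.Copy(data, bodyStart, body, 0, body.Length);
        return body;
    }
}
```

Searching for CRLF plus the delimiter, rather than the delimiter alone, keeps the line break that belongs to the boundary out of the file while leaving any line breaks inside the file data untouched.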

How do I strip the header information when uploading a file with an HttpModule?
