Question

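I'm trying to do the following in AIR:

1. Browse for a text file
2. Read the text file and store it in a string (ultimately in an array)
3. Split the string on the \n delimiter and put the resulting strings in an array
4. Manipulate that data before sending it to a website (MySQL database)

The text files I'm working with are 100-500 MB in size. So far I've been able to complete steps 1 and 2; here is my code: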

<mx:Script>
    <![CDATA[
    import mx.collections.ArrayCollection;
    import flash.filesystem.*;
    import flash.events.*;
    import mx.controls.*;

    private var fileOpened:File = File.desktopDirectory;
    private var fileContents:String = ""; // start empty; += on an uninitialized String would prepend "null"
    private var stream:FileStream;

    private function selectFile(root:File):void {
        var filter:FileFilter = new FileFilter("Text", "*.txt");
        root.browseForOpen("Open", [filter]);
        root.addEventListener(Event.SELECT, fileSelected);
    }

    private function fileSelected(e:Event):void {
        var path:String = fileOpened.nativePath;
        filePath.text = path;

        stream = new FileStream();
        stream.addEventListener(ProgressEvent.PROGRESS, fileProgress);
        stream.addEventListener(Event.COMPLETE, fileComplete);
        stream.openAsync(fileOpened, FileMode.READ);
    }

    private function fileProgress(p_evt:ProgressEvent):void {
        fileContents += stream.readMultiByte(stream.bytesAvailable, File.systemCharset); 
        readProgress.text = ((p_evt.bytesLoaded/1048576).toFixed(2)) + "MB out of " + ((p_evt.bytesTotal/1048576).toFixed(2)) + "MB read";
    }

    private function fileComplete(p_evt:Event):void {
        stream.close();
        //fileText.text = fileContents;
    }

    private function process(c:String):void {
        if(c.length == 0) {
            Alert.show("File contents empty!", "Error");
        }
        //var array:Array = c.split(/\n/);

    }

    ]]>
</mx:Script>

And here is the MXML:

<mx:Text x="10" y="10" id="filePath" text="Select a file..." width="678" height="22" color="#FFFFFF"  fontWeight="bold"/>
<mx:Button x="10" y="40" label="Browse" click="selectFile(fileOpened)" color="#FFFFFF" fontWeight="bold" fillAlphas="[1.0, 1.0]" fillColors="[#E2E2E2, #484848]"/>
<mx:Button x="86" y="40" label="Process" click="process(fileContents)" color="#FFFFFF" fontWeight="bold"  fillAlphas="[1.0, 1.0]" fillColors="[#E2E2E2, #484848]"/>
<mx:TextArea x="10" y="70" id="fileText" width="678" height="333" editable="false"/>
<mx:Label x="10" y="411" id="readProgress" text="" width="678" height="19" color="#FFFFFF"/>

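fileText.text = fileContents; is my attempt to put the contents of the string in the TextArea.
var array:Array = c.split(/\n/); is my attempt to split the string on the newline delimiter.

I could use some input at this point... Am I even going about this the right way? Can Flex/AIR handle files this large? (I assume so.) This is my first attempt at any kind of Flex work, so if you see other things I'm doing wrong or that could be done better, I'd appreciate the heads-up!

Thanks!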

Answer 1

Doing a split on a 500 MB file probably isn't a good idea. You can write your own parser to work through the file, but it may not be very fast either:

private function fileComplete(p_evt:Event):void 
{
    var array:Array = [];

    var char:String;
    var line:String = "";
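    // NOTE: bytesAvailable shrinks as bytes are read, so this test can stop early (see Answer 5);
    // comparing against a total saved before the loop, or testing bytesAvailable > 0, avoids that
    while(stream.position < stream.bytesAvailable)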
    {
        char = stream.readUTFBytes(1);
        if(char == "\n")
        {
            array.push(line);
            line = "";
        }
        else
        {
            line += char;
        }
    }

    // catch the last line if the file isn't terminated by a \n
    if(line != "")
    {
        array.push(line);
    }

    stream.close();
}

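I haven't tested it, but it should step through the file character by character. If the character is a newline, it pushes the old line onto the array; otherwise it appends the character to the current line.

If you don't want it to block your UI while it does this, you'll need to abstract it into a timer-based approach: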

// pseudo code (needs flash.utils.Timer and flash.events.TimerEvent)

// declare these in the class so they persist between timer ticks
private var array:Array;
private var line:String;
private var timer:Timer;

private function fileComplete(p_evt:Event):void 
{
    array = [];
    line = "";
    processFileChunk();
}

private function processFileChunk(event:TimerEvent=null):void
{
    var MAX_PER_FRAME:int = 1024;
    var bytesThisFrame:int = 0;
    var char:String;
    // NOTE: bytesAvailable shrinks as bytes are read, so this test can stop early (see Answer 5);
    // comparing against a total saved before the loop, or testing bytesAvailable > 0, avoids that
    while(   (stream.position < stream.bytesAvailable)
          && (bytesThisFrame < MAX_PER_FRAME))
    {
        char = stream.readUTFBytes(1);
        if(char == "\n")
        {
            array.push(line);
            line = "";
        }
        else
        {
            // `line` is a class member, so a partial line carries over to the next chunk
            line += char;
        }
        bytesThisFrame++;
    }

    // if we aren't done
    if(stream.position < stream.bytesAvailable)
    {
        // fire once after 100 ms and continue where we stopped
        timer = new Timer(100, 1);
        timer.addEventListener(TimerEvent.TIMER_COMPLETE, processFileChunk);
        timer.start();
    }
    // we're done
    else
    {
        // catch the last line if the file isn't terminated by a \n
        if(line != "")
        {
            array.push(line);
        }

        stream.close();

        // maybe dispatchEvent(new Event(Event.COMPLETE)); here
        // or call an internal function to deal with the complete array
    }
}

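Basically, you pick how much of the file to process per frame (MAX_PER_FRAME) and then process that many bytes. If you hit the byte limit, just create a timer that calls the processing function again a few frames later, and it should pick up where it left off. Once you're sure it's finished, you can dispatch an event or call another function.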

Answer 2

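I agree.

Try splitting the text into chunks as you read it from the stream.

That way you don't have to store the text in the fileContents string (cutting memory use by 50%).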
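
A minimal sketch of that idea, assuming the `stream` member and the event handlers from the question. The `lines` and `remainder` members and the `consumeAvailable` helper are hypothetical names introduced only for illustration. One caveat: with a multibyte charset, a chunk boundary can fall in the middle of a character, so this is safest for single-byte text.

```actionscript
import flash.events.Event;
import flash.events.ProgressEvent;
import flash.filesystem.File;
import flash.filesystem.FileStream;

// Hypothetical class members (not in the original code)
private var lines:Array = [];
private var remainder:String = "";

// Reads whatever is buffered and splits it into complete lines
private function consumeAvailable():void {
    if (stream.bytesAvailable == 0) {
        return;
    }
    var chunk:String = remainder
        + stream.readMultiByte(stream.bytesAvailable, File.systemCharset);
    var parts:Array = chunk.split("\n");
    // The last piece may be an unfinished line; keep it for the next chunk
    remainder = parts.pop() as String;
    for each (var part:String in parts) {
        lines.push(part); // or process the line here and discard it
    }
}

private function fileProgress(p_evt:ProgressEvent):void {
    consumeAvailable();
}

private function fileComplete(p_evt:Event):void {
    consumeAvailable();
    if (remainder != "") {
        lines.push(remainder); // final line without a trailing "\n"
        remainder = "";
    }
    stream.close();
}
```

If each line is handled and dropped inside the loop instead of being pushed onto `lines`, only one chunk needs to be held in memory at a time.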

Answer 3

Try processing it in batches.

Answer 4

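Regarding James's homespun parser: it will run into problems if the text file contains any multibyte UTF characters (I came across this thread while trying to parse a UTF file in a similar way). Converting each byte into its own string breaks multibyte characters apart, so I made some modifications.

To make this parser multibyte-friendly, you can store the growing line in a ByteArray instead of a String. Then, when you hit the end of a line (or a chunk, or the file), you can parse it as a UTF string (if needed) without any problems: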

var 
    out :ByteArray,
    line_out :String,
    char :int,
    line :ByteArray;

out = new ByteArray();
line = new ByteArray();

while( file_stream.bytesAvailable > 0 )
{
    char = file_stream.readByte();
    // 0x0A is "\n"; in UTF-8 this byte value never occurs inside a multibyte character
    if( char == 0x0A )
    {
        // Do some processing on a line-by-line basis
        // (ProcessLine is your own function: it takes the raw bytes of one line and returns a String)
        line_out = ProcessLine( line );
        line_out += "\n";
        out.writeUTFBytes( line_out );
        line = new ByteArray();
    }
    else
    {
        line.writeByte( char );
    }
}
//Get the last line in there (copied as raw bytes, without going through ProcessLine)
out.writeBytes( line );
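
The answer does not show ProcessLine itself. As a rough sketch of what such a helper might look like, assuming the file is UTF-8, the hypothetical version below decodes the collected bytes into a String only once the whole line is available, which is the point at which multibyte characters can be read safely.

```actionscript
import flash.utils.ByteArray;

// Hypothetical helper matching the ProcessLine( line ) call above
function ProcessLine(line:ByteArray):String
{
    // writeByte() left the position at the end; rewind before reading
    line.position = 0;
    var text:String = line.readUTFBytes(line.length);

    // Drop a trailing "\r" left over from Windows-style line endings
    if (text.charAt(text.length - 1) == "\r")
    {
        text = text.substr(0, text.length - 1);
    }
    return text;
}
```

Any per-line manipulation (the fourth step in the question) would go between decoding and the return.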

Answer 5

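stream.position < stream.bytesAvailable: won't this condition become false once the position reaches the middle of the file? If the file is 10 bytes, then after reading 5 bytes bytesAvailable will be 5. I stored the initial value in another variable and used that in the condition. Other than that, I think it's pretty good.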
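
The fix described here can be sketched as follows, assuming `stream` is the FileStream from the question and that the file is fully buffered when COMPLETE fires. The `totalBytes` variable is a hypothetical name for the stored initial value.

```actionscript
import flash.events.Event;
import flash.filesystem.FileStream;

private function fileComplete(p_evt:Event):void
{
    var array:Array = [];
    var line:String = "";
    var char:String;

    // Capture the end offset once; bytesAvailable shrinks with every read
    var totalBytes:Number = stream.position + stream.bytesAvailable;

    while (stream.position < totalBytes)
    {
        char = stream.readUTFBytes(1);
        if (char == "\n")
        {
            array.push(line);
            line = "";
        }
        else
        {
            line += char;
        }
    }

    // catch the last line if the file isn't terminated by a \n
    if (line != "")
    {
        array.push(line);
    }

    stream.close();
}
```

Testing `stream.bytesAvailable > 0` directly, as the loop in Answer 4 does, should have the same effect without the extra variable.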

Parsing large text files with Adobe AIR