Java does not read the entire file

Time: 2013-10-16 06:18:32

Tags: java file-io

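I need help with a problem. I'm trying to load my list of 2000 proxies from a text file, but my class only fills 1040 array indexes with what it reads from each line.

I don't know what to do. :(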

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class ProxyLoader {

private String[] lineSplit = new String[100000];
private static String[] addresses = new String[100000];
private static int[] ports = new int[100000];
public int i = 0;

public ProxyLoader() {
    readData();
}

public synchronized String getAddr(int i) {
    return this.addresses[i];
}

public synchronized int getPort(int i) {
    return this.ports[i];
}

public synchronized void readData() {
    try {
        BufferedReader br = new BufferedReader(
                new FileReader("./proxy.txt"));
        String line = "";

        try {
            while ((line = br.readLine()) != null) {

                lineSplit = line.split(":");
                i++;

                addresses[i] = lineSplit[0];
                ports[i] = Integer.parseInt(lineSplit[1]);
                System.out.println("Line Number [" + i + "]  Adr: "
                        + addresses[i] + " Port: " + ports[i]);
            }

            for (String s : addresses) {
                if (s == null) {
                    s = "127.0.0.1";
                }
            }

            for (int x : ports) {
                if (x == 0) {
                    x = 8080;
                }
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

}

2 Answers:

Answer 0 (score: 1)

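Let's start by tidying up your code; there are a lot of issues that could be causing you trouble. However, without the relevant portion of your proxy file, we can't test or replicate the behavior you're seeing. Consider creating and posting an SSCCE rather than just a code snippet.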

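  1. Indent and format your code properly.
  2. These methods do not need to be (and should not be) synchronized - reading from an array is safe in a multithreaded environment, and you should never be constructing multiple instances of ProxyLoader from different threads, so synchronized on readData() is just wasteful.
  3. Creating massive arrays is a very poor way to store this data - allocating that much extra memory is wasteful, and your code will fail if the file being loaded happens to be larger than the constant you set. Use an expandable data structure, such as an ArrayList or a Map.
  4. You're storing the addresses and ports in separate arrays; using one object to hold both values will save memory and prevent data inconsistency.
  5. Your public int i variable is dangerous - presumably you're using it to represent the number of lines loaded, but this should be avoided in favor of a size() method. As a public instance variable, anyone using the class could change this value. Also, i is a poor name for the variable; max would be a better choice.
  6. You likely don't want readData() to be public, since calling it multiple times would do very odd things (it would load the file again, starting from i, filling the arrays with duplicate data). The best idea is to load the data directly in the constructor (or in a private method the constructor calls), so that the file is only loaded once per ProxyLoader instance created.
  7. You're creating a massive empty array lineSplit and then replacing it with the result of String.split(). This is confusing and wasteful; use a local variable to hold the split line instead.
  8. You don't close the file after reading it, which can lead to resource leaks or other inconsistencies with your data. The try-with-resources syntax helps take care of this.
  9. After populating them, you loop over the entire address and port arrays, apparently to fill the remaining slots with what is essentially noise. It's unclear what you're trying to accomplish by doing this, but I'm sure it's a bad plan.

I'd suggest the following implementation: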

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    
    public class ProxyLoader implements Iterable<ProxyLoader.Proxy> {
      // Remove DEFAULT_PROXY if not needed
      private static final Proxy DEFAULT_PROXY = new Proxy("127.0.0.1", 8080);
      private static final String DATA_FILE = "./proxy.txt";
      private ArrayList<Proxy> proxyList = new ArrayList<>();
    
      public ProxyLoader() {
        // Try-with-resources ensures file is closed safely and cleanly
        try(BufferedReader br = new BufferedReader(new FileReader(DATA_FILE))) {
          String line;
          while ((line = br.readLine()) != null) {
            String[] lineSplit = line.split(":");
            Proxy p = new Proxy(lineSplit[0], Integer.parseInt(lineSplit[1]));
            proxyList.add(p);
          }
        } catch (IOException e) {
          System.err.println("Failed to open/read "+DATA_FILE);
          e.printStackTrace(System.err);
        }
      }
    
      // If you request a positive index larger than the size of the file, it will return
      // DEFAULT_PROXY, since that's the behavior your original implementation
      // essentially did.  I'd suggest deleting DEFAULT_PROXY, having this method simply
      // return proxyList.get(i), and letting it fail if you request an invalid index.
      public Proxy getProxy(int i) {
        if(i < proxyList.size()) {
          return proxyList.get(i);
        } else {
          return DEFAULT_PROXY;
        }
      }
    
      // Lets you safely get the maximum index, without exposing the list directly
      public int getSize() {
        return proxyList.size();
      }
    
      // lets you run for(Proxy p : proxyLoader) { ... }
      @Override
      public Iterator<Proxy> iterator() {
        return proxyList.iterator();
      }
    
      // Inner static class just to hold data
      // can be pulled out into its own file if you prefer
      public static class Proxy {
        // note these values are public; since they're final, this is safe.
        // Using getters is more standard, but it adds a lot of boilerplate code
        // somewhat needlessly; for a simple case like this, public final should be fine.
        public final String address;
        public final int port;
    
        public Proxy(String a, int p) {
          address = a;
          port = p;
        }
      }
    }
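
A side note on point 9: besides being wasteful, the two for-each loops in the original readData() do not actually change the arrays. Assigning to the loop variable only changes a local copy, never the array element. Here is a small sketch of the difference between a for-each loop and an indexed loop; the class and method names are made up for this illustration.

```java
import java.util.Arrays;

public class ForEachAssignment {

    // Assigning to the loop variable changes only a local copy; the array is untouched.
    static String[] fillWithForEach(String[] values, String fallback) {
        for (String s : values) {
            if (s == null) {
                s = fallback; // no effect on values[]
            }
        }
        return values;
    }

    // Writing through the index actually replaces the null slots.
    static String[] fillWithIndex(String[] values, String fallback) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                values[i] = fallback;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [null, null]
        System.out.println(Arrays.toString(fillWithForEach(new String[2], "127.0.0.1")));
        // prints [127.0.0.1, 127.0.0.1]
        System.out.println(Arrays.toString(fillWithIndex(new String[2], "127.0.0.1")));
    }
}
```

With a list of Proxy objects there are no empty slots to fill in the first place, which is one more reason to drop the fixed-size arrays.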
    

Answer 1 (score: 1)

I've included some examples that may not fit your use case exactly, but they show some ways to write code that is easier to maintain and read.

Code that is hard to read is hard to debug and maintain.

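  • Objects need to validate their input (constructor args).
  • Reject bad data. Fail fast; it makes debugging easier.
  • Never catch an exception unless you can recover from it. Either soften it (wrap it in a runtime exception and rethrow it) or add it to the throws clause. If you don't know what to do with it, do nothing.
  • Never eat an exception. Rethrow it or handle it.
  • Your code holds on to state that it doesn't need.
  • A class is more self-describing than two parallel arrays.
  • Avoid public fields, unless they are final.
  • Protect the state of your object.
  • Think about how methods will be called, and avoid side effects. Calling readData twice would cause hard-to-debug side effects.
  • Memory is cheap but not free. Don't instantiate large arrays that you don't need.
  • If you open it, you have to close it.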
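
To make the "soften it" and "if you open it, close it" points concrete, here is a minimal sketch of a reader that closes what it opens and rethrows the checked IOException as an unchecked one instead of swallowing it. It assumes Java 8 for java.io.UncheckedIOException, and the class name LineReader is made up for this illustration; it is not part of boon.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LineReader {

    // Reads every line of a UTF-8 text file; any I/O failure is rethrown unchecked.
    public static List<String> readLines(String location) {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(location), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException ex) {
            // "Soften" the checked exception: keep the cause, let the caller decide
            throw new UncheckedIOException("Could not read " + location, ex);
        }
        return lines;
    }

    public static void main(String[] args) {
        try {
            readLines("./no-such-file.txt");
        } catch (UncheckedIOException ex) {
            System.err.println(ex.getMessage());
        }
    }
}
```

The caller never gets back a half-filled or empty list when something went wrong, which is what happens when the exception is only printed.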

Java 7 and 8 let you read lines from the FileSystem, so there is no need to write most of this code to begin with:

 Path thePath = FileSystems.getDefault().getPath(location);
 return Files.readAllLines(thePath, Charset.forName("UTF-8"));
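
The two lines above are a fragment from inside a method. Here is a self-contained sketch of the same call that simply counts the lines, which is a quick way to check whether the file itself really has the 2000 lines you expect. The names ReadAllLinesDemo and loadLines are made up for this illustration, and "./proxy.txt" is the path from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ReadAllLinesDemo {

    // Loads the whole file into memory as a list of lines (Java 7+).
    static List<String> loadLines(String location) throws IOException {
        Path path = Paths.get(location);
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Falls back to the path used in the question when no argument is given
        List<String> lines = loadLines(args.length > 0 ? args[0] : "./proxy.txt");
        System.out.println("Lines read: " + lines.size());
    }
}
```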

If you have to read a lot of small files into lines and don't want to use FileSystem, or you are using Java 6 or Java 5, then you would create a utility class like this:

public class IOUtils {

    public final static String CHARSET = "UTF-8";

...

public static List<String> readLines(File file) {
    try (FileReader reader = new FileReader(file)) {
        return readLines(reader);
    } catch (Exception ex) {
        return Exceptions.handle(List.class, ex);
    }
}

That calls the readLines that takes a Reader:

public static List<String> readLines(Reader reader) {

    try (BufferedReader bufferedReader = new BufferedReader(reader)) {
          return readLines(bufferedReader);
    } catch (Exception ex) {
        return Exceptions.handle(List.class, ex);
    }
}

That calls the readLines that takes a BufferedReader:

public static List<String> readLines(BufferedReader reader) {
    List<String> lines = new ArrayList<>(80);

    try (BufferedReader bufferedReader = reader) {


        String line = null;
        while ((line = bufferedReader.readLine()) != null) {
            lines.add(line);
        }

    } catch (Exception ex) {

        return Exceptions.handle(List.class, ex);
    }
    return lines;
}

Apache has a set of utilities called Apache Commons (http://commons.apache.org/). It includes lang, and it includes IO utils (http://commons.apache.org/proper/commons-io/). Either of these would be a good choice if you are using Java 5 or Java 6.

Back to our example. You can turn any location into a list of lines:

public static List<String> readLines(String location) {
    URI uri =  URI.create(location);

    try {

        if ( uri.getScheme()==null ) {

            Path thePath = FileSystems.getDefault().getPath(location);
            return Files.readAllLines(thePath, Charset.forName("UTF-8"));

        } else if ( uri.getScheme().equals("file") ) {

            Path thePath = FileSystems.getDefault().getPath(uri.getPath());
            return Files.readAllLines(thePath, Charset.forName("UTF-8"));

        } else {
            return readLines(location, uri);
        }

    } catch (Exception ex) {
         return Exceptions.handle(List.class, ex);
    }

}

FileSystem, Path, URI, etc. are all in the JDK.

Continuing the example:

private static List<String> readLines(String location, URI uri) throws Exception {
    try {

        FileSystem fileSystem = FileSystems.getFileSystem(uri);
        Path fsPath = fileSystem.getPath(location);
        return Files.readAllLines(fsPath, Charset.forName("UTF-8"));

    } catch (ProviderNotFoundException ex) {
         return readLines(uri.toURL().openStream());
    }
}

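The above tries to read the URI from a FileSystem, and if no provider can load it, it falls back to looking it up through a URL stream. URL, URI, File, FileSystem, etc. are all part of the JDK.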

To convert the URL stream into a Reader and then into lines, we use:

public static List<String> readLines(InputStream is) {

    try (Reader reader = new InputStreamReader(is, CHARSET)) {

        return readLines(reader);

    } catch (Exception ex) {

        return Exceptions.handle(List.class, ex);
    }
}
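
If you don't want to pull in the Exceptions helper, here is a self-contained variant of the stream-to-lines step that just declares IOException. This is a sketch, not the boon code; the class name UrlLines is made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class UrlLines {

    // Reads all lines from a stream, decoding as UTF-8, and closes the stream.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Opens the URL (http:, file:, ...) and reads its content as lines.
    static List<String> readLines(URL url) throws IOException {
        return readLines(url.openStream());
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java UrlLines <absolute-url>");
            return;
        }
        for (String line : readLines(URI.create(args[0]).toURL())) {
            System.out.println(line);
        }
    }
}
```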

:)

Now let's get back to our example (we can now read lines from anywhere, including files):

public static final class Proxy {
    private final String address;
    private final int port;
    private static final String DATA_FILE = "./files/proxy.txt";

    private static final Pattern addressPattern = Pattern.compile("^(\\d{1,3}[.]{1}){3}[0-9]{1,3}$");

    private Proxy(String address, int port) {

        /* Validate address is not null. */
        Objects.requireNonNull(address, "address should not be null");

        /* Validate port is in range. */
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port is not in range port=" + port);
        }

        /* Validate address is of the form 123.12.1.5 .*/
        if (!addressPattern.matcher(address).matches()) {
            throw new IllegalArgumentException("Invalid Inet address");
        }

        /* Now initialize our address and port. */
        this.address = address;
        this.port = port;
    }

    private static Proxy createProxy(String line) {
        String[] lineSplit = line.split(":");
        String address = lineSplit[0];
        int port =  parseInt(lineSplit[1]);
        return new Proxy(address, port);
    }

    public final String getAddress() {
        return address;
    }

    public final int getPort() {
        return port;
    }

    public static List<Proxy> loadProxies() {
        List <String> lines = IOUtils.readLines(DATA_FILE);
        List<Proxy> proxyList  = new ArrayList<>(lines.size());

        for (String line : lines) {
            proxyList.add(createProxy(line));
        }
        return proxyList;
    }

}

Notice that we don't have any mutable state. This prevents bugs. It makes your code easier to debug and support.

Notice that our IOUtils.readLines reads the lines from the file system.

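Notice the extra work in the constructor to make sure that no one can initialize a Proxy instance with bad state. This all uses classes from the JDK: Objects, Pattern, etc.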
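
The same fail-fast idea can be pushed into the line parsing. The createProxy above assumes that every line splits into exactly two parts; a malformed line would surface as an ArrayIndexOutOfBoundsException or NumberFormatException with no hint of where it came from. Whether that is what stops your load at 1040 is only a guess, since the question doesn't show the file. Here is a sketch that reports the offending line number; ProxyLineParser and HostPort are made-up names, with HostPort standing in for Proxy to keep the example self-contained.

```java
public class ProxyLineParser {

    // Minimal immutable holder, standing in for the Proxy class above.
    public static final class HostPort {
        public final String address;
        public final int port;

        HostPort(String address, int port) {
            this.address = address;
            this.port = port;
        }
    }

    // Parses one "address:port" line; lineNumber is only used for error messages.
    public static HostPort parse(String line, int lineNumber) {
        String[] parts = line.trim().split(":");
        if (parts.length != 2 || parts[0].trim().isEmpty()) {
            throw new IllegalArgumentException(
                "Line " + lineNumber + ": expected address:port but got '" + line + "'");
        }
        int port;
        try {
            port = Integer.parseInt(parts[1].trim());
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException(
                "Line " + lineNumber + ": port is not a number: '" + parts[1] + "'", ex);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException(
                "Line " + lineNumber + ": port out of range: " + port);
        }
        return new HostPort(parts[0].trim(), port);
    }

    public static void main(String[] args) {
        HostPort ok = parse("127.0.0.1:8080", 1);
        System.out.println(ok.address + " -> " + ok.port);
        try {
            parse("192.55.55.55", 2); // no port
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```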

If you want a reusable ProxyLoader, it would look something like this:

public static class ProxyLoader {
    private static final String DATA_FILE = "./files/proxy.txt";


    private List<Proxy> proxyList = Collections.emptyList();
    private final String dataFile;

    public ProxyLoader() {
        this.dataFile = DATA_FILE;
        init();
    }

    public ProxyLoader(String dataFile) {
        this.dataFile = dataFile;
        init();
    }

    private void init() {
        List <String> lines = IO.readLines(dataFile);
        proxyList = new ArrayList<>(lines.size());

        for (String line : lines) {
            proxyList.add(Proxy.createProxy(line));
        }
    }

    public String getDataFile() {
        return this.dataFile;
    }

    public static List<Proxy> loadProxies() {
            return new ProxyLoader().getProxyList();
    }

    public List<Proxy> getProxyList() {
        return proxyList;
    }
   ...

}

public static class Proxy {
    private final String address;
    private final int port;

    ...

    public Proxy(String address, int port) {
        ... 
        this.address = address;
        this.port = port;
    }

    public static Proxy createProxy(String line) {
        String[] lineSplit = line.split(":");
        String address = lineSplit[0];
        int port =  parseInt(lineSplit[1]);
        return new Proxy(address, port);
    }

    public String getAddress() {
        return address;
    }

    public int getPort() {
        return port;
    }
}
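
One thing to watch in the ProxyLoader above: getProxyList() hands out the internal ArrayList, so any caller can add or remove entries. If you want to follow the "protect the state of your object" advice, one option is to return a read-only view. Here is a minimal sketch with made-up names (ReadOnlyListDemo, getEntries), using String entries instead of Proxy to stay self-contained:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReadOnlyListDemo {

    private final List<String> entries = new ArrayList<>();

    public ReadOnlyListDemo(List<String> initial) {
        // Copy, so later changes to 'initial' do not leak into this object
        entries.addAll(initial);
    }

    // Callers get a read-only view; add/remove on it throws UnsupportedOperationException.
    public List<String> getEntries() {
        return Collections.unmodifiableList(entries);
    }

    public static void main(String[] args) {
        ReadOnlyListDemo demo = new ReadOnlyListDemo(Arrays.asList("127.0.0.1:8080"));
        try {
            demo.getEntries().add("10.0.0.1:3128");
        } catch (UnsupportedOperationException ex) {
            System.out.println("The list is read-only");
        }
    }
}
```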

Coding is great. Testing is divine! Here are some tests for the example.

public static class ProxyLoader {
    private static final String DATA_FILE = "./files/proxy.txt";


    private List<Proxy> proxyList = Collections.emptyList();
    private final String dataFile;

    public ProxyLoader() {
        this.dataFile = DATA_FILE;
        init();
    }

    public ProxyLoader(String dataFile) {
        this.dataFile = dataFile;
        init();
    }

    private void init() {
        List <String> lines = IO.readLines(dataFile);
        proxyList = new ArrayList<>(lines.size());

        for (String line : lines) {
            proxyList.add(Proxy.createProxy(line));
        }
    }

    public String getDataFile() {
        return this.dataFile;
    }

    public static List<Proxy> loadProxies() {
            return new ProxyLoader().getProxyList();
    }

    public List<Proxy> getProxyList() {
        return proxyList;
    }

}

public static class Proxy {
    private final String address;
    private final int port;

    public Proxy(String address, int port) {
        this.address = address;
        this.port = port;
    }

    public static Proxy createProxy(String line) {
        String[] lineSplit = line.split(":");
        String address = lineSplit[0];
        int port =  parseInt(lineSplit[1]);
        return new Proxy(address, port);
    }

    public String getAddress() {
        return address;
    }

    public int getPort() {
        return port;
    }
}

Here is the alternative, all in one class. (I don't see much point in ProxyLoader.)

public static final class Proxy2 {
    private final String address;
    private final int port;
    private static final String DATA_FILE = "./files/proxy.txt";

    private static final Pattern addressPattern = Pattern.compile("^(\\d{1,3}[.]{1}){3}[0-9]{1,3}$");

    private Proxy2(String address, int port) {

        /* Validate address is not null. */
        Objects.requireNonNull(address, "address should not be null");

        /* Validate port is in range. */
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port is not in range port=" + port);
        }

        /* Validate address is of the form 123.12.1.5 .*/
        if (!addressPattern.matcher(address).matches()) {
            throw new IllegalArgumentException("Invalid Inet address");
        }

        /* Now initialize our address and port. */
        this.address = address;
        this.port = port;
    }

    private static Proxy2 createProxy(String line) {
        String[] lineSplit = line.split(":");
        String address = lineSplit[0];
        int port =  parseInt(lineSplit[1]);
        return new Proxy2(address, port);
    }

    public final String getAddress() {
        return address;
    }

    public final int getPort() {
        return port;
    }

    public static List<Proxy2> loadProxies() {
        List <String> lines = IO.readLines(DATA_FILE);
        List<Proxy2> proxyList  = new ArrayList<>(lines.size());

        for (String line : lines) {
            proxyList.add(createProxy(line));
        }
        return proxyList;
    }

}

Now we write the tests (testing and TDD help you work through these problems):

@Test public void proxyTest() {
    List<Proxy> proxyList = ProxyLoader.loadProxies();
    assertEquals(
            5, len(proxyList)
    );


    assertEquals(
            "127.0.0.1", idx(proxyList, 0).getAddress()
    );



    assertEquals(
            8080, idx(proxyList, 0).getPort()
    );


    //192.55.55.57:9091
    assertEquals(
            "192.55.55.57", idx(proxyList, -1).getAddress()
    );



    assertEquals(
            9091, idx(proxyList, -1).getPort()
    );


}

idx and the rest are defined in my own helper lib, called boon. The idx method works like Python or Ruby slice notation.
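
If you don't want the dependency, the behavior the tests rely on is easy to approximate. The sketch below is not boon's implementation, just a hypothetical stand-in where negative indexes count from the end of the list:

```java
import java.util.Arrays;
import java.util.List;

public class NegativeIndex {

    // Hypothetical stand-in for boon's idx: negative indexes count from the end.
    static <T> T idx(List<T> list, int index) {
        int resolved = index < 0 ? list.size() + index : index;
        return list.get(resolved); // still throws IndexOutOfBoundsException if out of range
    }

    // Hypothetical stand-in for boon's len.
    static int len(List<?> list) {
        return list.size();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "127.0.0.1:8080", "192.55.55.55:9090", "192.55.55.57:9091");
        System.out.println(idx(lines, 0));  // prints 127.0.0.1:8080
        System.out.println(idx(lines, -1)); // prints 192.55.55.57:9091
    }
}
```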

@Test public void proxyTest2() {
    List<Proxy2> proxyList = Proxy2.loadProxies();
    assertEquals(
            5, len(proxyList)
    );


    assertEquals(
            "127.0.0.1", idx(proxyList, 0).getAddress()
    );



    assertEquals(
            8080, idx(proxyList, 0).getPort()
    );


    //192.55.55.57:9091
    assertEquals(
            "192.55.55.57", idx(proxyList, -1).getAddress()
    );



    assertEquals(
            9091, idx(proxyList, -1).getPort()
    );


}

My input file:

127.0.0.1:8080
192.55.55.55:9090
127.0.0.2:8080
192.55.55.56:9090
192.55.55.57:9091

Then there is my IOUtils (which is actually called IO):

Here are the tests, for those who care about IO (utils):

package org.boon.utils;

import com.sun.net.httpserver.Headers;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import org.junit.Test;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.*;
import java.util.regex.Pattern;

import static java.lang.Integer.parseInt;
import static org.boon.utils.Lists.idx;
import static org.boon.utils.Lists.len;
import static org.boon.utils.Maps.copy;
import static org.boon.utils.Maps.map;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

...

This gives you an idea of the imports involved.

public class IOTest {

....

Here is a test that reads lines from a file on the file system.

@Test
public void testReadLines() {
    File testDir = new File("src/test/resources");
    File testFile = new File(testDir, "testfile.txt");


    List<String> lines = IO.readLines(testFile);

    assertLines(lines);

}

Here is a helper method that asserts the file was read correctly.

private void assertLines(List<String> lines) {

    assertEquals(
            4, len(lines)
    );


    assertEquals(
            "line 1", idx(lines, 0)
    );



    assertEquals(
            "grapes", idx(lines, 3)
    );
}

Here is a test that shows reading a file from a String path.

@Test
public void testReadLinesFromPath() {


    List<String> lines = IO.readLines("src/test/resources/testfile.txt");

    assertLines(lines);



}

This test shows reading a file from a URI.

@Test
public void testReadLinesURI() {

    File testDir = new File("src/test/resources");
    File testFile = new File(testDir, "testfile.txt");
    URI uri = testFile.toURI();


    //"file:///....src/test/resources/testfile.txt"
    List<String> lines = IO.readLines(uri.toString());
    assertLines(lines);


}

Here is a test that shows you can read lines from a file served by an HTTP server. First, the handler that serves the file:

static class MyHandler implements HttpHandler {
    public void handle(HttpExchange t) throws IOException {

        File testDir = new File("src/test/resources");
        File testFile = new File(testDir, "testfile.txt");
        String body = IO.read(testFile);
        // Encode first: Content-Length must be the byte count, not the char count
        byte[] bytes = body.getBytes(IO.CHARSET);
        t.sendResponseHeaders(200, bytes.length);
        OutputStream os = t.getResponseBody();
        os.write(bytes);
        os.close();
    }
}

Here is the HTTP server test itself (it starts the HTTP server and reads from it).

@Test
public void testReadFromHttp() throws Exception {

    HttpServer server = HttpServer.create(new InetSocketAddress(9666), 0);
    server.createContext("/test", new MyHandler());
    server.setExecutor(null); // creates a default executor
    server.start();

    Thread.sleep(1000);

    List<String> lines = IO.readLines("http://localhost:9666/test");
    assertLines(lines);

}

Here is the actual proxy cache test:

@Test public void proxyTest2() {
    List<Proxy2> proxyList = Proxy2.loadProxies();
    assertEquals(
            5, len(proxyList)
    );


    assertEquals(
            "127.0.0.1", idx(proxyList, 0).getAddress()
    );



    assertEquals(
            8080, idx(proxyList, 0).getPort()
    );


    //192.55.55.57:9091
    assertEquals(
            "192.55.55.57", idx(proxyList, -1).getAddress()
    );



    assertEquals(
            9091, idx(proxyList, -1).getPort()
    );


}

}

You can see all of the source code for this example and for this utility class here:

https://github.com/RichardHightower/boon

https://github.com/RichardHightower/boon/blob/master/src/main/java/org/boon/utils/IO.java

Or come see me at:

http://rick-hightower.blogspot.com/