Reading a comma-separated file using a delimiter in the Java Scanner class

Time: 2014-07-26 20:28:35

Tags: java io java.util.scanner readline

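I am trying to read values from a file using a comma as the delimiter. The problem appears after the first line: because the first line does not end with a comma, the Scanner reads the last item of the first line and the first item of the second line as a single token. How can I tell the Scanner to read only one line at a time? The file I am reading is at: ftp://webftp.vancouver.ca/OpenData/csv/schools.csv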

String schoolURL = ("ftp://webftp.vancouver.ca/OpenData/csv/schools.csv");

URL url = new URL(schoolURL);

Scanner sc2 = new Scanner(url.openStream()).useDelimiter(",");

//The file I am trying to read has a header line as the first line, hence the sc2.nextLine() being at the top of the for loop.//

for(int i=0; sc2.hasNextLine(); i++) {

        sc2.nextLine();
        String name, add, website;
        double lat, longi;
        name = sc2.next();
        lat=Double.parseDouble(sc2.next());
        longi=Double.parseDouble(sc2.next());
        add=sc2.next();
        website=sc2.next();
        schools[i] = new School(name, lat, longi, add, website);
 }
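The behavior described above can be reproduced without the FTP file. The following is only a minimal sketch: the class name `CommaOnlyDelimiterDemo`, the helper `tokens`, and the two-row sample data are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CommaOnlyDelimiterDemo {

    // Splits the text into tokens using "," as the only delimiter,
    // the same way the Scanner in the question is configured.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text).useDelimiter(",")) {
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; the real file has more columns per row.
        String sample = "School A,49.25\nSchool B,49.26\n";
        for (String token : tokens(sample)) {
            // Make line breaks visible in the printed token.
            System.out.println("[" + token.replace("\n", "\\n") + "]");
        }
    }
}
```

Because a line break is not a delimiter here, the second token is `49.25\nSchool B`, which spans two rows and would make `Double.parseDouble` throw a `NumberFormatException`.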

2 Answers:

Answer 0 (score: 0)

An alternative is to use BufferedReader.

As @Yannis Rizos said, read each line first and then split it:

Java 7

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import static java.lang.Double.parseDouble;

public class App {

    private static final String SOURCE_URL = "ftp://webftp.vancouver.ca/OpenData/csv/schools.csv";

    private static final int SCHOOL    = 0;
    private static final int LATITUDE  = 1;
    private static final int LONGITUDE = 2;
    private static final int ADDRESS   = 3;
    private static final int WEBSITE   = 4;

    public static void main(String[] args) {
        boolean isHeader = true;
        List<School> schools = new ArrayList<>();

        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new URL(SOURCE_URL).openStream()))) {
            for (String line; (line = reader.readLine()) != null; ) {
                if (isHeader) {
                    isHeader = false;
                }
                else {
                    String[] snippets = line.split(",");

                    // The School class is assumed to have the following constructor:
                    // public School(String name, double latitude, double longitude, String address, String webSite)

                    schools.add(new School(
                            snippets[SCHOOL],
                            parseDouble(snippets[LATITUDE]),
                            parseDouble(snippets[LONGITUDE]),
                            snippets[ADDRESS],
                            snippets[WEBSITE]
                    ));
                }
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Java 8

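// Files.lines(Paths.get(...)) reads local files only, so the URL is streamed
// through a BufferedReader instead (needs java.io.BufferedReader,
// java.io.InputStreamReader, java.net.URL, java.util.stream.Collectors).
// The enclosing method must handle or declare IOException.
List<School> schools;
try (BufferedReader reader = new BufferedReader(new InputStreamReader(new URL(SOURCE_URL).openStream()))) {
    schools = reader.lines()
                .skip(1)        // skip header
                .map(line -> line.split(","))
                .map(snippets -> new School(
                        snippets[SCHOOL],
                        parseDouble(snippets[LATITUDE]),
                        parseDouble(snippets[LONGITUDE]),
                        snippets[ADDRESS],
                        snippets[WEBSITE]
                ))
                .collect(Collectors.toList());
}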

As a result, you will collect 113 schools.
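The snippets above depend on a School class that is not shown and on network access. To try the read-a-line-then-split idea offline, something like the following sketch may help. The nested `School` class, the class name `LineSplitSketch`, the `parse` helper, and the sample rows (including the header names) are all assumptions made for illustration, not the asker's actual code or data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineSplitSketch {

    // Hypothetical stand-in for the School class, which the question does not show.
    static final class School {
        final String name;
        final double latitude;
        final double longitude;
        final String address;
        final String website;

        School(String name, double latitude, double longitude, String address, String website) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
            this.address = address;
            this.website = website;
        }
    }

    // Reads the source line by line, skips the header, and splits each row on commas.
    static List<School> parse(Reader source) {
        List<School> schools = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // skip the header line
            for (String line; (line = reader.readLine()) != null; ) {
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines
                }
                String[] fields = line.split(",");
                schools.add(new School(
                        fields[0],
                        Double.parseDouble(fields[1]),
                        Double.parseDouble(fields[2]),
                        fields[3],
                        fields[4]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return schools;
    }

    public static void main(String[] args) {
        // Made-up sample rows in the column order used by the answer.
        String csv = "NAME,LATITUDE,LONGITUDE,ADDRESS,WEBSITE\n"
                + "Example School A,49.25,-123.1,100 Example St,http://example.org/a\n"
                + "Example School B,49.26,-123.2,200 Example Ave,http://example.org/b\n";
        for (School s : parse(new StringReader(csv))) {
            System.out.println(s.name + " @ " + s.latitude + ", " + s.longitude);
        }
    }
}
```

Note that a plain `split(",")` assumes no field contains a comma; quoted fields with embedded commas would need a real CSV parser.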

Answer 1 (score: 0)

If the implementation has to be based on java.util.Scanner, make it accept the end of a line as another delimiter in addition to the comma.

If I understand the Pattern definition correctly, the Scanner should be instantiated as:

Scanner sc2 = new Scanner( url.openStream() ).useDelimiter( ",|\\R" );

\R stands for:

Linebreak matcher: any Unicode linebreak sequence, equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]

For details, refer to the documentation of java.util.regex.Pattern.
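The effect of the `",|\\R"` delimiter can be checked on an in-memory string. This is a rough sketch rather than a tested solution for the real file: the class name `ScannerLineBreakDelimiterDemo`, the `rows` helper, and the sample data are hypothetical, and `\R` is only available in Java 8 and later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class ScannerLineBreakDelimiterDemo {

    // Reads rows of `columns` fields each, treating both "," and any
    // line break as delimiters. The first line is skipped as a header.
    static List<String[]> rows(String csv, int columns) {
        List<String[]> rows = new ArrayList<>();
        try (Scanner sc = new Scanner(csv).useDelimiter(",|\\R")) {
            if (sc.hasNextLine()) {
                sc.nextLine(); // skip the header line
            }
            while (sc.hasNext()) {
                String[] row = new String[columns];
                for (int i = 0; i < columns; i++) {
                    // Throws NoSuchElementException if a row is incomplete.
                    row[i] = sc.next();
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample with Windows-style line endings.
        String csv = "NAME,LATITUDE\r\nSchool A,49.25\r\nSchool B,49.26\r\n";
        for (String[] row : rows(csv, 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With this delimiter the last field of one row and the first field of the next arrive as separate tokens, so each field can be parsed on its own.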