Splitting a CSV on commas does not work correctly

Time: 2018-02-10 07:47:40

Tags: java split

My CSV contains

6901257 5.010635294 Apartment   Entire home/apt {"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"}    3   1   Real Bed    strict  TRUE    NYC Beautiful, sunlit brownstone 1-bedroom in the loveliest neighborhood in Brooklyn. Blocks from the promenade and Brooklyn Bridge Park, with their stunning views of Manhattan, and from the great shopping and food. 6/18/2016   t   t       3/26/2012   f   7/18/2016   40.69652363 -73.99161685    Beautiful brownstone 1-bedroom  Brooklyn Heights    2   100 https://a0.muscache.com/im/pictures/6d7cbbf7-c034-459c-bc82-6522c957627c.jpg?aki_policy=small   11201   1   1

When I read it with a BufferedReader, I get this:

6901257,5.010635294096256,Apartment,Entire home/apt,"{""Wireless Internet"",""Air conditioning"",Kitchen,Heating,""Family/kid friendly"",Essentials,""Hair dryer"",Iron,""translation missing: en.hosting_amenity_50""}",3,1.0,Real Bed,strict,True,NYC,"Beautiful, sunlit brownstone 1-bedroom in the loveliest neighborhood in Brooklyn. Blocks from the promenade and Brooklyn Bridge Park, with their stunning views of Manhattan, and from the great shopping and food.",2016-06-18,t,t,,2012-03-26,f,2016-07-18,40.696523629970756,-73.99161684624262,Beautiful brownstone 1-bedroom,Brooklyn Heights,2,100.0,https://a0.muscache.com/im/pictures/6d7cbbf7-c034-459c-bc82-6522c957627c.jpg?aki_policy=small,11201,1.0,1.0

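I want to split each line on commas, but the problem appears when it reaches this field:

"{""Wireless Internet"",""Air conditioning"",Kitchen,Heating,""Family/kid friendly"",Essentials,""Hair dryer"",Iron,""translation missing: en.hosting_amenity_50""}"

The split also breaks this field apart at its internal commas, which I don't want. Is there a way to overcome this?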

        String line;
        fileWriter = new FileWriter("C:\\Users\\nagesingh\\IdeaProjects\\machineLearning\\src\\main\\resources\\train_new.csv");
        while ((line = trainCsv.readLine()) != null) {
            String[] tokens = line.split(",");
            for (int i = 0; i < tokens.length; i++) {
                try {
                    fileWriter.append(Double.valueOf(tokens[i]).toString());
                }catch (Exception e){
                    fileWriter.append("0");
                }
                fileWriter.append(COMMA_DELIMITER);
            }
            fileWriter.append(NEW_LINE_SEPARATOR);
        }

2 answers:

Answer 0: (score: 0)

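Just from looking at your data, I firmly believe you should store all of these attributes as separate columns in your CSV; that is what I would do.

Is there a reason you want this format? The only logical deduction I can make is that you want an object. If so, you can put all of these attributes into an object after reading them from the file.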
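One way to read the "put the attributes into an object" suggestion is to turn the brace-wrapped field into a list once it has been read as a single string. The sketch below is only an illustration: the class name AmenitiesParser and the method parseAmenities are hypothetical, and it assumes that no individual attribute name contains a comma (true for the sample row, not guaranteed in general).

```java
import java.util.ArrayList;
import java.util.List;

public class AmenitiesParser {
    // Hypothetical helper: converts {"Wireless Internet",Kitchen} into a list of names.
    // Assumption: attribute names never contain a comma themselves.
    static List<String> parseAmenities(String raw) {
        String body = raw.trim();
        // Drop the surrounding braces if both are present.
        if (body.startsWith("{") && body.endsWith("}")) {
            body = body.substring(1, body.length() - 1);
        }
        List<String> result = new ArrayList<>();
        if (body.isEmpty()) {
            return result;
        }
        for (String item : body.split(",")) {
            String name = item.trim();
            // Strip one pair of surrounding double quotes, if any.
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1);
            }
            result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "{\"Wireless Internet\",\"Air conditioning\",Kitchen,Heating}";
        System.out.println(parseAmenities(raw));
    }
}
```

The resulting list could then be stored in a field of whatever object represents one row.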

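But if you really want to keep the current format, could you write the CSV pipe-delimited (|) and split on the pipe (|) when reading? That would give you all of this: "{""Wireless Internet"",""Air conditioning"",Kitchen,Heating,""Family/kid friendly"",Essentials,""Hair dryer"",Iron,""translation missing: en.hosting_amenity_50""}" as a single entry in the array.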
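A minimal sketch of the pipe-delimited idea, assuming the file has already been re-exported with | as the separator and that no field contains a | character. The class name PipeSplitDemo, the method splitByPipe, and the shortened sample line are made up for illustration.

```java
import java.util.regex.Pattern;

public class PipeSplitDemo {
    // Splits one pipe-delimited line into its fields.
    static String[] splitByPipe(String line) {
        // "|" is a regex metacharacter, so it is quoted here;
        // the limit of -1 keeps trailing empty fields instead of dropping them.
        return line.split(Pattern.quote("|"), -1);
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the row from the question.
        String line = "6901257|Apartment|{\"Wireless Internet\",\"Air conditioning\",Kitchen}|3|";
        String[] fields = splitByPipe(line);
        System.out.println(fields.length);
        System.out.println(fields[2]); // the brace-wrapped field stays in one piece
    }
}
```

Note that passing "|" to String.split without quoting would be treated as a regex alternation of two empty patterns and would split between every character.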

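Answer 1: (score: 0)

I used the Apache Commons CSV dependency (CSVParser) and got the result I expected. It is much easier to use than writing the parsing code yourself.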

<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-csv -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.1</version>
</dependency>

        // CSVFormat.EXCEL honours quoted fields, so commas inside quotes do not split a field.
        CSVParser parser =  new CSVParser(trainCsv, CSVFormat.EXCEL);
        Iterable<CSVRecord> csvRecords = parser.getRecords();
        for (CSVRecord csvRecord : csvRecords) {

            for (int i = 0; i < csvRecord.size(); i++) {
                try {
                    // csvRecord.get(i) already returns a String, so no extra conversion is needed.
                    fileWriter.append(Double.valueOf(csvRecord.get(i)).toString());
                }catch (Exception e){
                    // Non-numeric fields are written as 0.
                    fileWriter.append("0");
                }
                fileWriter.append(COMMA_DELIMITER);
            }
            fileWriter.append(NEW_LINE_SEPARATOR);
        }
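For comparison, here is a rough sketch of what a quote-aware split has to do when adding a library is not an option. It is not how Commons CSV is implemented; the class name QuoteAwareSplit and the method splitCsvLine are hypothetical. It assumes fields are quoted with double quotes, that a doubled quote ("") inside a quoted field stands for one literal quote, and that no quoted field spans multiple lines.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    // Splits a single CSV line on commas that are outside double quotes.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote becomes one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas inside quotes are kept as data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a quoted field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the row from the question.
        String line = "6901257,\"{\"\"Wireless Internet\"\",Kitchen}\",3,\"Beautiful, sunlit\",,1.0";
        for (String field : splitCsvLine(line)) {
            System.out.println(field);
        }
    }
}
```

A maintained parser is still the safer choice for real data, since it also copes with quoted fields that contain line breaks, which this line-by-line sketch does not.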