ColdFusion

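Date: 2015-06-08 14:12:30

Tags: regex csv coldfusion coldfusion-11 csvtoarray

I am using this post to convert a CSV file to an array. Everything was working fine, but I received a file that contains extra quotes inside field values, like: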

"bash: "shortcuts" are"

"bash: \"shortcuts\" are"

So I tried to replace these quotes like this:

<cffile action="read" file="#filePath#" variable="csvContent">
<cfset csvContent = reReplace(csvContent, '(?:[^,\r\n])"(?:[^,\r\n])', '&quot;', 'ALL')>

<!--- Then do the conversion --->
<cfset array = csvToArray(csv = csvContent)>

But the non-capturing groups do not work. What am I doing wrong?

Is there any other way to do this?

  

Edit 1:

I also tried using cfhttp and ran into the following error:

<cfhttp name="csvToQuery" method="get" url="#url#" />
  

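Detail: Verify the number of columns specified in the columns attribute and in the target file.

Message: Incorrect number of columns in row.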

     

StackTrace: coldfusion.tagext.net.HttpTag$InvalidColumnsException: Incorrect number of columns in row.
    at coldfusion.tagext.net.HttpTag.connHelper(HttpTag.java:1149)
    at coldfusion.tagext.net.HttpTag.doEndTag(HttpTag.java:1219)
    at cfmfhttp2ecfm308364137.runPage(C:\inetpub\wwwroot\mfhttp.cfm:1)
    at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:244)
    at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:446)
    at coldfusion.filter.CfincludeFilter.invoke(CfincludeFilter.java:65)
    at coldfusion.filter.IpFilter.invoke(IpFilter.java:64)
    at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:430)
    at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)
    at coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)
    at coldfusion.filter.PathFilter.invoke(PathFilter.java:112)
    at coldfusion.filter.LicenseFilter.invoke(LicenseFilter.java:30)
    at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:94)
    at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
    at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
    at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:58)
    at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
    at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
    at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:62)
    at coldfusion.CfmServlet.service(CfmServlet.java:219)
    at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
    at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)
    at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:722)

1 answer:

Answer 0 (score: 2)

Oh, you can't easily fix this sort of input on your own. A regex will only mess up your data further.
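To make that concrete, here is a minimal sketch of what the pattern from the question does. It uses `java.util.regex` rather than ColdFusion's `reReplace` engine, so treat it as an illustration only; the class and method names (`QuoteRegexDemo`, `withGroups`, `withLookarounds`) are made up for this example. A non-capturing group only skips creating a backreference; the character it matches is still part of the overall match and is replaced along with the quote. Lookarounds avoid that, but they still cannot tell a stray quote from one that was already escaped correctly.

```java
import java.util.regex.Pattern;

public class QuoteRegexDemo {
    // The pattern from the question: the groups are non-capturing, but the
    // characters they match are still part of the overall match.
    static final Pattern CONSUMING =
            Pattern.compile("(?:[^,\\r\\n])\"(?:[^,\\r\\n])");

    // Lookbehind/lookahead only test the neighbouring characters
    // without consuming them.
    static final Pattern LOOKAROUND =
            Pattern.compile("(?<=[^,\\r\\n])\"(?=[^,\\r\\n])");

    static String withGroups(String csv) {
        return CONSUMING.matcher(csv).replaceAll("&quot;");
    }

    static String withLookarounds(String csv) {
        return LOOKAROUND.matcher(csv).replaceAll("&quot;");
    }

    public static void main(String[] args) {
        // The broken value from the question.
        String broken = "\"bash: \"shortcuts\" are\"";
        System.out.println(withGroups(broken));      // neighbours are swallowed
        System.out.println(withLookarounds(broken)); // neighbours survive

        // A value that was already escaped correctly with doubled quotes.
        String valid = "\"a \"\"b\"\" c\"";
        System.out.println(withLookarounds(valid));  // every inner quote is rewritten
    }
}
```

The first call prints `"bash:&quot;hortcut&quot;are"`, because the space and the `s` next to each quote are replaced too. The second prints `"bash: &quot;shortcuts&quot; are"`, which looks fine, but the third turns the valid value into `"a &quot;&quot;b&quot;&quot; c"`. A quote followed by a comma inside a field would be missed entirely, which is why I would not rely on a regex here.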

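Could you create a small script in Java to handle this? If so, use uniVocity-parsers to read your CSV input and write it back with proper quote escaping.

As far as I know, it is the only CSV parser that can handle broken quote escapes. Try this example: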

import com.univocity.parsers.csv.*;

import java.io.*;
import java.util.*;

public class Test {

    public static void main(String ... args){
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\r\n");
        settings.setParseUnescapedQuotes(true); // THIS IS IMPORTANT FOR YOU
        CsvParser parser = new CsvParser(settings);

        String line1 = "something,\"a quoted value \"with unescaped quotes\" can be parsed\", something\r\n";
        System.out.println("Input line: " + line1);

        String line2 = "\"after the newline \r\n you will find \" more stuff\r\n";
        System.out.println("Input line: " + line2);

        List<String[]> allInputLines = parser.parseAll(new StringReader(line1 + line2));

        System.out.println("===============\nParsed input values\n===============");
        int count = 0;
        for(String[] line : allInputLines){
            System.out.println("From line " + ++count + ":");
            for(String element : line){
                System.out.println("\t" + element);

            }
            System.out.println();
        }

        //Let's write your output CSV
        StringWriter output = new StringWriter();
        CsvWriterSettings writerSettings = new CsvWriterSettings();
        writerSettings.getFormat().setLineSeparator("\r\n");
        writerSettings.getFormat().setQuoteEscape('\\'); //it seems you are using backslash as quote escape
        writerSettings.getFormat().setCharToEscapeQuoteEscaping('\\'); //when your quote escape character is not the same as the quote character, you might need to escape the escape character as well
        writerSettings.setQuoteAllFields(true); //let's force quotes on all fields so whatever is parsing your input file has more  chance of doing it properly
        CsvWriter writer = new CsvWriter(output, writerSettings);

        for(String[] row : allInputLines){
            writer.writeRow(row);
        }
        writer.close();

        System.out.println("===============\nNicely formatted output\n===============");
        System.out.println(output.toString());

    }

}

This code will produce the following output (which your data import tool can probably read):

Input line: something,"a quoted value "with unescaped quotes" can be parsed", something

Input line: "after the newline 
you will find " more stuff

===============
Parsed input values
===============
From line 1:
    something
    a quoted value "with unescaped quotes" can be parsed
    something

From line 2:
    after the newline 
you will find " more stuff


===============
Nicely formatted output
===============
"something","a quoted value \"with unescaped quotes\" can be parsed","something"

"after the newline 
 you will find \" more stuff"
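
If it helps to see what the writer settings above ask for, the same escaping can be imitated by hand in plain Java. This is only a sketch of the output format under the assumptions used in this answer (backslash as the quote escape, backslash also escaping itself, every field quoted); it is not how uniVocity-parsers implements it, and `BackslashCsvRow`, `quoteField`, and `toRow` are made-up names.

```java
import java.util.StringJoiner;

public class BackslashCsvRow {
    // Quote a single field. Backslashes are escaped first so that the
    // backslashes added for the quotes are not escaped a second time.
    static String quoteField(String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    // Join the quoted fields with commas (no line separator is appended).
    static String toRow(String... fields) {
        StringJoiner row = new StringJoiner(",");
        for (String field : fields) {
            row.add(quoteField(field));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // The values parsed from the first input line of the example.
        System.out.println(toRow(
                "something",
                "a quoted value \"with unescaped quotes\" can be parsed",
                "something"));
    }
}
```

Run on the values parsed from the first input line, this prints the same first row as the formatted output above. It does not handle null values or custom delimiters, so for real files I would still let the library do the writing.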

Disclosure: I am the author of this library. It is open source and free (Apache V2.0 license).

ColdFusion 10+ example:

  1. Load the jar in Application.cfc

    this.javaSettings = { loadPaths: ["C:\path\to\univocity-parsers-1.5.6.jar" ]};
    
  2. Create instances of the parser classes using createObject:

    filePath = "c:\path\to\yourFile.csv";
    settings = createObject("java", "com.univocity.parsers.csv.CsvParserSettings").init();
    settings.getFormat().setLineSeparator(chr(13)& chr(10));
    settings.getFormat().setQuoteEscape("\");
    settings.setParseUnescapedQuotes(true); // THIS IS IMPORTANT FOR YOU
    parser = createObject("java", "com.univocity.parsers.csv.CsvParser").init(settings);
    reader = createObject("java", "java.io.StringReader").init(fileRead(filePath));
    arrayOfLines = parser.parseAll(reader);
    
    // display results
    counter = 1;
    for (line in arrayOfLines) {
        writeOutput("<br>From line "& (counter++) & ":");
        for (element in line) {
           writeOutput("<br>"& element);
        }
    }