I am using Mule Studio 3.4.0 Community Edition. I have run into a big problem with how to parse a large CSV file arriving through a File endpoint. The scenario is that I have three CSV files and I want to put their contents into a database. However, when I try to load a huge file (about 144 MB) I get an OutOfMemory exception. I thought about dividing/splitting the big CSV into smaller CSVs (I don't know whether this is the best solution), or about finding a way to process the CSV without the exception being thrown. This is my current configuration:
<file:connector name="File" autoDelete="true" streaming="true" validateConnections="true" doc:name="File"/>

<flow name="CsvToFile" doc:name="CsvToFile">
    <file:inbound-endpoint path="src/main/resources/inbox" moveToDirectory="src/main/resources/processed" responseTimeout="10000" doc:name="CSV" connector-ref="File">
        <file:filename-wildcard-filter pattern="*.csv" caseSensitive="true"/>
    </file:inbound-endpoint>
    <component class="it.aizoon.grpBuyer.AddMessageProperty" doc:name="Add Message Property"/>
    <choice doc:name="Choice">
        <when expression="INVOCATION:nome_file=azienda" evaluator="header">
            <jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/companies-csv-format.xml" ignoreFirstRecord="true" doc:name="CSV2Azienda"/>
            <jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="InsertAziende" queryTimeout="-1" connector-ref="jdbcConnector" doc:name="Database Azienda">
                <jdbc-ee:query key="InsertAziende" value="INSERT INTO aw006_azienda VALUES (#[map-payload:AW006_ID], #[map-payload:AW006_ID_CLIENTE], #[map-payload:AW006_RAGIONE_SOCIALE])"/>
            </jdbc-ee:outbound-endpoint>
        </when>
        <when expression="INVOCATION:nome_file=servizi" evaluator="header">
            <jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/services-csv-format.xml" ignoreFirstRecord="true" doc:name="CSV2Servizi"/>
            <jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="InsertServizi" queryTimeout="-1" connector-ref="jdbcConnector" doc:name="Database Servizi">
                <jdbc-ee:query key="InsertServizi" value="INSERT INTO ctrl_aemd_unb_servizi VALUES (#[map-payload:CTRL_ID_TIPO_OPERAZIONE], #[map-payload:CTRL_DESCRIZIONE], #[map-payload:CTRL_COD_SERVIZIO])"/>
            </jdbc-ee:outbound-endpoint>
        </when>
        <when expression="INVOCATION:nome_file=richiesta" evaluator="header">
            <jdbc-ee:csv-to-maps-transformer delimiter="," mappingFile="src/main/resources/requests-csv-format.xml" ignoreFirstRecord="true" doc:name="CSV2Richiesta"/>
            <jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="InsertRichieste" queryTimeout="-1" connector-ref="jdbcConnector" doc:name="Database Richiesta">
                <jdbc-ee:query key="InsertRichieste" value="INSERT INTO ctrl_aemd_unb_richiesta VALUES (#[map-payload:CTRL_ID_CONTROLLER], #[map-payload:CTRL_NUM_RICH_VENDITORE], #[map-payload:CTRL_VENDITORE], #[map-payload:CTRL_CANALE_VENDITORE], #[map-payload:CTRL_CODICE_SERVIZIO], #[map-payload:CTRL_STATO_AVANZ_SERVIZIO], #[map-payload:CTRL_DATA_INSERIMENTO])"/>
            </jdbc-ee:outbound-endpoint>
        </when>
    </choice>
</flow>
Please, I have no idea how to solve this problem. Thanks in advance for any help.
Answer 0 (score: 3)
As SteveS said, the csv-to-maps-transformer may try to load the entire file into memory before processing it. What you can try instead is to split the CSV file into smaller pieces and send those pieces to a VM endpoint to be processed individually.
First, create a component to take care of the first step:
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.mule.api.MuleEventContext;
import org.mule.api.client.MuleClient;
import org.mule.api.lifecycle.Callable;

public class CSVReader implements Callable {

    @Override
    public Object onCall(MuleEventContext eventContext) throws Exception {
        // With streaming enabled, the file endpoint hands us an InputStream, not the whole file
        InputStream fileStream = (InputStream) eventContext.getMessage().getPayload();
        BufferedReader br = new BufferedReader(new InputStreamReader(new DataInputStream(fileStream)));

        MuleClient muleClient = eventContext.getMuleContext().getClient();

        // Dispatch every line to the VM queue so rows are processed one at a time
        String line;
        while ((line = br.readLine()) != null) {
            muleClient.dispatch("vm://in", line, null);
        }
        fileStream.close();
        return null;
    }
}
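Because the dispatch to vm://in is one-way, the component does not wait for each database insert; lines are handed off to the VM queue and consumed asynchronously by the second flow, so the reading side only ever holds roughly one line in memory at a time.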
Then, split your main flow in two:
<file:connector name="File" workDirectory="yourWorkDirPath" autoDelete="false" streaming="true"/>

<flow name="CsvToFile" doc:name="Split and dispatch">
    <file:inbound-endpoint path="inboxPath" moveToDirectory="processedPath" pollingFrequency="60000" doc:name="CSV" connector-ref="File">
        <file:filename-wildcard-filter pattern="*.csv" caseSensitive="true"/>
    </file:inbound-endpoint>
    <component class="it.aizoon.grpBuyer.AddMessageProperty" doc:name="Add Message Property"/>
    <component class="com.dgonza.CSVReader" doc:name="Split the file and dispatch every line to VM"/>
</flow>
<flow name="storeInDatabase" doc:name="receive lines and store in database">
<vm:inbound-endpoint exchange-pattern="one-way"
path="in" doc:name="VM" />
<Choice>
.
.
Your JDBC Stuff
.
.
<Choice />
</flow>
Keep your current file-connector configuration so that streaming stays enabled. With this solution the CSV data can be processed without loading the whole file into memory first.
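As an illustration only (this fragment is not part of the original answer), one branch of that choice could insert the azienda rows straight from the raw line, assuming each VM message is a single comma-separated line with no quoted fields and that splitting on a comma with a MEL expression is acceptable for this data:

<jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="InsertAziende" queryTimeout="-1" connector-ref="jdbcConnector" doc:name="Database Azienda">
    <jdbc-ee:query key="InsertAziende" value="INSERT INTO aw006_azienda VALUES (#[payload.split(',')[0]], #[payload.split(',')[1]], #[payload.split(',')[2]])"/>
</jdbc-ee:outbound-endpoint>

Alternatively, the csv-to-maps-transformer from the original flow could be reused in each branch (if it accepts a single-line String payload), in which case ignoreFirstRecord would need to be false, since every VM message is now a single data line.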
HTH
Answer 1 (score: 1)
I believe the csv-to-maps-transformer will force the whole file into memory. Since you are dealing with one large file, I would personally tend to write a Java class to handle it. The File endpoint will pass a file stream to your custom transformer; you can then make a JDBC connection and pick the information off one row at a time, without ever loading the whole file. I have used OpenCSV to parse the CSV for me. So your Java class would contain something like the following:
protected Object doTransform(Object src, String enc) throws TransformerException {
    try {
        // Make a JDBC connection here

        // Now read and parse the CSV
        FileReader csvFileData = (FileReader) src;
        BufferedReader br = new BufferedReader(csvFileData);
        CSVReader reader = new CSVReader(br);

        // Read the CSV file row by row and add each row to the appropriate List(s)
        String[] nextLine;
        while ((nextLine = reader.readNext()) != null) {
            // Push your data into the database through your JDBC connection
        }

        // Close the reader and the JDBC connection
        reader.close();
    } catch (Exception e) {
        // Hypothetical error handling: wrap any parsing/DB error in a Mule TransformerException
        throw new TransformerException(MessageFactory.createStaticMessage("CSV processing failed"), this, e);
    }
    return null;
}