Question

Scenario

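For simplicity, assume I have an ItemReader that returns 25 rows.

The first 10 rows belong to student A
The next 5 belong to student B
The remaining 10 belong to student C

I want to aggregate them logically by studentId and flatten them so that I end up with one row per student.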

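Problem

If I understand correctly, setting the commit interval to 5 will do the following:

Send 5 rows to the processor (which will aggregate them or apply whatever business logic I tell it to).
After processing, the 5 rows are written.
Then it does the same for the next 5 rows, and so on.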
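To make the chunk boundaries concrete, here is a small plain-Java sketch. It does not use Spring Batch at all; it only models how 25 ordered rows would be cut into groups of commit-interval size, and which students end up spread over more than one group. The class and method names (`ChunkBoundaryDemo`, `chunk`, `splitAcrossChunks`, `scenarioRows`) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; no Spring Batch types are involved.
class ChunkBoundaryDemo {

    // Splits the rows into consecutive chunks of at most commitInterval rows.
    static List<List<String>> chunk(List<String> studentIds, int commitInterval) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < studentIds.size(); i += commitInterval) {
            int end = Math.min(i + commitInterval, studentIds.size());
            chunks.add(new ArrayList<>(studentIds.subList(i, end)));
        }
        return chunks;
    }

    // Returns the students whose rows land in more than one chunk.
    static Set<String> splitAcrossChunks(List<List<String>> chunks) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> split = new LinkedHashSet<>();
        for (List<String> chunk : chunks) {
            // De-duplicate within the chunk so a student counts once per chunk.
            for (String id : new LinkedHashSet<>(chunk)) {
                if (!seen.add(id)) {
                    split.add(id);
                }
            }
        }
        return split;
    }

    // 10 rows for A, 5 for B, 10 for C, as in the scenario.
    static List<String> scenarioRows() {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) rows.add("A");
        for (int i = 0; i < 5; i++) rows.add("B");
        for (int i = 0; i < 10; i++) rows.add("C");
        return rows;
    }
}
```

With the scenario rows and an interval of 5 this gives five chunks, and students A and C each span two of them, which is exactly the situation the question is worried about.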

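If that is true, then for the next five rows I would have to check the ones already written, aggregate them with the ones I am currently processing, and write them again.

Personally, I don't like that.

What is the best practice for handling this kind of situation in Spring Batch?

Alternative

Sometimes I feel it would be much easier to write a regular Spring JDBC main program, and then I would have full control over what I want to do. However, I wanted to take advantage of the job repository's state monitoring of the job, the ability to restart and skip, and job and step listeners....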

My Spring Batch Code

My module-context.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xsi:schemaLocation="http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch-2.1.xsd
    http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd">

    <description>Example job to get you started. It provides a skeleton for a typical batch application.</description>

    <batch:job id="job1">
        <batch:step id="step1"  >           
            <batch:tasklet transaction-manager="transactionManager" start-limit="100" >             
                 <batch:chunk reader="attendanceItemReader"
                              processor="attendanceProcessor" 
                              writer="attendanceItemWriter" 
                              commit-interval="10" 
                 />

            </batch:tasklet>
        </batch:step>
    </batch:job> 

    <bean id="attendanceItemReader" class="org.springframework.batch.item.database.JdbcCursorItemReader"> 
        <property name="dataSource">
            <ref bean="sourceDataSource"/>
        </property> 
        <property name="sql"                                                    
                  value="select s.student_name ,s.student_id ,fas.attendance_days ,fas.attendance_value from K12INTEL_DW.ftbl_attendance_stumonabssum fas inner join k12intel_dw.dtbl_students s on fas.student_key = s.student_key inner join K12INTEL_DW.dtbl_schools ds on fas.school_key = ds.school_key inner join k12intel_dw.dtbl_school_dates dsd on fas.school_dates_key = dsd.school_dates_key where dsd.rolling_local_school_yr_number = 0 and ds.school_code = ? and s.student_activity_indicator = 'Active' and fas.LOCAL_GRADING_PERIOD = 'G1' and s.student_current_grade_level = 'Gr 9' order by s.student_id"/>
        <property name="preparedStatementSetter" ref="attendanceStatementSetter"/>           
        <property name="rowMapper" ref="attendanceRowMapper"/> 
    </bean> 

    <bean id="attendanceStatementSetter" class="edu.kdc.visioncards.preparedstatements.AttendanceStatementSetter"/>

    <bean id="attendanceRowMapper" class="edu.kdc.visioncards.rowmapper.AttendanceRowMapper"/>

    <bean id="attendanceProcessor" class="edu.kdc.visioncards.AttendanceProcessor" />  

    <bean id="attendanceItemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter"> 
        <property name="resource" value="file:target/outputs/passthrough.txt"/> 
        <property name="lineAggregator"> 
            <bean class="org.springframework.batch.item.file.transform.PassThroughLineAggregator" /> 
        </property> 
    </bean> 

</beans>

The supporting classes for my reader.

PreparedStatementSetter

package edu.kdc.visioncards.preparedstatements;

import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.springframework.jdbc.core.PreparedStatementSetter;

public class AttendanceStatementSetter implements PreparedStatementSetter {

    public void setValues(PreparedStatement ps) throws SQLException {

        ps.setInt(1, 7);

    }

}

And the RowMapper

package edu.kdc.visioncards.rowmapper;

import java.sql.ResultSet;
import java.sql.SQLException;

import org.springframework.jdbc.core.RowMapper;

import edu.kdc.visioncards.dto.AttendanceDTO;

public class AttendanceRowMapper<T> implements RowMapper<AttendanceDTO> {

    public static final String STUDENT_NAME = "STUDENT_NAME";
    public static final String STUDENT_ID = "STUDENT_ID";
    public static final String ATTENDANCE_DAYS = "ATTENDANCE_DAYS";
    public static final String ATTENDANCE_VALUE = "ATTENDANCE_VALUE";

    public AttendanceDTO mapRow(ResultSet rs, int rowNum) throws SQLException {

        AttendanceDTO dto = new AttendanceDTO();
        dto.setStudentId(rs.getString(STUDENT_ID));
        dto.setStudentName(rs.getString(STUDENT_NAME));
        dto.setAttDays(rs.getInt(ATTENDANCE_DAYS));
        dto.setAttValue(rs.getInt(ATTENDANCE_VALUE));

        return dto;
    }
}

My processor

package edu.kdc.visioncards;

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.item.ItemProcessor;

import edu.kdc.visioncards.dto.AttendanceDTO;

public class AttendanceProcessor implements ItemProcessor<AttendanceDTO, Map<Integer, AttendanceDTO>> {

    private Map<Integer, AttendanceDTO> map = new HashMap<Integer, AttendanceDTO>();

    public Map<Integer, AttendanceDTO> process(AttendanceDTO dto) throws Exception {

        if(map.containsKey(new Integer(dto.getStudentId()))){

            AttendanceDTO attDto = (AttendanceDTO)map.get(new Integer(dto.getStudentId()));
            attDto.setAttDays(attDto.getAttDays() + dto.getAttDays());
            attDto.setAttValue(attDto.getAttValue() + dto.getAttValue());

        }else{
            map.put(new Integer(dto.getStudentId()), dto);
        }
        return map;
    }

}

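My concerns about the code above

In the processor I create a HashMap, and as I process each row I check whether that student is already in the Map; if not, I add it. If it is already there, I fetch it, take the values I am interested in, and add the values from the row I am currently processing.

After that, the Spring Batch framework writes to the file according to my configuration.

My question is as follows:

I don't want it to go to the writer yet. I want to process all the remaining rows first. How do I keep the Map I created in memory for the next set of rows that pass through the same processor? Every time a row is processed by AttendanceProcessor, the Map is initialized. Should I put the Map initialization in a static block?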

Answer 1

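I always follow this pattern:

I make my reader step-scoped, and in @PostConstruct I fetch the results and put them in a Map
In the processor, I convert the associatedCollection into a writable list and pass the writable list on
In the ItemWriter, I persist the writable items as the case requires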
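The first step of that pattern can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the answerer's code: `GroupedReader`, `Row`, and `load` are hypothetical names, no Spring types are used, and `load` stands in for whatever the step-scoped reader would do in its @PostConstruct method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a step-scoped reader; plain Java, no Spring types.
class GroupedReader {

    // Minimal row type assumed for this sketch.
    static final class Row {
        final String studentId;
        final int days;

        Row(String studentId, int days) {
            this.studentId = studentId;
            this.days = days;
        }
    }

    private Iterator<List<Row>> groups = Collections.emptyIterator();

    // Groups all rows by studentId up front, keeping first-seen order of students.
    void load(List<Row> rows) {
        Map<String, List<Row>> byStudent = new LinkedHashMap<>();
        for (Row row : rows) {
            byStudent.computeIfAbsent(row.studentId, id -> new ArrayList<>()).add(row);
        }
        groups = byStudent.values().iterator();
    }

    // Mirrors the ItemReader contract: one item per call, null when exhausted.
    List<Row> read() {
        return groups.hasNext() ? groups.next() : null;
    }
}
```

Because each read() returns a whole student, the commit interval then counts students rather than raw rows. The trade-off is that the full result set is held in memory for the duration of the step.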

Answer 2

In my application I created a CollectingJdbcCursorItemReader, which extends the standard JdbcCursorItemReader and does exactly what you need. Internally it uses my CollectingRowMapper: an extension of the standard RowMapper that maps multiple related rows to one object.

Here is the code of the ItemReader; the code of the CollectingRowMapper interface and its abstract implementation can be found in another answer of mine.

import java.sql.ResultSet;
import java.sql.SQLException;

import org.springframework.batch.item.ReaderNotOpenException;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.jdbc.core.RowMapper;

/**
 * A JdbcCursorItemReader that uses a {@link CollectingRowMapper}.
 * Like the superclass this reader is not thread-safe.
 *
 * @author Pino Navato
 **/
public class CollectingJdbcCursorItemReader<T> extends JdbcCursorItemReader<T> {

    private CollectingRowMapper<T> rowMapper;
    private boolean firstRead = true;

    /**
     * Accepts a {@link CollectingRowMapper} only.
     **/
    @Override
    public void setRowMapper(RowMapper<T> rowMapper) {
        this.rowMapper = (CollectingRowMapper<T>)rowMapper;
        super.setRowMapper(rowMapper);
    }

    /**
     * Read next row and map it to item.
     **/
    @Override
    protected T doRead() throws Exception {
        if (rs == null) {
            throw new ReaderNotOpenException("Reader must be open before it can be read.");
        }

        try {
            if (firstRead) {
                if (!rs.next()) {
                    //Subsequent calls to next() will be executed by rowMapper
                    return null;
                }
                firstRead = false;
            } else if (!rowMapper.hasNext()) {
                return null;
            }
            T item = readCursor(rs, getCurrentItemCount());
            return item;
        } catch (SQLException se) {
            throw getExceptionTranslator().translate("Attempt to process next row failed", getSql(), se);
        }
    }

    @Override
    protected T readCursor(ResultSet rs, int currentRow) throws SQLException {
        T result = super.readCursor(rs, currentRow);
        setCurrentItemCount(rs.getRow());
        return result;
    }
}

You can use it just like the classic JdbcCursorItemReader: the only requirement is that you provide it with a CollectingRowMapper instead of the classic RowMapper.
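The CollectingRowMapper itself is not shown in this answer; the reader only reveals that it exposes a hasNext() method. As a rough idea of the underlying technique (keep consuming rows while the key is unchanged, and remember whether a further row exists), here is a plain-Java sketch in which an Iterator stands in for the JDBC cursor. `CollectingCursor` and `nextGroup` are made-up names, and this is not the author's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustration only: an Iterator stands in for the JDBC cursor.
// Rows must already be ordered by studentId for the grouping to be correct.
class CollectingCursor {

    private final Iterator<String[]> cursor; // each row: {studentId, value}
    private String[] pending;                // row fetched but not yet consumed

    CollectingCursor(Iterator<String[]> cursor) {
        this.cursor = cursor;
        this.pending = cursor.hasNext() ? cursor.next() : null;
    }

    // True while at least one more group can be produced.
    boolean hasNext() {
        return pending != null;
    }

    // Collects all consecutive rows sharing the pending row's studentId,
    // or returns null when the cursor is exhausted.
    List<String[]> nextGroup() {
        if (pending == null) {
            return null;
        }
        String id = pending[0];
        List<String[]> group = new ArrayList<>();
        while (pending != null && pending[0].equals(id)) {
            group.add(pending);
            pending = cursor.hasNext() ? cursor.next() : null;
        }
        return group;
    }
}
```

The one-row lookahead is what lets the caller find out that a group is complete without losing the first row of the next group.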

Answer 3

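Since you changed the question, I am adding a new answer

If the students are ordered, there is no need for a list/map: you can keep a single studentObject in the processor to hold the "current" student and aggregate onto it until a new one arrives (read: the id changes)

If the students are not ordered, you will never know when a particular student is "finished", and you will have to keep all students in a map that cannot be written out until the complete read sequence has ended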
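For the unordered case, a minimal plain-Java sketch of "keep everything in a map until the input ends" could look like this. The class name `UnorderedAggregation` and the use of Map.Entry pairs as rows are assumptions made for the illustration; nothing here is Spring Batch API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: rows are (studentId, attendanceDays) pairs in any order.
class UnorderedAggregation {

    // Sums the days per student. The result is only complete once every
    // row has been seen, so nothing can be written out before that point.
    static Map<String, Integer> totalDays(List<Map.Entry<String, Integer>> rows) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> row : rows) {
            totals.merge(row.getKey(), row.getValue(), Integer::sum);
        }
        return totals;
    }
}
```

Memory use grows with the number of distinct students, which is the price of not being able to rely on ordering.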

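Beware:

The processor needs to know when the reader is exhausted
It is hard to get this working with an arbitrary commit rate and "id" concept: if you aggregate items that are in some way identical, the processor cannot know whether the item it is currently processing is the last one
Basically, the use case is solved either entirely at the reader level or at the writer level (see the other answer)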

private SimpleItem currentItem;
private StepExecution stepExecution;

@Override
public SimpleItem process(SimpleItem newItem) throws Exception {
    SimpleItem returnItem = null;

    if (currentItem == null) {
        currentItem = new SimpleItem(newItem.getId(), newItem.getValue());
    } else if (currentItem.getId() == newItem.getId()) {
        // aggregate somehow
        String value = currentItem.getValue() + newItem.getValue();
        currentItem.setValue(value);
    } else {
        // "clone"/copy currentItem
        returnItem = new SimpleItem(currentItem.getId(), currentItem.getValue());
        // replace currentItem
        currentItem = newItem;
    }

    // reader exhausted?
    if(stepExecution.getExecutionContext().containsKey("readerExhausted")
            && (Boolean)stepExecution.getExecutionContext().get("readerExhausted")
            && currentItem.getId() == stepExecution.getExecutionContext().getInt("lastItemId")) {
        returnItem = new SimpleItem(currentItem.getId(), currentItem.getValue());
    }

    return returnItem;
}

Answer 4

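Basically you are talking about batch processing with changing IDs (1), where the batch has to keep track of the change

For Spring / Spring Batch this means:

An ItemWriter that checks the list of items for an ID change
Before the change, the items are stored in a temporary data store (2) (list, map, etc.) and are not written out
When the ID changes, the aggregating/flattening business code runs on the items in the data store and a single item should be written; the data store can then be used for the next items with the next ID
This concept needs a reader that tells the step "I am exhausted", so that the temporary data store is flushed properly at the end of the items (file/database)

Here is a rough and simple code example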

@Override
public void write(List<? extends SimpleItem> items) throws Exception {

    // setup with first sharedId at startup
    if (currentId == null){
        currentId = items.get(0).getSharedId();
    }

    // check for change of sharedId in input
    // keep items in temporary dataStore until id change of input
    // call delegate if there is an id change or if the reader is exhausted
    for (SimpleItem item : items) {
        // already known sharedId, add to tempData
        // equals() rather than ==, because currentId is an object (it is compared with null above)
        if (currentId.equals(item.getSharedId())) {
            tempData.add(item);
        } else {
            // or new sharedId, write tempData, empty it, keep new id
            // the delegate does the flattening/aggregating
            delegate.write(tempData);
            tempData.clear();
            currentId = item.getSharedId();
            tempData.add(item);
        }
    }

    // check if reader is exhausted, flush tempData
    // Boolean.TRUE.equals avoids a NullPointerException while the key is not yet set
    if (Boolean.TRUE.equals(stepExecution.getExecutionContext().get("readerExhausted"))
            && tempData.size() > 0) {
        delegate.write(tempData);
        // optional delegate.clear(); 
    }
}

(1) Assuming the items are ordered by an ID (which can also be composite)

(2) A hashmap Spring bean, for thread safety

Answer 5

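Use a StepExecutionListener and store the records as a map in the step's ExecutionContext; you can then group them in the writer or in a writer listener and write them all at once

How do I process logically related rows after the ItemReader in Spring Batch?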