Question

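I have a list of drug substances with their corresponding IDs in two Excel sheets that come from two different data sources.

Data from the WHO:

The data from the other source looks similar, but it has different product IDs and additional products. However, the way substance IDs are assigned is standard across both sources.

I have to read both sheets and check whether products match based on their substances. If they do, I have to map the corresponding product IDs from the two sheets to each other. So my final sheet would look like this:

ProductID1  ProductID2  Substance1  Substance2  Substance3  Substance4 .....

Note: a single product may contain more than 100 substances.

This is how I am trying to solve the problem, but I need help: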

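1. Read the data from both sheets and put it into maps.
2. Compare the two maps (this is where I am having trouble).
3. Write the mapped data to an Excel file.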

public static void main(String[] args) {

    String readFile = "C:\\Users\\admin\\Desktop\\SampleData";
    HashMap<Double, Set<Object>> productMapWHO = new HashMap<Double, Set<Object>>();
    HashMap<Double, Set<Object>> productMapNDC = new HashMap<Double, Set<Object>>();
    productMapWHO = readExcel(0, readFile);
    productMapNDC = readExcel(1, readFile);

    Map<Double,Map<Double,Set<Object>>> WHOtoNDCMapping = new HashMap<Double,Map<Double,Set<Object>>>();

    WHOtoNDCMapping = compareProductMaps(productMapWHO,productMapNDC);

    String writeFile = "C:\\Users\\admin\\Desktop\\WHO_NDC_Mapping.xls";

    try {
        writeToExcel(WHOtoNDCMapping,writeFile);
    } catch (InvalidFormatException e) {
        e.printStackTrace();
    } catch (HPSFException e) {
        e.printStackTrace();
    }

}


private static HashMap<Double, Set<Object>> readExcel(int sheetNumber, String fileName) {


    HashMap<Double, Set<Object>> productMap = new HashMap<Double, Set<Object>>();

    try {
        FileInputStream file = new FileInputStream(new File(fileName));

        //Create Workbook instance holding reference to .xlsx file
        XSSFWorkbook workbook = new XSSFWorkbook(file);
        //Get first/desired sheet from the workbook
        XSSFSheet sheet = workbook.getSheetAt(sheetNumber);
        //Iterate through each rows one by one
        Iterator<Row> rowIterator = sheet.iterator();

        while (rowIterator.hasNext()) {

            List<String> substancelist = new ArrayList<String>();

            Row row = rowIterator.next();

            double key = 0;
            Object value="";
            //substancelist.clear();
            Iterator<Cell> cellIterator = row.cellIterator();
            Cell cell =null;
            while (cellIterator.hasNext()) {
                cell = cellIterator.next(); // advance the iterator; otherwise cell stays null

                if(cell.getColumnIndex() == 1)
                    key = cell.getNumericCellValue();

                switch (cell.getCellType())
                {
                case Cell.CELL_TYPE_NUMERIC:
                    value = cell.getNumericCellValue();
                    break;

                case Cell.CELL_TYPE_STRING:
                    value = cell.getStringCellValue().trim();
                    break;

                }

                Set<Object> list = productMap.get(key);
                if (list == null) productMap.put(key, list = new HashSet<Object>());
                list.add(value);
            }
        }
    }

    catch (Exception e) {
        e.printStackTrace();
    }
    return productMap;
}


private static Map<Double,Map<Double,Set<Object>>> compareProductMaps (HashMap<Double, Set<Object>>productMap1, HashMap<Double, Set<Object>>productMap2) {

    Map<Double,Map<Double,Set<Object>>> finalMapping = new HashMap<Double,Map<Double,Set<Object>>>();


    for(Map.Entry<Double, Set<Object>> entry : productMap1.entrySet()) {
        Double key = entry.getKey();
        Map<Double,Set<Object>> mappedIds = new HashMap<Double, Set<Object>>();
        for(Set<Object> valueList : productMap1.values()) {
            if (valueList.size() == productMap2.values().size() && productMap2.values().containsAll(valueList))
            {
                Double productId2 = productMap2.get(valueList); //throws error here. I want to get the key for the corresponding valuelist that matched.
                mappedIds.put(productId2,valueList);
                finalMapping.put(key,mappedIds);
            }
        }
    }


    return finalMapping;

}

private static void writeToExcel(Map<Double,Map<Double,Set<Object>>> finalMapping, String xlsFilename) throws HPSFException, InvalidFormatException {


    Workbook wb = null;

    try {
        wb = WorkbookFactory.create(new FileInputStream(xlsFilename));
    } catch (EncryptedDocumentException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    Sheet sheet = wb.createSheet("WHOtoNDCMapping");

    int rowIdx = 0;
    int cellIdx = 0;

    // Header
    Row hssfHeader = sheet.createRow(rowIdx);


    rowIdx = 1;
    Row row = sheet.createRow(rowIdx++);
    cellIdx = 0;

    for(Double productId1 : finalMapping.keySet()) {
        Map<Double,Set<Object>> m1 = finalMapping.get(productId1);
        Cell cell = row.createCell(cellIdx++);
        cell.setCellValue(productId1);

        for(Double productId2 : m1.keySet()) {
            Set<Object> substanceList = m1.get(productId2);
            cell = row.createCell(cellIdx++);
            cell.setCellValue(productId2);

            for (Object substance : substanceList){
                if (substance instanceof String) {
                    cell.setCellValue((String) substance);
                } else if (substance instanceof Number) {
                    cell.setCellValue(((Number) substance).doubleValue());
                } else {
                    throw new RuntimeException("Cell value of invalid type " + substance);
                }
            }
        }
    }
    try {
        FileOutputStream out = new FileOutputStream(xlsFilename);
        wb.write(out);
        out.close();
    } catch (IOException e) {
        throw new HPSFException(e.getMessage());
    }
}

Answer 1

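I wouldn't use maps of maps etc. but would instead build a class that properly represents a product. If both files are structured the same way, you could use something like this (simplified, I'll leave some work for you ;) ):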

class ExcelProduct {
  String productId;
  String productName;
  Set<String> substanceIds; //assuming order is not relevant, otherwise use a list
}

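You could then read the products into a Map<String, ExcelProduct>, where the key is the product ID, and finally work on those maps, e.g. by iterating over one of them and getting the corresponding product from the second map: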

for( ExcelProduct leftProduct : leftMap.values() ) {
  ExcelProduct rightProduct = rightMap.get(leftProduct.productId);

  //product not present in right map so skip
  if( rightProduct == null ) {
    continue;
  }

  //compare products here, e.g. comparing the substance ids
  if( leftProduct.substanceIds.equals( rightProduct.substanceIds) ) {
    //do whatever is needed, e.g. add the product to the result list which will be written to the result excel file
    //you probably don't need a result map here
  }
}
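The loop above looks products up by product ID. If the product IDs differ between the two sources, as the question describes, one possible variation is to index the second source by its substance set and look products up by that instead. The following is only a sketch under assumptions: the class name SubstanceMatcher, the method matchBySubstances and the sample IDs are hypothetical, and plain Map<String, Set<String>> (product ID to substance IDs) stands in for the ExcelProduct objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SubstanceMatcher {

    // Hypothetical helper: maps each left product ID to the IDs of all right
    // products that contain exactly the same set of substance IDs.
    static Map<String, List<String>> matchBySubstances(
            Map<String, Set<String>> left, Map<String, Set<String>> right) {

        // Reverse index: substance set -> IDs of the right products that have it.
        // A Set can serve as a HashMap key because its equals()/hashCode() are
        // content-based; the sets must not be modified once used as keys.
        Map<Set<String>, List<String>> rightBySubstances = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : right.entrySet()) {
            rightBySubstances
                    .computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                    .add(e.getKey());
        }

        // Look up every left product's substance set in the reverse index.
        Map<String, List<String>> mapping = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : left.entrySet()) {
            List<String> matches = rightBySubstances.get(e.getValue());
            if (matches != null) {
                mapping.put(e.getKey(), matches);
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the question's files.
        Map<String, Set<String>> who = new LinkedHashMap<>();
        who.put("W1", new HashSet<>(Arrays.asList("S10", "S20")));
        who.put("W2", new HashSet<>(Arrays.asList("S30")));

        Map<String, Set<String>> ndc = new LinkedHashMap<>();
        ndc.put("N7", new HashSet<>(Arrays.asList("S20", "S10")));
        ndc.put("N8", new HashSet<>(Arrays.asList("S10")));

        System.out.println(matchBySubstances(who, ndc)); // {W1=[N7]}
    }
}
```

Each entry of the returned map corresponds to one row of the result sheet (left product ID, matching right product IDs), and the substances can be taken from either side since they are equal. The reverse index also avoids comparing every product of one source with every product of the other.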

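Note that whether leftProduct.substanceIds.equals( rightProduct.substanceIds) works depends on the set implementation you use. The built-in implementations should use AbstractSet.equals(), which, if the passed object is also a set, compares the sizes and checks whether one set contains all elements of the other. If all elements are present and the sizes are the same, no other element can be missing, because sets cannot contain duplicates.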
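A small check of that behaviour, as a sketch: the class name SetEqualsDemo, the helper sameSubstances and the substance IDs are made up for illustration. HashSet and TreeSet both inherit equals() from AbstractSet, so two sets with the same elements compare equal regardless of implementation or insertion order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetEqualsDemo {

    // Hypothetical helper wrapping the set comparison discussed above.
    static boolean sameSubstances(Set<String> left, Set<String> right) {
        return left.equals(right);
    }

    public static void main(String[] args) {
        Set<String> hashed = new HashSet<>(Arrays.asList("S10", "S20", "S30"));
        Set<String> sorted = new TreeSet<>(Arrays.asList("S30", "S10", "S20"));
        Set<String> subset = new HashSet<>(Arrays.asList("S10", "S20"));

        // Same elements, different implementation and insertion order.
        System.out.println(sameSubstances(hashed, sorted)); // true
        // Different sizes, so the sets are not equal in either direction.
        System.out.println(sameSubstances(hashed, subset)); // false
        System.out.println(sameSubstances(subset, hashed)); // false
    }
}
```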

Compare data from two Excel files and write the corresponding mapping to a third file
