Question

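I have multiple datasets, each with a different number of images (and different image sizes). In my training loop I want to load a batch of images at random from across all the datasets, but each batch should contain images from only a single dataset. For example, say I have datasets A, B, C, D, each containing images 01.jpg, 02.jpg, … n.jpg (where n depends on the dataset), and the batch size is 3. Then one batch might be [B/02.jpg, B/06.jpg, B/12.jpg], the next batch [D/01.jpg, D/05.jpg, D/12.jpg], and so on.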

So far I have considered the following:

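Use a separate DataLoader for each dataset, e.g. dataloader_A, dataloader_B, etc., and in each training iteration randomly pick one of them and fetch a batch from it. However, this requires a for loop, and with a large number of datasets it would be very slow, because it cannot be split across workers to run in parallel.
Use a single DataLoader over all images from all datasets, but with a custom collate_fn that builds each batch only from images of the same dataset. (I'm not sure how to do this.)
I have looked at the ConcatDataset class, but judging from its source code, if I use it and fetch a new batch, the images in that batch will be mixed from different datasets, which I don't want.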
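For reference, the first option (one DataLoader per dataset, picking a loader at random each step) can be sketched as below. This is a minimal illustration, not code from the question: the function name random_loader_batches is hypothetical, and it assumes map-style loaders whose len() equals the number of batches they yield. It builds a shuffled schedule with one entry per batch, so every batch of every loader is used exactly once per epoch and each batch still comes from a single dataset.

```python
import random
from typing import Any, Iterator, Optional, Sequence


def random_loader_batches(loaders: Sequence[Any], seed: Optional[int] = None) -> Iterator[Any]:
    """Yield each batch of each loader once, choosing the source loader at random.

    Hypothetical helper. `loaders` would typically be one DataLoader per dataset
    (dataloader_A, dataloader_B, ...); anything supporting len() and iter() works.
    """
    rng = random.Random(seed)
    # One schedule entry per batch: loader k appears len(loaders[k]) times.
    schedule = [k for k, loader in enumerate(loaders) for _ in range(len(loader))]
    rng.shuffle(schedule)
    iterators = [iter(loader) for loader in loaders]
    for k in schedule:
        # Take the next batch from whichever loader the schedule picked.
        yield next(iterators[k])
```

Note that this calls iter() on every loader up front, so with num_workers > 0 each DataLoader starts its own worker processes; with many datasets that per-loader overhead is the scaling concern raised above.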

What is the best way to do this? Thanks!

Answer 1

You can use ConcatDataset and pass a batch_sampler to the DataLoader.

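concat_dataset = ConcatDataset((dataset1, dataset2))

Then concat_dataset.cumulative_sizes gives you the boundaries between the datasets: each entry is the exclusive end index of one dataset in the combined index space.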
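To make the boundaries concrete: for three datasets of lengths 5, 3 and 4, cumulative_sizes is [5, 8, 12], so dataset k owns the global indices from cumulative_sizes[k-1] (or 0 for the first dataset) up to, but not including, cumulative_sizes[k]. The pure-Python sketch below turns that list into per-dataset ranges and maps a global index back to (dataset index, local index) using the same bisect_right lookup that ConcatDataset.__getitem__ performs for non-negative indices. The helper names dataset_ranges and locate are hypothetical, not PyTorch API.

```python
import bisect
from typing import List, Sequence, Tuple


def dataset_ranges(cumulative_sizes: Sequence[int]) -> List[range]:
    """Global index range owned by each sub-dataset of a ConcatDataset (hypothetical helper)."""
    starts = [0] + list(cumulative_sizes[:-1])
    return [range(s, e) for s, e in zip(starts, cumulative_sizes)]


def locate(idx: int, cumulative_sizes: Sequence[int]) -> Tuple[int, int]:
    """Map a global index to (dataset_idx, sample_idx) (hypothetical helper)."""
    if not 0 <= idx < cumulative_sizes[-1]:
        raise IndexError("index out of range")
    # First boundary strictly greater than idx identifies the owning dataset.
    dataset_idx = bisect.bisect_right(cumulative_sizes, idx)
    start = 0 if dataset_idx == 0 else cumulative_sizes[dataset_idx - 1]
    return dataset_idx, idx - start
```

With cumulative_sizes = [5, 8, 12], dataset_ranges returns [range(0, 5), range(5, 8), range(8, 12)], and global index 5 is the first sample of the second dataset.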

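Now you can use ds_indices = concat_dataset.cumulative_sizes to build a batch sampler. See the source for BatchSampler for reference. Your batch sampler only needs to yield lists of N random indices that respect the ds_indices boundaries. This ensures that every batch contains elements from a single dataset.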
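A minimal sketch of such a batch sampler follows, assuming it receives concat_dataset.cumulative_sizes. The class name SingleDatasetBatchSampler and its seed parameter are hypothetical. It shuffles indices within each dataset, chunks them into batches, and then shuffles the batches so datasets are interleaved at the batch level. It is written as a plain iterable rather than a torch Sampler subclass, since DataLoader accepts any iterable of index lists as batch_sampler.

```python
import random
from typing import Iterator, List, Optional, Sequence


class SingleDatasetBatchSampler:
    """Yields lists of global indices; every list comes from one sub-dataset.

    Hypothetical helper, not part of PyTorch. `cumulative_sizes` is expected to be
    ConcatDataset.cumulative_sizes, e.g. [5, 8, 12] for datasets of length 5, 3, 4.
    """

    def __init__(self, cumulative_sizes: Sequence[int], batch_size: int,
                 drop_last: bool = False, seed: Optional[int] = None) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be a positive integer")
        self.cumulative_sizes = list(cumulative_sizes)
        self.batch_size = batch_size
        self.drop_last = drop_last
        # One RNG reused across epochs, so each epoch gets a new order.
        self.rng = random.Random(seed)

    def _ranges(self) -> List[range]:
        # Dataset k owns global indices [cumulative_sizes[k-1], cumulative_sizes[k]).
        starts = [0] + self.cumulative_sizes[:-1]
        return [range(s, e) for s, e in zip(starts, self.cumulative_sizes)]

    def __iter__(self) -> Iterator[List[int]]:
        batches: List[List[int]] = []
        for r in self._ranges():
            idx = list(r)
            self.rng.shuffle(idx)  # shuffle within one dataset only
            for i in range(0, len(idx), self.batch_size):
                batch = idx[i:i + self.batch_size]
                if self.drop_last and len(batch) < self.batch_size:
                    continue  # skip the incomplete last batch of this dataset
                batches.append(batch)
        self.rng.shuffle(batches)  # interleave datasets at the batch level
        return iter(batches)

    def __len__(self) -> int:
        total = 0
        for r in self._ranges():
            n = len(r)
            if self.drop_last:
                total += n // self.batch_size
            else:
                total += (n + self.batch_size - 1) // self.batch_size
        return total
```

It would be plugged in as DataLoader(concat_dataset, batch_sampler=SingleDatasetBatchSampler(concat_dataset.cumulative_sizes, batch_size=3), num_workers=4). batch_sampler is mutually exclusive with batch_size, shuffle, sampler and drop_last, so those stay at their defaults. The sampler runs in the main process and only the index lists are sent to the workers, so loading still happens in parallel. The default collate_fn stacks tensors, so this works as long as the images within one dataset share a shape.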
