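I am trying to load two datasets and use both of them for training.
Package versions: Python 3.7; PyTorch 1.3.1
I can create the data loaders separately and train on them sequentially: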
from tqdm import tqdm
from torch.utils.data import DataLoader, ConcatDataset
train_loader_modelnet = DataLoader(
    ModelNet(args.modelnet_root, categories=args.modelnet_categories, split='train',
             transform=transform_modelnet, device=args.device),
    batch_size=args.batch_size, shuffle=True)
train_loader_mydata = DataLoader(
    MyDataset(args.customdata_root, categories=args.mydata_categories, split='train',
              device=args.device),
    batch_size=args.batch_size, shuffle=True)
for e in range(args.epochs):
    for idx, batch in enumerate(tqdm(train_loader_modelnet)):
        pass  # training on dataset1
    for idx, batch in enumerate(tqdm(train_loader_mydata)):
        pass  # training on dataset2
Note: MyDataset is a custom dataset class that implements def __len__(self): and def __getitem__(self, index):. With the configuration above, training appears to work.
Ideally, though, I would like to combine them into a single data loader object. I tried to do this following the PyTorch documentation:
train_modelnet = ModelNet(args.modelnet_root, categories=args.modelnet_categories,
                          split='train', transform=transform_modelnet, device=args.device)
train_mydata = MyDataset(args.customdata_root, categories=args.mydata_categories,
                         split='train', device=args.device)
train_loader = torch.utils.data.ConcatDataset(train_modelnet, train_mydata)
for e in range(args.epochs):
    for idx, batch in enumerate(tqdm(train_loader)):
        pass  # training on combined
However, on random batches I get an error of the form "expected Tensor as element X in argument 0, but got tuple". Any help would be much appreciated!
> 40%|████ | 53/131 [01:03<02:00, 1.55s/it]
> Traceback (most recent call last):
>   File "/home/chris/Programs/pycharm-anaconda-2019.3.4/plugins/python/helpers/pydev/pydevd.py", line 1434, in _exec
>     pydev_imports.execfile(file, globals, locals)  # execute the script
>   File "/home/chris/Programs/pycharm-anaconda-2019.3.4/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
>     exec(compile(contents+"\n", file, 'exec'), glob, loc)
>   File "/home/chris/Documents/4yp/Data/my_kaolin/Classification/pointcloud_classification_combinedset.py", line 83, in <module>
>     for idx, batch in enumerate(tqdm(train_loader)):
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/tqdm/std.py", line 1107, in __iter__
>     for obj in iterable:
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
>     data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
>     return self.collate_fn(data)
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
>     return [default_collate(samples) for samples in transposed]
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
>     return [default_collate(samples) for samples in transposed]
>   File "/home/chris/anaconda3/envs/4YP/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
>     return torch.stack(batch, 0, out=out)
> TypeError: expected Tensor as element 3 in argument 0, but got tuple
Answer 0 (score: 1)
If I understood your question correctly, you have train and dev sets (and their corresponding loaders) as follows:
train_set = CustomDataset(...)
train_loader = DataLoader(dataset=train_set, ...)
dev_set = CustomDataset(...)
dev_loader = DataLoader(dataset=dev_set, ...)
And you want to concatenate them so that train+dev is used as the training data, right? If so, you simply call:
train_dev_sets = torch.utils.data.ConcatDataset([train_set, dev_set])
train_dev_loader = DataLoader(dataset=train_dev_sets, ...)
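train_dev_loader is the loader that contains data from both sets.
Now, make sure your data have the same shapes and the same types, i.e. the same number of features, the same categories/numbers, etc.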
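To make the call above concrete, here is a minimal sketch using two toy TensorDataset objects as stand-ins for the real datasets. The names set_a, set_b and the tensor sizes are assumptions for illustration only; the point is that ConcatDataset receives a single list of datasets, and that both datasets return samples with the same structure and shapes.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Two toy stand-in datasets (hypothetical); each sample is (features, label).
set_a = TensorDataset(torch.randn(6, 3), torch.zeros(6, dtype=torch.long))
set_b = TensorDataset(torch.randn(4, 3), torch.ones(4, dtype=torch.long))

# ConcatDataset takes ONE iterable of datasets, not separate positional arguments.
combined = ConcatDataset([set_a, set_b])
loader = DataLoader(combined, batch_size=5, shuffle=True)

# Each batch mixes samples from both datasets because shuffling happens
# over the concatenated index range.
batch_shapes = [(x.shape, y.shape) for x, y in loader]
print(len(combined), batch_shapes)
```

If the two datasets returned samples of different shapes or types, the default collate step would be expected to fail when it tries to stack them, which is why the consistency check matters.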
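Answer 1 (score: 0)
I'm guessing the two datasets sometimes return different types. When the data are tensors, torch stacks them, and they had better be the same shape. If they are something like strings, torch will make a tuple out of them. So this sounds like one of your datasets is sometimes returning something that is not a tensor. I would put some assertions on the output of your datasets to check that they are doing what you want, or dig into it with pdb.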
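One way to run such a check is sketched below. Both helpers (sample_signature and find_inconsistent_samples) are hypothetical names, not part of PyTorch; they rely only on __len__ and __getitem__, so they should work on any map-style dataset. The sketch compares field types only, not tensor shapes, and the toy list stands in for a real dataset.

```python
def sample_signature(sample):
    """Describe a sample by the type names of its fields."""
    fields = sample if isinstance(sample, (tuple, list)) else (sample,)
    return tuple(type(field).__name__ for field in fields)


def find_inconsistent_samples(dataset):
    """Return (index, signature) for samples whose signature differs from sample 0."""
    expected = sample_signature(dataset[0])
    mismatches = []
    for index in range(len(dataset)):
        signature = sample_signature(dataset[index])
        if signature != expected:
            mismatches.append((index, signature))
    return mismatches


# Toy stand-in for a dataset: sample 2 wraps its first field in a tuple.
toy_dataset = [([0.1, 0.2], 0), ([0.3, 0.4], 1), (([0.5, 0.6],), 1)]
print(find_inconsistent_samples(toy_dataset))  # [(2, ('tuple', 'int'))]
```

Running this over each dataset separately, and then over the concatenated one, should narrow down which dataset and which index produce the unexpected tuple.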
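Answer 2 (score: 0)
In addition to @Leopd's answer, you can also pass a custom collate_fn to PyTorch's DataLoader. The idea is that, in collate_fn, you define how the examples should be stacked to make a batch. Since you are on torch 1.3.1, make sure you are reading the matching version of the documentation.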
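As a rough sketch of the idea, the example below assumes the problem is that some samples wrap their tensor in a 1-tuple. That is only a guess at the cause; ToyMixedDataset and unwrap_collate are hypothetical names made up for illustration, and a real collate_fn would need to match whatever the datasets actually return.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyMixedDataset(Dataset):
    """Hypothetical dataset whose odd-indexed samples wrap the tensor in a tuple."""

    def __len__(self):
        return 8

    def __getitem__(self, index):
        points = torch.full((4, 3), float(index))
        label = index % 2
        if index % 2 == 1:
            return (points,), label  # inconsistent: tensor wrapped in a 1-tuple
        return points, label


def unwrap_collate(batch):
    """Unwrap 1-tuples so every first field is a tensor, then stack into a batch."""
    points, labels = [], []
    for data, label in batch:
        if isinstance(data, (tuple, list)):
            data = data[0]
        points.append(data)
        labels.append(label)
    return torch.stack(points, 0), torch.tensor(labels)


loader = DataLoader(ToyMixedDataset(), batch_size=4, collate_fn=unwrap_collate)
points_batch, labels_batch = next(iter(loader))
print(points_batch.shape, labels_batch)
```

Fixing the dataset so that every sample has the same structure is usually the cleaner option; a custom collate_fn is more of a workaround when the datasets themselves cannot be changed.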
Let me know if this helps, or if you have any follow-up questions :)