Question

I am writing a custom dataset, but joining the root directory path with the image name read from the CSV file through pandas iloc raises a type error:

 img_path = os.path.join(self.root_dir, self.annotations.iloc[index,0])

Error: TypeError: join() argument must be str or bytes, not 'int64'

I tried converting the annotations.iloc value to a string type, but I still get the same error.

csvfile with filenames and labels:

Custom dataset class:

class patientdataset(Dataset):

    def __init__(self, csv_file, root_dir, transform=None):  
            self.annotations = pd.read_csv(csv_file)
            self.root_dir = root_dir
            self.transform = transform



    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index,0])  
        image= np.array(np.load(img_path)) 
        y_label = torch.tensor(self.annotations.iloc[index, 1]).long()


        if self.transform:
            imagearrays = self.transform(image)
            image = imagearrays[None, :, :, :]
            imaget = np.transpose(image, (0, 2, 1, 3))
            image = imaget


        return (image, y_label)

Answer 1

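Based on your dataset (the attached csv file), pd.read_csv(csv_file) produces a data frame with 3 columns: the 1st column holds the index, the 2nd the file names, and the 3rd the labels. The line img_path = os.path.join(self.root_dir, self.annotations.iloc[index,0]) does not work because iloc[index, 0] refers to the first column, so it extracts the index value (an int64) instead of the file name. join expects its arguments to be strings, which is why you get the TypeError.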
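To make the column layout concrete, here is a minimal sketch. The CSV contents, the file names patient_000.npy and patient_001.npy, and the "data" directory are hypothetical stand-ins, since the attached file is not reproduced here. The sketch assumes the first column is a saved index, as described above.

```python
import io
import os

import pandas as pd

# Hypothetical CSV contents: the unnamed first column is a saved index.
csv_text = ",filename,label\n0,patient_000.npy,1\n1,patient_001.npy,0\n"
annotations = pd.read_csv(io.StringIO(csv_text))

print(annotations.shape)    # three columns: index, file name, label
print(annotations.dtypes)   # the first column is an integer dtype

first = annotations.iloc[0, 0]    # index value (integer), not a file name
second = annotations.iloc[0, 1]   # file name (str)

# Joining a path with the integer index value raises TypeError.
raised = False
try:
    os.path.join("data", first)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Joining with the file name column works.
img_path = os.path.join("data", second)
print(img_path)
```

Printing annotations.head() on the real file is a quick way to confirm which position holds the file names before hard-coding a column number.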

Based on your csv file sample, you should do this:

import os

import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset


class patientdataset(Dataset):

    def __init__(self, csv_file, root_dir, transform=None):
        # CSV columns by position: 0 = index, 1 = file name, 2 = label
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 1])  # 1 - for file name (2nd column)
        image = np.array(np.load(img_path))
        y_label = torch.tensor(self.annotations.iloc[index, 2]).long()  # 2 - for label (3rd column)

        if self.transform:
            imagearrays = self.transform(image)
            # Add a leading axis, then swap axes 1 and 2
            image = imagearrays[None, :, :, :]
            imaget = np.transpose(image, (0, 2, 1, 3))
            image = imaget

        return (image, y_label)
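As an alternative sketch (not part of the original answer), if the first column really is a saved index, passing index_col=0 to pd.read_csv lets pandas treat it as the index, so the file name and label stay at positions 0 and 1 as in the question's code. The CSV contents and file names below are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV contents with a saved index in the first column.
csv_text = ",filename,label\n0,patient_000.npy,1\n1,patient_001.npy,0\n"

# index_col=0 uses the first CSV column as the row index instead of data.
annotations = pd.read_csv(io.StringIO(csv_text), index_col=0)

file_name = annotations.iloc[0, 0]   # now the file name (str)
label = annotations.iloc[0, 1]       # now the label (integer)

print(annotations.shape)
print(file_name, label)
```

If you control how the CSV is written, saving it with df.to_csv(path, index=False) avoids the extra column in the first place.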

Custom dataset: os.path.join() returns a TypeError
