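YoloV4 custom dataset train/test split

Time: 2021-03-11 09:03:19

Tags: python scikit-learn yolo train-test-split custom-dataset

I am trying to train a YOLO network on a custom dataset. I have a set of images (*.jpg) and their labels/annotations in YOLO format as .txt files.

Now I want to split the data into a training set and a validation set. So I want a train folder and a validation folder, each containing its own images and annotations.

I tried something like this: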

from sklearn.model_selection import train_test_split
import glob


# Get all paths to your images files and text files
PATH = '../TrainingsData/'
img_paths = glob.glob(PATH+'*.jpg')
txt_paths = glob.glob(PATH+'*.txt')
    
X_train, X_test, y_train, y_test = train_test_split(img_paths, txt_paths, test_size=0.3, random_state=42)

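After saving the sets to new folders, the images and annotations were mixed up. For example, in the train folder some images have no annotations (those ended up in the validation folder), and some annotations are there without their images.

Can you help me split my dataset?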
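
One likely cause of the mix-up (an assumption, since the copying code is not shown) is that glob.glob returns paths in arbitrary order, so img_paths[i] and txt_paths[i] do not necessarily belong to the same sample. A minimal sketch of pairing files by name before splitting follows; pair_by_stem and the file names are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def pair_by_stem(img_paths, txt_paths):
    """Return sorted (image, label) pairs whose names match apart from the extension."""
    labels = {Path(p).stem: p for p in txt_paths}
    pairs = []
    for img in sorted(img_paths):
        stem = Path(img).stem
        if stem in labels:
            pairs.append((img, labels[stem]))
    return pairs


# Made-up file names, deliberately listed in mismatched order
imgs = ["data/b.jpg", "data/a.jpg", "data/c.jpg"]
txts = ["data/a.txt", "data/classes.txt", "data/b.txt"]
pairs = pair_by_stem(imgs, txts)
print(pairs)
```

Images without a label (c.jpg here) and stray text files (classes.txt here) are dropped. The list of pairs can then be split as a single sequence, so each image stays with its annotation.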

2 answers:

Answer 0 (score: 2)

OK, you can do it like this.

Function to split the images and labels

import os
import shutil
from tqdm import tqdm


def split_img_label(data_train, data_test, folder_train, folder_test):
    """Copy each image and its same-named .txt label into the train/test folders."""
    for data, folder in ((data_train, folder_train), (data_test, folder_test)):
        os.makedirs(folder, exist_ok=True)
        for img_path in tqdm(data):
            # The label file shares the image's name, with .txt instead of .jpg
            label_path = os.path.splitext(img_path)[0] + '.txt'
            shutil.copy(img_path, folder)
            shutil.copy(label_path, folder)

Code


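import os
import pandas as pd
from sklearn.model_selection import train_test_split
from tqdm import tqdm  # used inside split_img_label

PATH = './TrainingsData/'
list_img = [img for img in os.listdir(PATH) if img.endswith('.jpg')]

# Full paths of all images; each label is located later from the image name
path_img = [PATH + img for img in list_img]

df = pd.DataFrame(path_img)

# Split only the image paths: 80% train, 20% test
data_train, data_test, labels_train, labels_test = train_test_split(df[0], df.index, test_size=0.20, random_state=42)

# Output folder names (example values, choose your own)
folder_train_name = 'train'
folder_test_name = 'test'

# Copy the images and their labels into the two folders
split_img_label(data_train, data_test, folder_train_name, folder_test_name)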

Output

len(list_img)
583

100%|████████████████████████████████████████████████████████████████████████████████| 466/466 [00:26<00:00, 17.42it/s]
100%|████████████████████████████████████████████████████████████████████████████████| 117/117 [00:07<00:00, 16.61it/s]

In the end you will have two folders (folder_train_name, folder_test_name), each containing images together with their matching labels.
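
To confirm that a folder really holds matching images and labels, a quick check along these lines may help. This is only a sketch: find_unpaired is a hypothetical helper, and the demo uses made-up file names in a temporary directory.

```python
import tempfile
from pathlib import Path


def find_unpaired(folder):
    """Return (images without a label, labels without an image) as sorted lists of stems."""
    folder = Path(folder)
    img_stems = {p.stem for p in folder.glob("*.jpg")}
    txt_stems = {p.stem for p in folder.glob("*.txt")}
    return sorted(img_stems - txt_stems), sorted(txt_stems - img_stems)


# Small self-contained demo in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "a.txt", "b.jpg", "c.txt"):
        (Path(tmp) / name).touch()
    missing_labels, missing_images = find_unpaired(tmp)

print(missing_labels, missing_images)
```

Running find_unpaired on each output folder should give two empty lists if the split kept every image with its annotation.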

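Answer 1 (score: 0)

If you want to split images and labels to train a custom model, I suggest the following steps:

  1. Create an obj folder containing the images and labels.
  2. Create and run the generate_train.py script: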
# generate_train.py
import os

image_files = []
os.chdir(os.path.join("data", "obj"))
for filename in os.listdir(os.getcwd()):
    if filename.endswith(".jpg"):
        image_files.append("data/obj/" + filename)
os.chdir("..")
# Write one image path per line; the with block closes the file on exit
with open("train.txt", "w") as outfile:
    for image in image_files:
        outfile.write(image)
        outfile.write("\n")
os.chdir("..")
  3. Finally, once you have the train.txt file, you can run the following code:
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('PATH/data/train.txt', header=None)

# sklearn split: 80% train, 20% test
data_train, data_test, labels_train, labels_test = train_test_split(df[0], df.index, test_size=0.20, random_state=42)

# train.txt contains the paths of the images (and, by name, their labels) used for training
with open("train.txt", "w") as outfile:
    for ruta in data_train:
        outfile.write(ruta)
        outfile.write("\n")

# test.txt contains the paths of the images (and, by name, their labels) used for testing
with open("test.txt", "w") as outfile:
    for ruta in data_test:
        outfile.write(ruta)
        outfile.write("\n")
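
The darknet commands in this answer read data/obj.data, which is where darknet is told about the two list files. The answer does not show that file, so the sketch below assumes the usual darknet layout (classes, train, valid, names, backup). render_obj_data is a hypothetical helper, and the class count and all paths are placeholder values.

```python
def render_obj_data(num_classes, train_list, valid_list, names_file, backup_dir):
    """Build the text of a darknet .data file from its five usual entries."""
    lines = [
        f"classes = {num_classes}",
        f"train = {train_list}",
        f"valid = {valid_list}",
        f"names = {names_file}",
        f"backup = {backup_dir}",
    ]
    return "\n".join(lines) + "\n"


# Assumed values: one class, list files placed in data/, weights saved to backup/
obj_data = render_obj_data(1, "data/train.txt", "data/test.txt", "data/obj.names", "backup/")
print(obj_data)
```

The returned text can be written to data/obj.data. The train and valid entries should point to wherever train.txt and test.txt were actually saved.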

Now you are ready to train your model.

YOLO

!./darknet detector train data/obj.data cfg/yolov4-FENO.cfg yolov4.conv.137 -dont_show -map

Tiny

!./darknet detector train data/obj.data cfg/yolov4_tiny.cfg yolov4-tiny.conv.29 -dont_show -map