Extract pixels from PMG files and convert them into a pandas DataFrame

Time: 2019-11-02 08:34:24

Tags: python-3.x pandas python-imaging-library

  

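I have a directory whose subdirectories each contain a set of PMG files. I want to extract the pixels from every image and put them into a pandas DataFrame.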

from PIL import Image
import os
import pandas as pd
import numpy as np
dirs = [r"D:\MSIT\Machine Learning\IMG"+"\\s"+str(i) for i in range(1,41)]
pixels = list()
df  = pd.DataFrame(columns = ["f" + str(i) for i in range(1,10305)])
cols = list(df.columns)
for directory in dirs:
    for filename in os.listdir(directory):
        im = Image.open(directory + "\\" +filename)
        dims = (list(im.getdata()))
        df2 = pd.Series(dims)
        pixels.append(dims)
k = 1
for i in pixels:
    for j in i:
        df2 = pd.Series(j)
        df.append(df2, ignore_index = True)
        print(str(k) + "Done")
        k += 1
print(df.head())
df.to_csv('pixel_data.csv') 

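1 Answer:

Answer 0: (score: 1)

I assume you want the pixel values of the PMG files to be your features. You can use df.loc to index into the DataFrame and add the data one row at a time. Using numpy also makes the process somewhat faster.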
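To show the df.loc pattern on its own, here is a minimal toy sketch. The three pixel columns and the sample values are made up for illustration and do not come from the question's data.

```python
import pandas as pd

# Toy frame: three hypothetical pixel columns plus a label column.
df = pd.DataFrame(columns=[0, 1, 2, "Label"])

# Made-up (pixels, label) pairs standing in for real images.
samples = [([12, 40, 255], 1), ([7, 0, 99], 2)]

for row_index, (pixels, label) in enumerate(samples):
    # Assigning to an index label that does not exist yet enlarges the frame in place.
    df.loc[row_index] = pixels + [label]

print(df.shape)  # (2, 4)
```

Unlike the question's df.append call, whose return value was discarded, the assignment through df.loc modifies df directly.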

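import os

import numpy as np
import pandas as pd
from PIL import Image

# One column per pixel (10304 per image) plus a final 'Label' column.
columns = list(range(10304))
columns.append('Label')

df = pd.DataFrame(columns=columns)
rows = 0

# Subdirectories of the current directory named s1, s2, ... hold the images.
for direc in os.listdir():
    if direc.startswith('s') and os.path.isdir(direc):
        print('Adding ' + direc)
        print('--------------')

        for file in os.listdir(direc):
            im = Image.open(os.path.join(direc, file))
            # Flatten the image into a plain list of pixel values.
            x = np.array(im.getdata()).tolist()
            # The label is the number in the directory name, e.g. 's7' -> 7.
            x.append(int(direc.replace('s', '')))
            # Assigning to a new index adds the row to the DataFrame.
            df.loc[rows] = x
            rows += 1

df.to_csv('Dataset.csv')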
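Growing a DataFrame one row at a time gets slow as the number of images increases. A possible alternative, sketched here and not part of the original answer, is to collect the flattened pixels in a list and build the DataFrame once at the end. The function name build_pixel_frame is hypothetical, and the sketch assumes grayscale images of identical size stored in folders named s followed by a number.

```python
import os

import numpy as np
import pandas as pd
from PIL import Image


def build_pixel_frame(root="."):
    """Return one row of flattened pixels plus a 'Label' column per image."""
    rows = []
    labels = []
    for direc in sorted(os.listdir(root)):
        path = os.path.join(root, direc)
        # Assumption: class folders are named "s<number>", e.g. s1 ... s40.
        if not (os.path.isdir(path) and direc.startswith("s") and direc[1:].isdigit()):
            continue
        for name in sorted(os.listdir(path)):
            with Image.open(os.path.join(path, name)) as im:
                # Assumption: images are grayscale; convert("L") enforces one channel.
                rows.append(np.asarray(im.convert("L")).ravel())
            labels.append(int(direc[1:]))
    if not rows:
        return pd.DataFrame()
    # Stack all rows at once; this fails if the images differ in size.
    df = pd.DataFrame(np.vstack(rows))
    df["Label"] = labels
    return df
```

Calling build_pixel_frame(".").to_csv("Dataset.csv") from the directory that holds the s folders should produce a file comparable to the one written by the loop above.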