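I'm new to Python and I'm currently stuck on a problem with pandas DataFrames. I have a loop that, under certain conditions, outputs the name of a person/folder on each iteration, and I want that name to be added to the DataFrame straight away. What I actually end up with is a DataFrame containing a single row with the output of the last iteration; the outputs of all previous iterations get overwritten. Below is the code I'm using. I hope you can understand my problem and help.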
from scipy.spatial import distance
import csv
import dlib
import os
import numpy as np
import cv2
import pandas as pd
from skimage import io
import face_recognition
from PIL import Image

with open("Data/train.csv","r") as facefeatures2:
    reader=csv.reader(facefeatures2)
    featureslist2=[]
    for row in reader:
        if len(row) != 0:
            featureslist2= featureslist2 +[row]
facefeatures2.close()

float_int2=[]
results=[]
for f2 in range(0,len(featureslist2)):
    float_int2 = float_int2 +[[float(str) for str in subarray] for subarray in [featureslist2[f2]]]
csv2 = np.vstack(float_int2)

faces_folder_path = "Data/newcropped"
list = os.listdir(faces_folder_path) # dir is your directory path
number_files = len(list)
print (number_files)

writer = pd.ExcelWriter('pandas_name11.xlsx', engine='xlsxwriter')
for loop in range(0,number_files):
    print("iteration ="+str(loop+1))
    unknown_image = face_recognition.load_image_file(faces_folder_path + "/" + str(loop+1)+".jpg")
    cv2.imshow("test",unknown_image)
    cv2.waitKey(0)
    #### --------------exception handling-----------####
    try:
        unknown_face_encoding = face_recognition.face_encodings(unknown_image)[0]
    except IndexError:
        print("--->image is not detectable")
        pass
    # ...........................#
    results = face_recognition.compare_faces(csv2, unknown_face_encoding)
    chunks=[results[x:x + 12] for x in range(0, len(results),12)] # splits "results" list into sublists of size 12
    dirpath = "Data/eachperson"
    fname = []
    fname = [f for f in sorted(os.listdir(dirpath))]
    counter = 0
    index=0
    for c in range (0,len(chunks)):
        if 'True' in str(chunks[c]):
            counter=counter+1
            index=c
            df = pd.DataFrame({'names': [fname[index]]})
            df.to_excel(writer, sheet_name='Sheet1')
    if counter !=1 or counter ==0 :
        print("student is not present :(")
    else:
        print(str(fname[index])+" is present!!!")
writer.save()
Answer 0 (score: 1)
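Why not initialise a list of DataFrames? Keep appending to the list inside the loop, and only at the very end merge everything into one big DataFrame and write that out.
.to_excel overwrites what was previously written each time it is called, so calling it in a loop is not a good idea unless you open the file in append mode. Even then, it is inefficient.
Try something like this: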
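df_list = []
for loop in range(0, number_files):
    ...
    for c in range (0,len(chunks)):
        if 'True' in str(chunks[c]):
            ...
            # collect one single-row DataFrame per match instead of writing it
            df_list.append(pd.DataFrame({'names': [fname[index]]}))

# write once, after the loop has finished
writer = pd.ExcelWriter('pandas_name11.xlsx', engine='xlsxwriter')
pd.concat(df_list).reset_index(drop=True).to_excel(writer, sheet_name='Sheet1')
writer.close()  # flushes the workbook to disk (writer.save() on older pandas versions)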
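To make the collect-then-write idea easier to try without the face recognition setup, here is a minimal, self-contained sketch. The helper names `matched_name` and `collect_names`, the names "alice" and "bob", and the boolean lists are all made up for illustration; they stand in for `fname` and the output of `compare_faces`. Unlike the snippet above, this sketch only records a name when exactly one person matches, mirroring the `counter != 1` check in the question; that is an assumption about the intended behaviour, not something the question states.

```python
import pandas as pd

CHUNK_SIZE = 12  # encodings per person, as in the question


def matched_name(results, names, chunk_size=CHUNK_SIZE):
    """Return the single matching name, or None if zero or several people match.

    Hypothetical helper: `results` is a flat list of booleans, such as the
    list returned by face_recognition.compare_faces.
    """
    chunks = [results[i:i + chunk_size] for i in range(0, len(results), chunk_size)]
    hits = [idx for idx, chunk in enumerate(chunks) if any(chunk)]
    return names[hits[0]] if len(hits) == 1 else None


def collect_names(all_results, names):
    """Gather one row per recognised image, then build the DataFrame once."""
    rows = []
    for results in all_results:
        name = matched_name(results, names)
        if name is not None:
            rows.append({"names": name})
    return pd.DataFrame(rows, columns=["names"])


# Made-up data: two people with 12 encodings each, and three images.
names = ["alice", "bob"]
all_results = [
    [True] + [False] * 23,       # matches alice only
    [False] * 12 + [True] * 12,  # matches bob only
    [False] * 24,                # matches nobody
]
df = collect_names(all_results, names)
print(df)

# The file is then written once, after the loop
# (this needs an Excel engine such as xlsxwriter to be installed):
# df.to_excel("pandas_name11.xlsx", sheet_name="Sheet1")
```

Collecting plain dicts and building the DataFrame once also avoids creating many one-row DataFrames, though either way the key point is the same: write to Excel after the loop, not inside it.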
If you do want to rewrite the file on every iteration, you may also want to take a look at this.