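I have a folder containing 1 million photos. My code suddenly stopped running at file number 88,001. The problem was with the file itself. My question is: how can I restart my code from file 88,002?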
import os

import cv2
import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
folder = 'C:/BackUp/PhD/Data_from_Core_AP/Python/GeoTaggingWellsImages/filtered_images/Chittoor'
temp_path = 'C:/BackUp/PhD/Data_from_Core_AP/Python/GeoTaggingWellsImages/filtered_images/temp.jpeg'

text1 = []
file_names = []
for file in os.listdir(folder):
    if file.endswith(".jpeg"):
        file_names.append(file)
        print(file)
        path = folder + '/' + file
        img = cv2.imread(path)
        # Crop the region that is passed to OCR.
        crop_img = img[365:385, 10:395]
        gray = cv2.cvtColor(crop_img, cv2.COLOR_BGR2GRAY)
        # thresh is computed but never used; the grayscale crop is what gets saved.
        ret, thresh = cv2.threshold(crop_img, 245, 255, cv2.THRESH_TRUNC)
        cv2.imwrite(temp_path, gray)
        text = pytesseract.image_to_string(Image.open(temp_path), config='outputbase digits')
        # splitext drops the extension; str.strip(".jpeg") strips characters, not the suffix.
        name = os.path.splitext(file)[0]
        temp = [name, text]
        text1.append(temp)
        # Append one "name, text" line per image.
        with open("temp.txt", 'a') as f1:
            f1.write(str(temp).replace("[", "").replace("]", "").replace("'", "") + '\n')
Answer 0 (score: 0)
Since os.listdir(path) returns a list, you can skip the first 88001 elements like this:
os.listdir(path)[88001:]
Note, however, that this skips 88001 files of any type, not 88001 JPEGs.
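The caveat above can be handled by filtering to `.jpeg` names before slicing. Here is a minimal sketch, assuming the count should refer to JPEG files only; `remaining_jpegs` is a hypothetical helper name, not part of the original code. The list is sorted because `os.listdir` returns entries in arbitrary order, so an unsorted slice is not guaranteed to skip the same files on every run. Sorting only matches the earlier run if that run happened to visit the files in the same order.

```python
import os


def remaining_jpegs(folder, skip):
    """Return the .jpeg file names in `folder`, minus the first `skip` of them."""
    # Filter first so that `skip` counts JPEG files, not every directory entry.
    jpegs = [name for name in os.listdir(folder) if name.endswith(".jpeg")]
    # os.listdir gives no ordering guarantee; sorting makes the slice repeatable.
    jpegs.sort()
    return jpegs[skip:]
```

For example, `remaining_jpegs(folder, 88001)` would start at the 88,002nd JPEG in sorted order.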
Answer 1 (score: 0)
You can use itertools.islice() as follows:
from itertools import islice
...
start = 88002
files = os.listdir('C:/BackUp/PhD/Data_from_Core_AP/Python/GeoTaggingWellsImages/filtered_images/Chittoor')
for file in islice(files, start-1, None):
    if file.endswith(".jpeg"):
        ...
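Since the run stopped because of one bad file, it may also help to keep the loop going when a single file fails. This is a rough sketch, assuming the per-file work is wrapped in a function; `process_from`, `process_one`, and the returned list of failures are hypothetical names, not part of the original code.

```python
from itertools import islice


def process_from(names, start, process_one):
    """Process names from the 1-based position `start`; return the names that failed."""
    failed = []
    # islice(names, start - 1, None) skips the first start-1 items.
    for name in islice(names, start - 1, None):
        if not name.endswith(".jpeg"):
            continue
        try:
            process_one(name)
        except Exception as exc:
            # Record the failure and continue with the next file.
            print(f"skipping {name}: {exc}")
            failed.append(name)
    return failed
```

Called as `process_from(os.listdir(folder), 88002, process_one)`, this would resume at position 88,002 and report any unreadable files at the end instead of stopping at the first one.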