Question

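The loop in this function is the slowest part of my project's code

The function is meant to import a table from MySQL. The table has the columns Image_DateTaken, Image_Path, Image_InAlbum and Image_DirectoryRelPath.

It loops through the rows of the DataFrame, takes the year from Image_DateTaken, and assembles a path such as 2016-17 or 2015-16/Album. The Album part is appended when Image_InAlbum is greater than 0 (the test the code uses).

The new path is then written to a column of the DataFrame and exported back to SQL.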
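To make the naming rule concrete, here is a minimal sketch of it for a single row. The helper names `year_label` and `directory_rel_path` are hypothetical and do not appear in my code; the sketch only assumes the year is a number and that a positive `Image_InAlbum` means the image is in an album.

```python
from pathlib import Path


def year_label(year):
    """Build a label such as '2015-16' from the starting year 2015."""
    year = int(year)
    # Two-digit suffix of the following year, zero-padded (1999 -> '00').
    return "{}-{:02d}".format(year, (year + 1) % 100)


def directory_rel_path(year, in_album):
    """Return the year label, with 'Album' appended for album images."""
    base = Path(year_label(year))
    return str(base / "Album") if in_album > 0 else str(base)


print(directory_rel_path(2015, 0))
print(directory_rel_path(2016, 2))
```

The separator in the second result depends on the operating system, because `Path` is used just as in the function below.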

import pandas as pd
from pathlib import Path
# album (not shown here) reads DataFrames from SQL and writes them back

def write_directoryPath_to_Images():
    "Imaginary Create database folder with all images in folders of Date, subfolders of album"

    #read data
    images_df = album.read_sql('images')
    results_df = pd.DataFrame(['Image_DirectoryPath'])
    images_df.Image_DateTaken = pd.to_datetime(images_df.Image_DateTaken, errors= 'coerce')

    #group by date (year)
    grouped = images_df.groupby(images_df.Image_DateTaken.dt.year)
    images_df.set_index('Image_Path',inplace=True)
    #loop through groups and match them to image paths
    print("Looping through {} groups".format(len(grouped)))
    for date, group in grouped:
        year = '{0:g}-{1}'.format(date, '{0:g}'.format(date+1)[-2:])
        for path in group.index:
            base = Path(year)

            relPath = base.joinpath('Album') if images_df.Image_InAlbum[path] > 0 else base

            images_df['Image_DirectoryRelPath'][path] = str(relPath)

    #clean up

    images_df.reset_index(inplace=True)
    album.to_sql(images_df,'images')

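I feel that looping isn't really the pandas way, but I'm not sure how to fix it. I want to speed up the whole function as much as possible, and I know that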

for path in group.index:
    base = Path(year)

    relPath = base.joinpath('Album') if images_df.Image_InAlbum[path] > 0 else base

    images_df['Image_DirectoryRelPath'][path] = str(relPath)

is the most heavily looped part, so it would benefit most from a speedup, but perhaps my whole approach is the problem.

What is causing the bottleneck when assembling the paths in the DataFrame?
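One possible direction, sketched under assumptions rather than as a verified fix: the inner loop does a scalar label lookup (`images_df.Image_InAlbum[path]`) and a scalar chained assignment for every row, and each of those goes through pandas' indexing machinery. Building the whole column with Series operations should avoid most of that per-row overhead. The function name `add_directory_rel_path` is hypothetical. The sketch assumes `Image_InAlbum` is numeric, hard-codes `/` as the separator instead of using `Path`, and writes a missing value where `Image_DateTaken` cannot be parsed (the loop version leaves such rows untouched).

```python
import pandas as pd


def add_directory_rel_path(images_df):
    """Return a copy with Image_DirectoryRelPath built without a row loop."""
    out = images_df.copy()
    taken = pd.to_datetime(out["Image_DateTaken"], errors="coerce")

    year = taken.dt.year            # NaN where the date could not be parsed
    valid = year.notna()
    start = year.fillna(0).astype(int)

    # '2015' + '-' + '16': the two-digit suffix of the following year.
    label = start.astype(str) + "-" + ((start + 1) % 100).astype(str).str.zfill(2)

    # Keep the plain label for non-album rows, append '/Album' otherwise.
    in_album = out["Image_InAlbum"] > 0
    rel = label.where(~in_album, label + "/Album")

    # Blank out rows whose date was invalid (placeholder year 0 above).
    out["Image_DirectoryRelPath"] = rel.where(valid)
    return out


sample = pd.DataFrame({
    "Image_Path": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
    "Image_DateTaken": ["2015-06-01", "2016-01-15", "not a date", "1999-12-31"],
    "Image_InAlbum": [0, 3, 1, 1],
})
print(add_directory_rel_path(sample)["Image_DirectoryRelPath"].tolist())
```

In the original function this would stand in for the `groupby` and both `for` loops; the `album.read_sql` and `album.to_sql` calls would stay as they are. Whether it is fast enough would need timing on the real table.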

0 answers: