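I am writing a script that sorts video files into folders by checking their filenames for known keywords. As the number of keywords has grown out of control, the script has become very slow, taking several seconds to process each file.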
@echo off
cd /d d:\videos\shorts
if /i not "%cd%"=="d:\videos\shorts" echo invalid shorts dir. && exit /b
:: auto detect folder name via anchor file
for /r %%i in (*spirit*science*chakras*) do set conspiracies=%%~dpi
if not exist "%conspiracies%" echo conspiracies dir missing. && pause && exit /b
for /r %%i in (*modeselektor*evil*) do set musicvideos=%%~dpi
if not exist "%musicvideos%" echo musicvideos dir missing. && pause && exit /b
for %%s in (*) do set "file=%%~nxs" & set "full=%%s" & call :count "%%s"
for %%v in (*) do echo can't sort "%%~nv"
exit /b
:count
set oldfile="%file%"
set newfile=%oldfile:&=and%
if not %oldfile%==%newfile% ren "%full%" %newfile%
set count=0
set words= & rem
echo "%~n1" | findstr /i /c:"music" >nul && set words=%words%, music&& set /a count+=1
echo "%~n1" | findstr /i /c:"official video" >nul && set words=%words%, official video&& set /a count+=2
set words=%words:has, =has %
set words=%words: , =%
if not %count%==0 echo "%file%" has "%words%" %count%p for music videos
set musicvideoscount=%count%
set count=0
set words= & rem
echo "%~n1" | findstr /i /c:"misinform" >nul && set words=%words%, misinform&& set /a count+=1
echo "%~n1" | findstr /i /c:"antikythera" >nul && set words=%words%, antikythera&& set /a count+=2
set words=%words:has, =has %
set words=%words: , =%
if not %count%==0 echo "%file%" has "%words%" %count%p for conspiracies
set conspiraciescount=%count%
set wanted=3
set winner=none
:loop
:: count points and set winner (in case of tie lowest in this list wins, sort accordingly)
if %conspiraciescount%==%wanted% set winner=%conspiracies%
if %musicvideoscount%==%wanted% set winner=%musicvideos%
set /a wanted+=1
if not %wanted%==15 goto loop
if not "%winner%"=="none" move "%full%" "%winner%" >nul && echo "%winner%%file%" && echo.
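Note the "weight value" of each keyword. The script totals the points for each category, finds the highest total, and moves the file to the folder assigned to that category. It also shows the words it found, and at the end lists the files it could not sort, so that I can add keywords or adjust the weights.
I have cut the folders and keywords in this example down to the minimum. The full script has six folders and is 64k in size with all the keywords (and still growing).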
Answer 0 (score: 3)
@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "tempfile=%temp%\somename"
SET "categories=music conspiracies"
REM SET "categories=conspiracies music"
(
FOR /f "tokens=1,2,*delims=," %%s IN (q45196316.txt) DO (
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\*%%u*" 2^>nul'
) DO (
ECHO %%a^|%%s^|%%t
)
)
)>"%tempfile%"
SET "lastname="
FOR /f "tokens=1,2,*delims=|" %%a IN ('sort "%tempfile%"') DO (
CALL :resolve %%b %%c "%%a"
)
:: and the last entry...
CALL :resolve dummy 0
GOTO :EOF
:resolve
IF "%~3" equ "%lastname%" GOTO accum
:: report and reset accumulators
IF NOT DEFINED lastname GOTO RESET
SET "winner="
SET /a maxfound=0
FOR %%v IN (%categories%) DO (
FOR /f "tokens=1,2delims=$=" %%w IN ('set $%%v') DO CALL :compare %%w %%x
)
IF DEFINED winner ECHO %winner% %lastname:&=and%
:RESET
FOR %%v IN (%categories%) DO SET /a $%%v=0
SET "lastname=%~3"
:accum
SET /a $%1+=%2
GOTO :eof
:compare
IF %2 lss %maxfound% GOTO :EOF
IF %2 gtr %maxfound% GOTO setwinner
:: equal scores use categories to determine
IF DEFINED winner GOTO :eof
:Setwinner
SET "winner=%1"
SET maxfound=%2
GOTO :eof
You would need to change the setting of sourcedir to suit your circumstances.
I used a file named q45196316.txt containing my test category data:
music,6,music
music,8,Official video
conspiracies,3,misinform
conspiracies,6,antikythera
missing,0,not appearing in this directory
I believe your problem is that executing findstr repeatedly is time-consuming.
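This approach uses a data file containing lines of category,weight,mask. The categories variable contains the list of categories in order of preference (used when scores are equal).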
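The data file is read, assigning the category to %%s, the weight to %%t and the mask to %%u, and a directory scan is then done using the mask. For each match found, this echoes a line in the format name|category|weight to a temporary file. After the first scan, dir appears to be very fast.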
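As an illustration of the parsing step just described, here is a minimal sketch showing how for /f splits one data line into category, weight and mask. The file name demo_categories.txt and the variable datafile are hypothetical and exist only for this demo; nothing is scanned or moved.

```batch
@ECHO OFF
SETLOCAL
:: Demo only: "datafile" and demo_categories.txt are hypothetical names.
SET "datafile=%temp%\demo_categories.txt"
(
ECHO music,8,Official video
ECHO conspiracies,6,antikythera
)>"%datafile%"
:: tokens=1,2,* puts the category in %%s, the weight in %%t and
:: the rest of the line (the mask, spaces included) in %%u.
FOR /f "usebackq tokens=1,2,* delims=," %%s IN ("%datafile%") DO (
 ECHO category=%%s weight=%%t mask=*%%u*
)
DEL "%datafile%"
```

The mask is wrapped in asterisks here only to show the pattern that would be handed to dir in the script above.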
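The resulting temporary file therefore has one line per filename+category match, with its weight, so a filename that fits more than one category (or more than one keyword) produces more than one entry.
We then scan a sorted version of that file and resolve the scores.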
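First, if the filename changes, we can report on the previous filename. This is done by comparing the values in the variables $categoryname. Since these are scanned in the order given by %categories%, the first category in the list is chosen when scores are equal. The scores are then reset and lastname is initialised to the new filename.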
We then accumulate the score into $categoryname.
So, I believe this will be a bit faster.
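The accumulation step can be tried in isolation. The following is a small sketch, not part of the answer's script: it resets two $-prefixed counters, adds a few weights through the same kind of :accum routine, and lists the totals. The category names and weights are sample values.

```batch
@ECHO OFF
SETLOCAL
:: Demo only: one $-prefixed counter per category, reset to zero.
FOR %%v IN (music conspiracies) DO SET /a $%%v=0
CALL :accum music 6
CALL :accum music 8
CALL :accum conspiracies 3
:: SET $ lists every variable whose name starts with $
SET $
GOTO :EOF

:accum
:: %1 is the category name, %2 the weight to add
SET /a $%1+=%2
GOTO :EOF
```

Assuming no other environment variables start with $, this should list $conspiracies=3 and $music=14, which are the values the reporting step compares to pick a winner.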
Revision
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir"
SET "tempfile=%temp%\somename"
SET "categories="rock music" music conspiracies"
REM SET "categories=conspiracies music"
:: set up sorting categories
SET "sortingcategories="
FOR %%a IN (%categories%) DO SET "sortingcategories=!sortingcategories!,%%~a"
SET "sortingcategories=%sortingcategories: =_%"
:: Create "tempfile" containing lines of name|sortingcategory|weight
(
FOR /f "tokens=1,2,*delims=," %%s IN (q45196316.txt) DO (
SET "sortingcategory=%%s"
SET "sortingcategory=!sortingcategory: =_!"
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\*%%u*" 2^>nul'
) DO (
ECHO %%a^|!sortingcategory!^|%%t^|%%s^|%%u
)
)
)>"%tempfile%"
SET "lastname="
SORT "%tempfile%">"%tempfile%.s"
FOR /f "usebackqtokens=1,2,3delims=|" %%a IN ("%tempfile%.s") DO (
CALL :resolve %%b %%c "%%a"
)
:: and the last entry...
CALL :resolve dummy 0
GOTO :EOF
:: resolve by totalling weights (%2) in sortingcategories (%1)
:: for each name (%3)
:resolve
IF "%~3" equ "%lastname%" GOTO accum
:: report and reset accumulators
IF NOT DEFINED lastname GOTO RESET
SET "winner=none"
SET /a maxfound=0
FOR %%v IN (%sortingcategories%) DO (
FOR /f "tokens=1,2delims=$=" %%w IN ('set $%%v') DO IF %%x gtr !maxfound! (SET "winner=%%v"&SET /a maxfound=%%x)
)
ECHO %winner:_= % %lastname:&=and%
:RESET
FOR %%v IN (%sortingcategories%) DO SET /a $%%v=0
SET "lastname=%~3"
:accum
SET /a $%1+=%2
GOTO :eof
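I have added some significant comments.
You can now have spaces in category names. You need to quote such names in the set categories... statement (the names are used for reporting).
sortingcategories is derived automatically. It is used only for sorting, and is simply the categories with any spaces in the names replaced by underscores.
When the temporary file is created, the category is processed to contain underscores (sortingcategory); when the final placement is resolved, the underscores are removed again, giving back the category name.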
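The underscore round trip relies only on batch string substitution, as this short sketch shows. The category value is a sample; note that, under this scheme, a category name that genuinely contains an underscore would come back with a space instead.

```batch
@ECHO OFF
SETLOCAL
SET "category=rock music"
:: Spaces become underscores so the name is a single token...
SET "sortingcategory=%category: =_%"
ECHO %sortingcategory%
:: ...and underscores become spaces again for reporting.
ECHO %sortingcategory:_= %
```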
Negative weights should now be handled appropriately.
Further revision for "don't append *":
FOR /f "tokens=1-5delims=," %%s IN (q45196316.txt) DO (
SET "sortingcategory=%%s"
SET "sortingcategory=!sortingcategory: =_!"
FOR %%z IN ("!sortingcategory!") DO (
SETLOCAL disabledelayedexpansion
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\%%~v%%u%%~w" 2^>nul'
and add two extra columns to the q45196316.txt file:
music,6,music,*,*
music,8,Official video,"",*
conspiracies,3,misinform,*,*
conspiracies,6,kythera,*anti,*
missing,0,not appearing in this directory,*,*
rock music,2,metal,*,*
conspiracies,-5,negative,*,*
The for /f ... %%s now also generates %%v and %%w, which contain the last two columns (tokens is now 1-5 as well).
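These are applied as a prefix and a suffix to %%u in the dir command. Note that "" should be used for "nothing", since two successive commas are parsed as a single separator. The ~ in %%~v and %%~w means "remove the quotes".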
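The effect of the "" placeholder can be seen with a minimal sketch. The file name demo_affixes.txt and the variable datafile are hypothetical, and the second data row deliberately leaves the prefix column empty to show what goes wrong.

```batch
@ECHO OFF
SETLOCAL
:: Demo only: "datafile" and demo_affixes.txt are hypothetical names.
SET "datafile=%temp%\demo_affixes.txt"
(
ECHO music,8,Official video,"",*
ECHO music,8,Official video,,*
)>"%datafile%"
:: %%~v and %%~w strip surrounding quotes, so "" becomes an empty prefix.
FOR /f "usebackq tokens=1-5 delims=," %%s IN ("%datafile%") DO (
 ECHO prefix=[%%~v] mask=[%%u] suffix=[%%~w]
)
DEL "%datafile%"
```

For the first row the prefix is empty and the suffix is *, as intended. For the second row the two adjacent commas collapse into one delimiter, so the * shifts into the prefix column and the suffix is left empty.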