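How to extract a line of text that moves around within a text file

Date: 2014-04-02 20:05:50

Tags: batch-file

I am trying to scrape a link from a web page that changes daily. I have managed to get down to the line of text that the link is on, but I cannot pull the link out because it is surrounded by reserved characters that break my attempts. I am not very good at this and have basically been butchering other scripts to try to work it out. Now, 12 hours in, I admit defeat and need help. The point I have reached is a text file with the following content: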

    </a><p>Support the free distribution of this forecast by visiting our sponsors website.<p><b>Select forecast - </b><a href="?fdate=140403">Tomorrow</a> / <a href="?fdate=140404">Friday</a> / <a href="?fdate=140405">Saturday</a><p><hr><h5>Viewing forecast for Thursday, 3rd April, 2014</h5><p>Forecast last reviewed on Wednesday, 02/04/14 at 16:17<p><a href="jnzolmdtgobavkjz/EH.PDF" target="blank" border="0"><img src="images/pdf.gif" align="left"></a><br><a href="jnzolmdtgobavkjz/EH.PDF" target="blank">Click here to access the PDF version of the forecast</a>.<br><br><br><br><hr><h5>Summary for all mountain areas</h5><p>Low cloud will remain widespread across eastern mountains south to about the central or southern Pennines. Higher summits may well be above the cloud. Outbreaks of rain will move north, locally heavy. Local gusty winds.<p><hr><h5>Headline, Cairngorms National Park, Monadhliath</h5><p>Outbreaks of rain; hazy. Locally gusty wind.<p><p><hr><h5>How Windy?</h5><p>East or southeasterly, 20 to occasionally 30 or 35mph.<p><h5>Effect Of Wind?</h5><p>Will impede ease of walking on some areas, not necessarily the highest areas. Sudden gusts west of major ridges and some passes and cols.<p><hr><h5>How Wet?</h5><p>Bursts of rain<p>Rain now and again, ranging from brief light showers to heavier bursts lasting an hour or two - these most likely west of the A9.<p><hr><h5>Cloud on the hills?</h5><p>Widespread east<p>Most, perhaps all higher areas intermittently cloud free. But very low cloud over North Sea will shroud areas accessible from Deeside from lower slopes, although higher tops (above about 900m) often above the cloud.<p><h5>Chance of cloud free Munros?</h5><p>80%<p><h5>Sunshine and air clarity?</h5><p>Patchy weak sunshine. Very hazy low level, but excellent visibility many higher slopes. Extensive fog eastern mountains, particularly lower slopes.<p><hr><h5>How Cold? 
(at 900m)</h5><p>4 to 6C, but 2C where in cloud.<p><h5>Freezing level</h5><p>Above the summits<p><hr><h5>Planning Outlook for all mountain areas from Friday, 4th April, 2014</h5><p>Winds will turn south to southwesterly into the weekend as rain bearing fronts come in off the Atlantic. Snowmelt in Scotland will continue. Winds at times approaching or reaching gale higher areas. 

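The output I want is: jnzolmdtgobavkjz/EH.PDF

I have managed to solve the rest of my problem, but because the address changes every day I have not managed to solve this part.

It would be great if this could be done in, or launched from, a BAT file, or if the result could be written to a text file so that I can carry on processing it in the BAT.

Hope someone can help

Cheers

Sam

4 Answers: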

Answer 0: (score: 0)

Try this:

Name of the text file: input.txt. The output is written to: output.txt

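@echo off
setlocal enabledelayedexpansion

rem Read input.txt; ligne ends up holding the last line read, with the
rem marker #1# inserted before the link and #2# inserted after it.
for /f "delims=" %%a in ('type input.txt') do (
set ligne=%%a
set ligne=!ligne:^<p^>^<a href^== #1# !
set ligne=!ligne: target= #2# !
)

set sw1=0

rem Walk the tokens: write the one that follows #1# (quotes stripped by %%~b)
rem to output.txt, and stop as soon as #2# is reached.
for %%b in (!ligne!) do (
if "%%b"=="#2#" goto:end
if !sw1!==1 >output.txt echo %%~b
if "%%b"=="#1#" set sw1=1
)
goto:eof

:end
type output.txt
endlocal
pause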

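Answer 1: (score: 0)

As I said in my comment, you have not specified the rules for finding the target line/string, so we can only guess... The batch file below finds the first line that contains both the "href" and "target" strings and extracts the second token delimited by quotes: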

@echo off

for /F tokens^=2^ delims^=^" %%a in ('findstr "href" input.txt ^| findstr "target"') do echo %%a& goto continue
:continue

Output:

jnzolmdtgobavkjz/EH.PDF

If this method does not solve your problem, please specify the rules...
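One case worth checking, assuming the page really is saved as one long line as in the question: token 2 is then the first quoted string on that line, which in the sample is ?fdate=140403 rather than the PDF path. The following is a minimal, untested sketch of a variant that first discards everything up to the first "<p><a href" marker and only then takes the quoted token. The marker and the file name input.txt are assumptions taken from the sample, and a line containing ! would need extra care because delayed expansion is enabled.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Assumption: input.txt holds the saved page text, and the wanted link
rem directly follows the first "<p><a href" on its line, as in the sample.
rem For each line, rest = the text after that marker. If rest equals the
rem line, the marker was not found. Otherwise rest starts with
rem ="path" target=..., so the second quote-delimited token is the path.
set "link="
for /F "usebackq delims=" %%L in ("input.txt") do (
    if not defined link (
        set "line=%%L"
        set "rest=!line:*<p><a href=!"
        if not "!rest!"=="!line!" (
            for /F tokens^=2^ delims^=^" %%a in ("!rest!") do set "link=%%a"
        )
    )
)

if defined link (echo !link!) else (echo No link found.)
endlocal
```

To continue processing in the same BAT, the echo could be redirected to a file, or the value of link could be used directly before endlocal.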

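Answer 2: (score: 0)

The single line you pasted in your question can be parsed with REPL.BAT as follows.

It looks for a period (\.) four characters before a double quote (\q) and gives you the text in front of it back to the opening double quote, up to the matching double quote. It returns "jnzolmdtgobavkjz/EH.PDF"; if you want it without the double quotes, that can be done too.

repl: This uses a helper batch file called repl.bat - download from: https://www.dropbox.com/s/qidqwztmetbvklt/repl.bat

Place repl.bat in the same folder as the batch file or in a folder that is on the path.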

type "file.txt"| repl ".*(\q.*\....\q).*" "$1" x >newfile.txt
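For the unquoted form mentioned above, one possible variant is sketched here. It is untested and assumes the same repl.bat and the same x option as the command above; the only change is that the capture group sits inside the two \q quotes, so $1 should hold the path without them.

```batch
rem Same pipeline as above, with the group moved inside the quotes.
type "file.txt"| repl ".*\q(.*\....)\q.*" "$1" x >newfile.txt
```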

Answer 3: (score: 0)

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION 
FOR /f "delims=" %%a IN (q22821879.txt) DO SET "line=%%a"
:loop
SET "line=!line:*<p>=!"
IF NOT "%line:~0,8%"=="<a href=" GOTO loop
FOR /f "delims=<>" %%a IN ("%line%") DO FOR %%z IN (%%a) DO IF NOT "%%z"=="%%~z" SET "line=%%~z"&GOTO done
:done
ECHO %line%
GOTO :EOF

I used a file named q22821879.txt containing the test data.