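Hello, I am trying to match part of a file's name, plus some additional text, against the text inside that file.
Basically I have a file named like this: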
PieceIwanttomatch_don't_care_about_this.txt
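I am trying to match, say, the first seven letters of the file name plus another string inside the file, and I am having no luck.
Here is what I have so far: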
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
my $calls_dir = "Ask/Parsed/Html/";
opendir(my $search_dir, $calls_dir) or die "$!\n";
my @files = grep /\.txt$/i, readdir $search_dir;
closedir $search_dir;
#print "Got ", scalar @files, " files\n";
#my %seen = ();
for my $file (@files) {
my %seen = ();
my $current_file = $calls_dir . $file;
open my $FILE, '<', $current_file or die "$file: $!\n";
while (<$FILE>) {
#if (/phone/i) {
chomp;
#if (/phone\s*(.*)\r?$/i) {
#if (/^phone\s*:\s*(.*)\r?$/i) {
#if (/Contact\s*(.*)\r?$/i) {
#if (/^*(.*)team\s*(.*)\r?$/i) {
print substr(${file}, 0, 7);
if (/^(?=.* 'substr(${file}, 0, 7)')(?=.*management)/s) {
$seen{$1} = 1;
#print $file."\t"."$_\n";
#open my $fh, '>', "Ask/Parsed/Html2/"."${file}.parsed_for_contact_us.txt" or die $!;
make_path('Ask/Parsed/Html2/');
open my $fh, '>', "Ask/Parsed/Html2/" . "${file}.parsed_for_management.txt" or die $!;
#open my $fh, '>', "$_"."result".".txt" or die $!;
#$fh->print("$file\t$_\n");
$fh->print("$_\n");
print "$_\n";
#print "\t";
print "\n";
print "\t";
#print "$_\n";
#print "\t";
#print "\n";
foreach my $addr (sort keys %seen) {
}
}
}
close $FILE;
}
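Here is another example file name:
nintendo_ask_parse.html
I am trying to use the string nintendo from the file name, together with another string (say game), to find a line in the file and print it to another file.
Added on December 12, 2014: at the request of some of the people who have been helping me so far, here is more data. I start by running the first script I wrote, which pulls the page for each URL into a file. Here is the script: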
use strict;
use warnings;
use LWP::Simple;
my $link1 = "http://www.ask.com/web?q=";
my $link2 = "+video+game&qsrc=0&o=0&l=dir&qo=homepageSearchBox";
#my $link3 = "http://www.";
#my $link4 = "http://www.manta.com/search? search_source=nav&pt=&search_location=Burlingame+CA&search=";
open (my $fh2, "untitled.txt")
or die "Could not open file";
while (my $row = <$fh2>) {
chomp $row;
print "$row\n";
my $xml1 = $link1 . $row. $link2 ;
#my $xmla = $link3 . $row . ".com";
#my $xmlx = $link4 . $row;
mkdir 'Ask', 0755;
my $filename1 = "Ask/".($row)."_"."ask".".html";
open my $fh1, ">", $filename1 or die("Could not open file. $!");
print $row;
my $xml2 = get $xml1;
print $xml1;
print "\n";
print $fh1 $xml2;
}
=============================================== ==============================
After running this script I get one HTML file per entry in untitled.txt.
Answers
Q&A Community
Advanced Search
Everything
Images
News
First Video Game Invented
Video Game Design
Wii
Video Game Designer Career
Video Game Companies
Spider-man 3 Video Game
Video Game Walkthroughs
Video Game Statistics
Call of Duty 4
More Answers
Amazon.com results for activision
Source
Activision Publishing, Inc. is an American video game publisher. It was founded on October 1, 1979 and was the world's first independent developer and distributor of video games for gaming consoles. Its first products were cartridges for the Atari 2600 video console system published from July 1980 for the US market and from August 1981 for the international market (UK). Activision is now one of the largest video game publishers in the world and was also the top publisher for 2... Read More »
Go to: Ask Encyclopedia · Images · Videos
Browse Article: History · Studios · Notable games published · Upcoming games · References ·
Source: Wikipedia
Related Questions:
•
Who was the Video game publisher of LOOM?
•
Who is developing the games for Activision and what have they done in the past? We hear the handheld versions of the game are different than the console versions. Care to enlighten us?
•
This game was created by "Activision" for the "Atari 2600". Up to four players could play at one time. Which one was it?
View more Q&A »
www.giantbomb.com/activision/3010-78/
Oct 9, 2014 ... Activision is the largest third-party publisher in the world. It became the first third- party developer for video game consoles, and is responsible ...
Explore More Answers About
Source: www.kgbanswers.com
About · Privacy · Terms · Careers · Ask Blog · Q&A · Mobile · Help · Feedback © 2014 Ask.com
**truncated
=============================================== ==============================
There is also a second script that extracts all the links from the HTML files above and writes them to another file. Here is the script:
=============================================== ==============================
use lib '/Users/lialin/perl5/lib/perl5';
use strict; use warnings;
use feature 'say';
use File::Slurp 'slurp'; # makes it easy to read files.
use Mojo;
use Mojo::UserAgent;
use URI;
use File::Path qw(make_path remove_tree);
#my $html_file = shift @ARGV; # take file from command lin
my $calls_dir = "Ask/";
opendir(my $search_dir, $calls_dir) or die "$!\n";
my @html_files = grep /\.html$/i, readdir $search_dir;
closedir $search_dir;
#print "Got ", scalar @files, " files\n";
#my %seen = ();
foreach my $html_files (@html_files) {
my %seen = ();
my $current_file = $calls_dir . $html_files;
open my $FILE, '<', $current_file or die "$html_files: $!\n";
my $dom = Mojo::DOM->new(scalar slurp $calls_dir .$html_files);
print $calls_dir .$html_files ;
#for my $csshref ($dom->find('a[href]')->attr('href')->each) {
#for my $link ($dom->find('a[href]')->attr('href')->each) {
# print $1;
#say $1 #if $link->attr('href') =~ m{^https?://(.+?)/index\.php}s;
make_path('Ask/Parsed/Html/');
open my $fh, '>', "Ask/Parsed/Html/${html_files}.result.txt" or die $!;
for my $csshref ($dom->find('a[href]')->attr('href')->each) {
my $cssurl = URI->new($csshref)->abs($calls_dir .$html_files);
#open my $fh, '>', "Ask/${html_files}.result.txt" or die $!;
$fh->print("$html_files\n");
$fh->print("$cssurl\n");
#$fh->print("\t"."$_\n");
#print "$cssurl\n";
#print $file."\t"."$_\n";
}
}
=============================================== =====
The resulting file looks like this (again using Activision as the example):
=============================================== ==============================
Activision_ask.html
http://www.ask.com/answers/browse?qsrc=167&q=Activision+video+game&qo=channelNavigation&o=0&l=dir
Activision_ask.html
http://www.ask.com/answers/browse?qsrc=167&q=Activision+video+game&o=0&l=dir#opensignin
Activision_ask.html
http://www.ask.com/answers/profile?qsrc=3099
Activision_ask.html
http://www.ask.com/answers/profile?qsrc=3099
Activision_ask.html
javascript:void(0);
Activision_ask.html
http://www.ask.com/advancedsearch?qsrc=167&q=Activision+video+game&qo=channelNavigation&o=0&l=dir
Activision_ask.html
http://www.ask.com/?o=0&l=dir&qsrc=14137
Activision_ask.html
http://www.ask.com/pictures?q=Activision+video+game&qsrc=167&qo=channelNavigation&o=0&l=dir
Activision_ask.html
http://www.ask.com/news?q=Activision+video+game&qsrc=167&qo=channelNavigation&o=0&l=dir
Activision_ask.html
http://www.ask.com/youtube?q=Activision+video+game&qsrc=167&qo=channelNavigation&o=0&l=dir
Activision_ask.html
http://www.ask.com/shopping?q=Activision+video+game&qsrc=167&qo=channelNavigation&o=0&l=dir
Activision_ask.html
javascript:void(0);
Activision_ask.html
http://www.ask.com/maps?q=Activision+video+game&qsrc=167&qo=channelNavigation&o=0&l=dir
Activision_ask.html
javascript:void(0);
Activision_ask.html
http://www.ask.com/web?q=Video+Game+Cheats&qsrc=466&o=0&l=dir&qo=relatedSearchNarrow
Activision_ask.html
http://www.ask.com/web?q=Video+Game+Tester&qsrc=466&o=0&l=dir&qo=relatedSearchNarrow
Activision_ask.html
http://www.ask.com/web?q=Create+Your+Own+Video+Games&qsrc=466&o=0&l=dir&qo=relatedSearchNarrow
Activision_ask.html
http://www.ask.com/web?q=First+Video+Game+Invented&qsrc=466&o=0&l=dir&qo=relatedSearchNarrow
Activision_ask.html
http://www.ask.com/web?q=Video+Game+Design&qsrc=466&o=0&l=dir&qo=relatedSearchNarrow
Activision_ask.html
http://www.ask.com/web?q=Wii&qsrc=466&o=0&l=dir&qo=relatedSearchExpand
Activision_ask.html
http://www.ask.com/web?q=Video+Game+Designer+Career&qsrc=466&o=0&l=dir&qo=relatedSearchNarrow
Activision_ask.html
http://www.ask.com/web?q=Video+Game+Companies&qsrc=466&o=0&l=dir&qo=relatedSearchNarrow
Activision_ask.html
http://www.ask.com/web?q=Spider-man+3+Video+Game&qsrc=466&o=0&l=dir&qo=relatedSearchNarrow
Activision_ask.html
http://www.ask.com/web?q=Video+Game+Walkthroughs&qsrc=466&o=0&l=dir&qo=relatedSearchNarrow
Activision_ask.html
http://www.ask.com/web?q=Video+Game+Statistics&qsrc=466&o=0&l=dir&qo=relatedSearchNarrow
Activision_ask.html
http://www.ask.com/web?q=Call+of+Duty+4&qsrc=466&o=0&l=dir&qo=relatedSearchExpand
Activision_ask.html
http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=activision&x=0&y=0&tag=askcom05-20
Activision_ask.html
http://www.amazon.com/Activision-Anthology-PlayStation-2/dp/B00006Z7HQ%3Fpsc%3D1%26SubscriptionId%3D06KMPSHEDSXXQMQVT482%26tag%3Daskcom05-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB00006Z7HQ
Activision_ask.html
http://www.amazon.com/Activision-Anthology-PlayStation-2/dp/B00006Z7HQ%3Fpsc%3D1%26SubscriptionId%3D06KMPSHEDSXXQMQVT482%26tag%3Daskcom05-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB00006Z7HQ
Activision_ask.html
http://www.amazon.com/Destiny-Xbox-360/dp/B002I096Q4%3Fpsc%3D1%26SubscriptionId%3D06KMPSHEDSXXQMQVT482%26tag%3Daskcom05-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB002I096Q4
Activision_ask.html
http://www.amazon.com/Destiny-Xbox-360/dp/B002I096Q4%3Fpsc%3D1%26SubscriptionId%3D06KMPSHEDSXXQMQVT482%26tag%3Daskcom05-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB002I096Q4
Activision_ask.html
http://www.amazon.com/Skylanders-Trap-Team-Not-Machine-Specific/dp/B00NCA6ZT0%3Fpsc%3D1%26SubscriptionId%3D06KMPSHEDSXXQMQVT482%26tag%3Daskcom05-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB00NCA6ZT0
Activision_ask.html
http://www.amazon.com/Skylanders-Trap-Team-Not-Machine-Specific/dp/B00NCA6ZT0%3Fpsc%3D1%26SubscriptionId%3D06KMPSHEDSXXQMQVT482%26tag%3Daskcom05-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3DB00NCA6ZT0
Activision_ask.html
http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=activision&x=0&y=0&tag=askcom05-20
Activision_ask.html
http://www.ask.com/wiki/Activision
Activision_ask.html
http://www.ask.com/wiki/Activision
Activision_ask.html
http://en.wikipedia.org/wiki/File:Activision.svg
Activision_ask.html
http://www.ask.com/allabout?q=video%20game%20publisher&qsrc=470
Activision_ask.html
http://www.ask.com/allabout?q=video%20game%20console&qsrc=470
Activision_ask.html
http://www.ask.com/allabout?q=Atari%202600&qsrc=470
Activision_ask.html
http://www.ask.com/wiki/Activision
Activision_ask.html
http://www.ask.com/wiki/Activision#Upcoming_games
Activision_ask.html
http://www.ask.com/wiki/Activision#References
Activision_ask.html
http://en.wikipedia.org/wiki/Activision
Activision_ask.html
http://www.ask.com/web?q=Who+was+the+Video+game+publisher+of+LOOM%3F&qsrc=469&o=0&l=dir&qo=relatedQuestions
Activision_ask.html
http://www.ask.com/web?q=Activision+video+game&qsrc=3060&o=0&l=dir
Activision_ask.html
http://www.activision.com/
Activision_ask.html
http://www.activision.com/games
Activision_ask.html
http://clk.about.com?zi=13/1tO&ity=boostOrg&o=0&ldid=4451&eng=boost&zu=http://vgstrategies.about.com/od/gameboycheatscodes/a/Activision-Anthology.htm
http://www.gametrailers.com/company/pou3yf/activision
Activision_ask.html
http://www.cnbc.com/id/102026893
Activision_ask.html
http://www.giantbomb.com/activision/3010-78/
Activision_ask.html
http://www.ask.com/web?q=History+of+Video+Game+Systems&qsrc=467&o=0&l=dir&qo=relatedSearchNarrow
Activision_ask.html
http://www.ask.com/mobile?&o=0&l=dir&qsrc=0
Activision_ask.html
http://help.ask.com
Activision_ask.html
http://feedback.ask.com
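=============================================== ==============================
Now I am working on a final script that uses part of the file name plus another string to pick out the line or lines in the file that contain matching, or nearly matching, text.
In the example above I am interested in 'http://www.activision.com/games', or basically any line that contains 'Activision' from the file name and the word 'game'.
My files are obviously much larger, and the word game may come before or after the part taken from the file name.
I hope the explanation and the code help others understand what I am trying to achieve.
The problem I have now is the regex used to search for the strings. I am trying to make it less strict and cannot get the match to work.
As I mentioned before, I am quite proficient in HTML and Java. I know Perl is the right language for this, and I am clearly no expert (as you can see from my code above), but I am trying to learn and get my task done.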
Answer 0 (score: 2)
It is not clear to me what you want to do, but given your example file name
PieceIwanttomatch_don't_care_about_this.txt
and assuming that you want to find all files whose names start with the first seven characters PieceIw and also end with .txt, you would write
if ( /^PieceIw.*\.txt$/ ) { ... }
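I hope that helps
Update
OK, I think what you want is to search all the .txt files in a directory for lines that contain the first N characters of the file name as well as some other specified string.
If you don't know which will come first, the file name prefix or the other string, then you are on the right lines with the two look-aheads. One improvement would be to enclose the strings in \Q...\E, which escapes all non-word characters so that any regex metacharacters in them cannot mess up the pattern.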
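As a minimal sketch of that idea (the helper name line_has_both and the sample strings are made up for illustration), two look-aheads anchored at the start of the line accept the strings in either order, and \Q...\E keeps characters such as . from acting as metacharacters:

```perl
use strict;
use warnings;

# Hypothetical helper: true when $line contains both strings, in either order.
sub line_has_both {
    my ($line, $prefix, $wanted) = @_;
    # Each look-ahead scans the whole line from the start, so order does not matter.
    # \Q...\E quotes any regex metacharacters in the interpolated strings.
    return $line =~ /^(?=.*\Q$prefix\E)(?=.*\Q$wanted\E)/s ? 1 : 0;
}

print line_has_both('PieceIw and management', 'PieceIw', 'management'), "\n";
print line_has_both('management then PieceIw', 'PieceIw', 'management'), "\n";
print line_has_both('only management here',    'PieceIw', 'management'), "\n";
```

The first two calls return 1 and the last returns 0. Note that the question's pattern put a substr(...) call inside the regex; function calls are not interpolated there, so the prefix has to be stored in a variable first.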
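Also note the following:
I have used autodie, as I explained in my answer to your previous question. If you are running a version of Perl earlier than v5.10 and cannot upgrade, you will not be able to do this and will have to check the status of each file operation individually
It is important to use absolute paths for the directories; otherwise the user has to make sure they are in the correct current working directory before running the program
I have put all the parameters to the program (the two directories and the additional string to search for) as definitions at the top of the program
I have used glob instead of opendir / readdir / grep because it is much tidier, and the resulting file names also include the full path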
use strict;
use warnings;
use 5.010;
use autodie;
use File::Path qw/ make_path remove_tree /;
use File::Basename qw/ fileparse /;
my $calls_dir = '/path/to/Ask/Parsed/Html';
my $parsed_dir = '/path/to/Ask/Parsed/Html2';
my $wanted = 'game';
my @files = glob "$calls_dir/*.txt";
printf "Got %d files\n", scalar @files;
for my $file (@files) {
open my $in_fh, '<', $file;
# glob returns full paths, so take the prefix from the base name
my $basename = fileparse($file);
my $prefix = substr $basename, 0, 7;
print $prefix, "\n";
make_path($parsed_dir);
open my $out_fh, '>', "$parsed_dir/${basename}_parsed_for_management.txt";
while (<$in_fh>) {
# keep lines that contain both the prefix and the wanted string, in either order
print $out_fh $_ if / \Q$prefix\E /x and / \Q$wanted\E /x;
}
close $out_fh;
}
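Because glob returns each name with its directory attached, the prefix is best taken from the base name rather than from the full path. A small sketch, where name_prefix is a hypothetical helper and the path is only an example:

```perl
use strict;
use warnings;
use File::Basename qw/ fileparse /;

# Hypothetical helper: first $n characters of the file's base name,
# ignoring the directory part of the path.
sub name_prefix {
    my ($path, $n) = @_;
    my ($basename) = fileparse($path);    # first return value is the file name
    return substr $basename, 0, $n;
}

print name_prefix('/path/to/Ask/Parsed/Html/nintendo_ask.html.result.txt', 7), "\n";
```

This prints nintend, the first seven characters of the file name, whereas substr applied to the whole path would return the start of the directory name.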
Update
This works fine
my ($wanted, $prefix) = qw/ game nintendo /;
for ( 'game.nintendo.com/phoenix.zhtml?c=121127&p=irol-gom' ) {
print "OK\n" if / \Q$wanted\E .* \Q$prefix\E /x;
}
Output
OK
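One thing to watch with the Activision data from the question: the file name has a capital A while the URL of interest is all lower case, so a case-sensitive pattern would miss it. A sketch of a case-insensitive check, assuming that is acceptable for your data (mentions_both is a made-up name):

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive, order-independent test for both strings.
sub mentions_both {
    my ($line, $prefix, $wanted) = @_;
    return ($line =~ /\Q$prefix\E/i && $line =~ /\Q$wanted\E/i) ? 1 : 0;
}

my $prefix = substr 'Activision_ask.html', 0, 7;    # "Activis"
for my $url ('http://www.activision.com/games', 'http://www.ask.com/wiki/Activision') {
    print "$url\n" if mentions_both($url, $prefix, 'game');
}
```

Only the activision.com/games URL is printed here. On the full result file the ask.com search links that contain Activision+video+game would match as well, so some further filtering may be needed.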
Answer 1 (score: 0)
Speculating somewhat, and trying to read between the lines here.
opendir(my $search_dir, $calls_dir) or die "$!\n";
my @files = grep /^\Q$prefix\E_/, grep /\.txt$/i, readdir $search_dir;
closedir $search_dir;
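Now @files contains only the files whose names start with $prefix followed by an underscore and end with .txt. You do not want to search any files other than these. I am speculating about the underscore, but you can amend that to better suit your needs if it is not correct.
Now, search (only) the matching files.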
for my $file (@files) {
my $current_file = $calls_dir . $file;
open my $FILE, '<', $current_file or die "$file: $!\n";
while (<$FILE>) {
print "$file\n$_" if m/management/;
}
}
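I would actually recommend a tab or colon separator, rather than a newline, between the file name and the matching line. Line-oriented output is much easier to work with.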
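A sketch of what that could look like, with format_match as a hypothetical helper and made-up sample values:

```perl
use strict;
use warnings;

# Hypothetical helper: one record per matching line,
# with the file name and the line separated by a tab.
sub format_match {
    my ($file, $line) = @_;
    chomp $line;
    return "$file\t$line\n";
}

print format_match('PieceIw_example.txt', "senior management team\n");
```

Each match then occupies exactly one line of output, which tools such as cut, sort, or a later split /\t/ can consume directly.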
Of course, all of this is just
grep management "$prefix"_*.txt >output
in a one-line shell script.