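I'm trying to grab the content and save it as CSV, one entry per line, but when an entry contains a web link, the URL gets mangled.
Take a look at the latest one.
The glassdoor.com/Top-Companies-... part is actually a hyperlink to a web link
that redirects to
http://www.glassdoor.com/Top-Companies-for-Culture-and-Values-LST_KQ0,36.htm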
**The problem is that if we save it using the following:**
TAG POS=1 TYPE=DIV ATTR=CLASS:dir-ltr EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=Save.csv
it is saved to the CSV only in the following form (i.e., as TEXT), and the link is not saved correctly:
Honored to be named @Glassdoor's top company for culture and values. #jointheflock glassdoor.com/Top-Companies-...
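**How can we make sure the ACTUAL link of each one is saved in the CSV?** I think EVAL, and therefore JavaScript commands, could be used, but I don't know how. I'm using Windows XP 64-bit with the latest Firefox iMacros add-on. Thanks.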
Answer (score: 2)
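See the EXTRACT definition on the iMacros wiki. You can use HREF as the extraction type to extract the link target instead of the anchor's text. The following example extracts a link and saves it to a file.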
VERSION BUILD=8820413 RECORDER=FX
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.penny-arcade.com/
TAG POS=1 TYPE=A ATTR=TXT:Forum EXTRACT=HREF
SAVEAS TYPE=EXTRACT FOLDER=* FILE=FORUMS.CSV
Here is the macro code for the Twitter page:
TAG POS=1 TYPE=A ATTR=CLASS:twitter-timeline-link EXTRACT=HREF
SAVEAS TYPE=EXTRACT FOLDER=* FILE=SaveTweets.csv
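When iMacros is driven from JavaScript, a TAG line like the ones above is just a string passed to iimPlay(). As a minimal sketch, assuming a hypothetical helper buildTagCommand() (not part of iMacros), only the EXTRACT type changes between getting an anchor's text (TXT) and its link target (HREF). Spaces inside the ATTR value are written as <SP>, because iMacros uses a space to separate parameters.

```javascript
// Hypothetical helper (not part of the iMacros API): builds one TAG line
// for a "CODE:" macro string that could be passed to iimPlay().
function buildTagCommand(pos, type, attr, extractType) {
  // iMacros separates parameters with spaces, so each whitespace
  // character inside the ATTR value is written as <SP>.
  var safeAttr = attr.replace(/\s/g, "<SP>");
  return "TAG POS=" + pos + " TYPE=" + type +
         " ATTR=" + safeAttr + " EXTRACT=" + extractType + "\n";
}

// Same element, two extraction types: visible text vs. link target.
var textMacro = "CODE:" + buildTagCommand(1, "A", "CLASS:twitter-timeline-link", "TXT");
var linkMacro = "CODE:" + buildTagCommand(1, "A", "CLASS:twitter-timeline-link", "HREF");
console.log(textMacro);
console.log(linkMacro);
```

Inside iMacros, linkMacro would be run with iimPlay(linkMacro) and the URL read back with iimGetLastExtract().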
Here is a JavaScript version that extracts each tweet and, if present, its links.
var retcode, tagText, tagHTML, tweetCounter, startIndex, endIndex, linkText, macro, extractMacro;
var PREFIX = 'data-expanded-url="';

// Open the Twitter profile page
macro = "CODE:";
macro += "URL GOTO=https://twitter.com/twitter\n";
retcode = iimPlay(macro);

tweetCounter = 1;
while (true)
{
    // Extract the text of the Nth tweet
    macro = "CODE:";
    macro += "TAG POS=" + tweetCounter + " TYPE=P ATTR=CLASS:ProfileTweet-text<SP>js-tweet-text<SP>u-dir EXTRACT=TXT\n";
    retcode = iimPlay(macro);
    tagText = iimGetLastExtract();
    // iMacros returns #EANF# when the element is not found, i.e. there are no more tweets
    if (tagText === "#EANF#") break;

    // iMacros code requires <SP> for spaces (\s also covers tabs and newlines)
    tagText = tagText.replace(/\s/g, "<SP>");
    // Add the extracted text to a second macro that saves it later
    extractMacro = "CODE:";
    extractMacro += "ADD !EXTRACT " + tagText + "\n";

    // Extract the HTML of the same tweet to look for expanded links
    macro = "CODE:";
    macro += "TAG POS=" + tweetCounter + " TYPE=DIV ATTR=CLASS:ProfileTweet-Contents EXTRACT=HTM\n";
    retcode = iimPlay(macro);
    tagHTML = iimGetLastExtract();
    tweetCounter++;

    // Collect every data-expanded-url="..." value; each ends at the next double quote
    startIndex = tagHTML.indexOf(PREFIX);
    while (startIndex >= 0)
    {
        startIndex += PREFIX.length;
        endIndex = tagHTML.indexOf('"', startIndex);
        if (endIndex < 0) break;
        linkText = tagHTML.substring(startIndex, endIndex);
        // iMacros code requires <SP> for spaces
        linkText = linkText.replace(/\s/g, "<SP>");
        extractMacro += "ADD !EXTRACT " + linkText + "\n";
        startIndex = tagHTML.indexOf(PREFIX, endIndex);
    }

    // Append this tweet's text and links to SaveTweets.csv
    extractMacro += "SAVEAS TYPE=EXTRACT FOLDER=* FILE=SaveTweets.csv\n";
    retcode = iimPlay(extractMacro);
}
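The link-scanning and escaping steps can be pulled out into plain functions and checked outside iMacros. This is a sketch under two assumptions: the tweet HTML marks each expanded URL as data-expanded-url="..." (as the loop expects), and the attribute value contains no escaped quotes. extractExpandedUrls(), toIimValue() and buildSaveMacro() are hypothetical names, not iMacros functions, and the sample HTML is made up for illustration.

```javascript
// Hypothetical helpers (not part of iMacros).
var PREFIX = 'data-expanded-url="';

// Returns every data-expanded-url="..." value found in a tweet's HTML.
function extractExpandedUrls(html) {
  var urls = [];
  var start = html.indexOf(PREFIX);
  while (start >= 0) {
    start += PREFIX.length;
    // The value ends at the next double quote.
    var end = html.indexOf('"', start);
    if (end < 0) break; // unterminated attribute: stop scanning
    urls.push(html.substring(start, end));
    start = html.indexOf(PREFIX, end);
  }
  return urls;
}

// iMacros treats whitespace as a parameter separator, so each
// whitespace character is written as <SP>.
function toIimValue(text) {
  return text.replace(/\s/g, "<SP>");
}

// Builds the per-tweet macro: the tweet text, then each link,
// then one SAVEAS that writes them to SaveTweets.csv.
function buildSaveMacro(tweetText, tweetHtml) {
  var macro = "CODE:ADD !EXTRACT " + toIimValue(tweetText) + "\n";
  extractExpandedUrls(tweetHtml).forEach(function (url) {
    macro += "ADD !EXTRACT " + toIimValue(url) + "\n";
  });
  return macro + "SAVEAS TYPE=EXTRACT FOLDER=* FILE=SaveTweets.csv\n";
}

// Made-up sample HTML for one tweet.
var sampleHtml = '<div class="ProfileTweet-Contents"><a href="http://t.co/abc" ' +
  'data-expanded-url="http://www.glassdoor.com/Top-Companies-for-Culture-and-Values-LST_KQ0,36.htm" ' +
  'class="twitter-timeline-link">glassdoor.com/Top-Companies-...</a></div>';
console.log(buildSaveMacro("Honored to be named", sampleHtml));
```

Scanning for the closing quote rather than the next space keeps the URL intact even when the attribute is the last one before the tag's closing bracket.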