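I need to write Qt/C++ code that extracts all <p> tags from a page and writes the content of each one to its own .txt file. For example, given the following HTML page: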
<!DOCTYPE html>
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
<p>My second paragraph.</p>
</body>
</html>
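I need the code to create two .txt files: the first should contain "My first paragraph." and the second should contain "My second paragraph."
My question is how to parse the HTML and get the text between the tags. Here is my code: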
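#include <QCoreApplication>
#include <QEventLoop>
#include <QFile>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QUrl>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    QEventLoop loop;
    QNetworkRequest request;
    request.setUrl(QUrl("http://en.wikipedia.org/wiki/Cars"));
    QNetworkAccessManager networkMgr;
    QNetworkReply *reply = networkMgr.get(request);
    // Block in a local event loop until the download has finished.
    QObject::connect(reply, SIGNAL(finished()), &loop, SLOT(quit()));
    loop.exec();
    QFile file("/Users/David/Desktop/text123.txt");
    if (!file.open(QIODevice::WriteOnly))
        return 1; // the output file could not be created
    file.write(reply->readAll());
    file.close();
    delete reply;
    return 0; // a.exec() here would start a second event loop that never quits
}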
Thank you very much for your help.
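Answer 0 (score: 1)
You can use QRegularExpression; see the example below. This only works for simple pages: a regular expression is not a full HTML parser, and the captured text can still contain inline tags such as <a> or <b>.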
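// Decode the downloaded bytes (assumes the page is UTF-8 encoded).
QString txt = QString::fromUtf8(reply->readAll());
// (.*?) is non-greedy, so each <p>...</p> pair is matched on its own;
// [^>]* allows attributes, and the options ignore case and let '.' span newlines.
QRegularExpression regex("<p\\b[^>]*>(.*?)</p\\s*>",
                         QRegularExpression::CaseInsensitiveOption
                         | QRegularExpression::DotMatchesEverythingOption);
QRegularExpressionMatchIterator it = regex.globalMatch(txt);
int i = 0;
while (it.hasNext())
{
    QRegularExpressionMatch match = it.next();
    // One output file per paragraph: file0.txt, file1.txt, ...
    QString filename = QString("e:/folder/file%1.txt").arg(i);
    QFile file(filename);
    if (file.open(QIODevice::WriteOnly))
    {
        file.write(match.captured(1).toUtf8());
        file.close();
    }
    ++i;
}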
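To try the matching logic without a network request or a Qt installation, the same idea can be sketched with the C++ standard library. This swaps in std::regex for QRegularExpression, so it is an illustration of the approach rather than the code from the answer. The helper names extractParagraphs and writeParagraphs and the prefix parameter are hypothetical, and the sketch assumes that <p> elements are not nested.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the inner text of every <p>...</p> element.
std::vector<std::string> extractParagraphs(const std::string &html)
{
    // <p\b     matches "<p" but not "<pre";
    // [^>]*    tolerates attributes such as <p class="x">;
    // [\s\S]*? is a non-greedy "any character, newlines included".
    static const std::regex re(R"(<p\b[^>]*>([\s\S]*?)</p\s*>)",
                               std::regex::icase);
    std::vector<std::string> out;
    for (std::sregex_iterator it(html.begin(), html.end(), re), end;
         it != end; ++it)
        out.push_back((*it)[1].str());
    return out;
}

// Hypothetical helper: writes paragraph i to "<prefix><i>.txt"
// and returns how many files were written successfully.
std::size_t writeParagraphs(const std::vector<std::string> &paragraphs,
                            const std::string &prefix)
{
    std::size_t written = 0;
    for (std::size_t i = 0; i < paragraphs.size(); ++i) {
        std::ofstream file(prefix + std::to_string(i) + ".txt",
                           std::ios::binary);
        if (!file)
            continue; // skip paragraphs whose file cannot be created
        file << paragraphs[i];
        ++written;
    }
    return written;
}
```

Calling extractParagraphs on the sample page from the question yields the two strings "My first paragraph." and "My second paragraph."; passing that result to writeParagraphs with the prefix "file" would then create file0.txt and file1.txt in the current directory.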