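Is it possible to extract only the text from an HTML file?

Asked: 2011-08-21 16:42:46

Tags: php extract

I want to strip all formatting and keep only the text, as if a user opened any page of a website, pressed Ctrl+A and Ctrl+C, and then pasted everything into Notepad with Ctrl+V. That is what I mean by extracting all the text. Let me use a real page to explain it better. This page: https://developer.palm.com/content/resources/develop/quick_start_ios.html

What I want: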

jump to navigation
jump to content

Showcase
Why webOS
The Opportunity
Innovative Platform
Cross-Platform
HP Reach
Vibrant Community
Showcase
Device Showcase
App Showcase
Developer Voices
My Apps
Resources
Design
Enyo Design Guide
Advanced Application Guidelines
webOS and Game Development
Development
Download the SDK
Enyo from the Ground Up
Enyo Tutorial
Third-party Tools
Developer Device Program
PDK Development
Unactivated Devices
Glossary
Distribution and Promotion
Distributing with HP
App Content Criteria
App Submission Checklist
International e-commerce FAQ
Submit Your Enyo App
Market Your App
Promo codes
In-App Purchase
FAQs
Developer Program FAQ
International e-commerce FAQ
PDK Technical FAQ
Videos
View All
Community
Connect
Forums
Developer Blog
Events
Twitter
IRC
RSS
Resources
Third-party Developers
webOS on github
Guide to Custom Feeds
webOS101 (external)
Community Sites
mobspot
Cyrket
PreCentral
webOS Roundup
Documentation
SDK Documentation
Index
Developer Guide
API Reference
Sign In Sign Up Search Form
Search   
HomeResourcesQuick Start iOS
Quick Start - iOS Developers
Print
Email
Share
If you've been developing for iOS® and are looking to expand your audience, we're here to help. Getting started with webOS is easy! If your current focus is OpenGL/SDL, then the transition will be simplicity itself. We have lots of great stories of developers porting their OpenGL apps very quickly. You can use the publicly available 3.0 SDK to do OpenGL/SDL development now with the included Plug-in Development Kit (PDK). Best of all, the PDK integrates nicely with  Xcode.

If your focus is web app development, you'll want to look at Enyo, our next-generation JavaScript framework, which is included in the 3.0 SDK.

Ready to get started?

Download the SDK
It's free! (While you're at it, sign up for the Developer Program.)

Try the Enyo tutorial or the OpenGL sample app
Choose the sample that's most appropriate for your skill set.

Check out our Resources pages
Get more information on developing for webOS. Or go straight to the Reference section to get all the details.
Quick Start Guide

iOS Developers
Web Developers
C/C++ Developers
Next Steps

Sign up!
Become a member of the webOS developer community
Watch Dev Day videos
See the talks from the NYC Dev Day
Find a Developer
Check out our list of third-party developers and designers
Support
We are here to help!
Why webOS
Business Case for webOS
Success Stories
App Showcase
Contact Us
Getting Started
Join the HP webOS Developer Program
Download the SDK/PDK
Developing Your First App
Videos
webOS CONNECT Events
MWC Developer Conference
NYC Developer Day
Podcasts
Support
Help
FAQs
Stay up to date
About RSS Feeds
Developer Blog
© 2011 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice. All screen images simulated. HP Pre 3 planned availability this summer.  Privacy Statement
Supported browsers: Firefox 3.6+; Google Chrome 10+; Safari 5+; Internet Explorer 8+
Palm.comLegal NoticesContact Us

4 answers:

Answer 0 (score: 6)

This should work:

<?php 

echo strip_tags(file_get_contents("https://developer.palm.com/content/resources/develop/quick_start_ios.html"));

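That is the general idea. For nicer formatting, you can replace line-break tags with newlines before stripping the tags, with something like str_replace('<br/>', "\n", $html). Note the double quotes: in PHP, '\n' in single quotes is a literal backslash followed by n, not a newline. The replacement also has to run before strip_tags(), because afterwards there are no <br/> tags left to replace.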
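A minimal sketch of that idea, run on an inline sample string rather than the live URL. The helper name html_to_text and the sample markup are made up for illustration and are not part of the original answer. It also drops script and style blocks first, since strip_tags() removes the tags themselves but keeps the text between them.

```php
<?php
// Hypothetical helper (name chosen for this example only).
function html_to_text(string $html): string
{
    // Remove <script> and <style> blocks together with their contents;
    // strip_tags() alone would leave the JavaScript/CSS source in the output.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);

    // Convert <br>, <br/> and <br /> to newlines before the tags are stripped.
    $html = preg_replace('#<br\s*/?>#i', "\n", $html);

    // Strip the remaining tags, then decode entities such as &amp;.
    return html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Assumed sample input, standing in for file_get_contents($url).
$sample = '<p>Hello<br/>world &amp; friends</p><script>var x = 1;</script>';
echo html_to_text($sample), "\n";
```

Regular expressions are a blunt tool for HTML, so this is only reasonable for rough extraction; a parser-based approach handles malformed or nested markup more reliably.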

Answer 1 (score: 3)

I use lynx. Try this from your terminal:

lynx -dump http://www.google.com

Answer 2 (score: 0)

Another approach is to read the value of the page's body element:

$html = new DOMDocument();

$html->loadHTMLFile("https://developer.palm.com/content/resources/develop/quick_start_ios.html");

$body = $html->getElementsByTagName("body");
$body = $body->item(0);

echo $body->nodeValue;
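One caveat with reading the body's value directly is that it also includes the contents of any script and style elements inside the body. The following is a sketch of one way to handle that, assuming an inline sample string ($source is made up for this example) in place of the remote page. It also silences libxml warnings, which real-world markup tends to trigger.

```php
<?php
// Assumed sample markup; in practice this would come from the page being scraped.
$source = '<html><body><h1>Title</h1><script>var x = 1;</script><p>Some text</p></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

// getElementsByTagName() returns a live list, so copy it to an array
// before removing nodes; otherwise entries can be skipped.
foreach (['script', 'style'] as $tag) {
    foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
        $node->parentNode->removeChild($node);
    }
}

$body = $doc->getElementsByTagName('body')->item(0);
$text = $body !== null ? trim($body->textContent) : '';
echo $text, "\n";
```

Note that the text of adjacent elements is concatenated with no separator, so the heading and paragraph above run together. If line breaks between blocks matter, collecting the text nodes one by one gives more control.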

Answer 3 (score: 0)

You can do this with the document tree: keep all the text nodes and discard all the element nodes.

You can implement this in JavaScript, or in C++ with WebKit.
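The same text-node idea can also be sketched in PHP, the language the question is tagged with, instead of JavaScript or C++. This is an illustration under assumptions, not this answer's own code: the sample markup in $source is invented, and the choice to skip script and style contents and to put each text node on its own line is a design decision of this sketch.

```php
<?php
// Assumed sample markup for illustration.
$source = '<html><body><div><h1>Quick Start</h1>'
        . '<p>Download the <a href="#">SDK</a></p></div>'
        . '<script>var x = 1;</script></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select every text node under <body>, except those inside <script> or <style>.
$nodes = $xpath->query('//body//text()[not(ancestor::script) and not(ancestor::style)]');

$lines = [];
foreach ($nodes as $node) {
    // Skip nodes that contain only whitespace.
    $text = trim($node->nodeValue);
    if ($text !== '') {
        $lines[] = $text;
    }
}

echo implode("\n", $lines), "\n";
```

Emitting one line per text node is close to the one-item-per-line listing shown in the question, although inline elements such as links also end up on their own lines.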