Convert text in a table to plain text with line breaks

Time: 2012-10-07 17:28:12

Tags: php simple-html-dom

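If a large chunk of HTML displays its data nicely inside <div> and <table> elements, how can I remove all HTML/CSS markup while keeping the text that was originally in separate cells and divs, now separated only by line breaks?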

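The current attempt shown here outputs one long continuous paragraph instead of keeping the separation that the divs and the table provided.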

Original HTML: http://pastebin.com/63N3Kg16

Output

John Smith | SomeName Realty | (xxx) 939-4835 Allston St, Cambridge, MA Very spacious under renovation with SST/Granite, porch, minutes to MIT, redline, Nov/1 4BR/1BA Apartment $3,400/month Bedrooms 4 Bathrooms 1 full, 0 partial Sq Footage Unspecified Parking None Pet Policy No pets Deposit $0 DESCRIPTION Triple decker building secondfloor apt aprox 2000 sqf with large bedrooms, kitchen, pantry, porch, d/w, all woodfloor and ZTilded in the kitchen, new bath. utilities extra,Nov/1 see additional photos below Contact info: Payman Ahmadifar Bayside Realty (xxx) 939-4835 Posted: Sep 24, 2012, 6:55am PDT

PHP

nl2br(trim(strip_tags($html)));
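
A minimal sketch of why this attempt collapses into one paragraph, assuming the markup contains no newline characters between tags (the sample string below is made up for illustration and is not the real HTML from the pastebin): strip_tags() removes the tags without putting anything in their place, so nl2br() finds no newlines to convert.

```php
<?php
// Hypothetical one-line markup, standing in for the real HTML.
$html = '<div>Bedrooms 4</div> <div>Parking None</div>';

// The tags are removed, but nothing is inserted where they used to be.
$stripped = strip_tags($html);

// The string contains no "\n", so nl2br() has nothing to turn into <br />.
$result = nl2br(trim($stripped));

echo $result, PHP_EOL;
```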

Expected output

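Plain text containing <br> or newline characters, with no <div> or <table> HTML tags. Basically, the goal is to make the text more readable by keeping the original spacing/separation structure, with no CSS styling or HTML tags other than <br>.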

John Smith | SomeName Realty | (xxx) 939-4835 

Allston St, Cambridge, MA 

Very spacious under renovation with SST/Granite, porch, minutes to MIT, redline, Nov/1 

4BR/1BA Apartment $3,400/month 

Bedrooms 4 
Bathrooms 1 full, 0 partial 
Sq Footage Unspecified 
Parking None 
Pet Policy No pets 
Deposit $0 

DESCRIPTION 
Triple decker building secondfloor apt aprox 2000 sqf with large bedrooms, kitchen, pantry, porch, d/w, all woodfloor and ZTilded in the kitchen, new bath. utilities extra,Nov/1 see additional photos below 

Contact info: Payman Ahmadifar Bayside Realty (xxx) 939-4835 
Posted: Sep 24, 2012, 6:55am PDT

1 answer:

Answer 0: (score: 1)

You can get there with some string manipulation.

Try:

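// Remove every tag; this relies on cells being separated by runs of three spaces in the stripped text.
$string = strip_tags($html);
// Mark each run of three spaces (chr(32) is a space) with a placeholder.
$string = str_replace(chr(32).chr(32).chr(32), "*****", $string);
// Split on the placeholder, then collapse the remaining whitespace in each piece and trim it.
$newString = array_map(function($var){ return trim(preg_replace('!\s+!', ' ', $var)); }, explode("*****", $string));
print(implode("\n", $newString));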
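
The snippet above depends on how many spaces happen to sit between the tags. An alternative sketch, not part of the original answer, turns the closing tags of block-level elements into newlines before stripping the rest. The function name htmlToPlainText and the list of tags treated as line breaks are assumptions made for this illustration, and a regex-based approach like this is only suitable for simple, well-formed markup.

```php
<?php
// Hypothetical helper: the name and the tag list are illustrative assumptions.
function htmlToPlainText(string $html): string
{
    // Turn <br> and the closing tags of block-level elements into newlines.
    $text = preg_replace('~<br\s*/?>|</(?:div|p|tr|table|li|h[1-6])>~i', "\n", $html);
    // Keep the text of adjacent cells apart with a single space.
    $text = preg_replace('~</t[dh]>~i', ' ', $text);
    // Drop every remaining tag, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse whitespace inside each line and trim it.
    $lines = array_map(function ($line) {
        return trim(preg_replace('/\s+/', ' ', $line));
    }, explode("\n", $text));

    // Allow at most one blank line between blocks.
    return trim(preg_replace("/\n{3,}/", "\n\n", implode("\n", $lines)));
}

// Made-up sample markup, not the HTML from the question.
$sample = '<div><table><tr><td>Bedrooms</td><td>4</td></tr>'
        . '<tr><td>Parking</td><td>None</td></tr></table></div><div>Posted</div>';

echo htmlToPlainText($sample), PHP_EOL;
```

Wrapping the result in nl2br() would give the <br> variant the question mentions.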
