I know this kind of question gets asked a lot, but none of the other answers have worked well for me. I have the following block of text:
"""
\n
\t\t\t\t\tÁrea útil\n
\t\t\t\t\t\n
\t\t\t\t\t\t\n
\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t150 m²\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n
\t\t\t\t\t\n
\t\t\t\t
"""
I want to ignore all newlines and tabs (\n and \t) and extract everything else into an array. Ideally, the text block above would be converted to:
[
'Área útil',
'150m²',
]
Edit: here are some of the patterns I have tried:
(?!\n)(?!\t)[.]+
(?!\n)(?!\t)(.)+
(\r\n)+|\r+|\n+|\t+
^\w+$
EDIT2: Sorry, I completely forgot to mention that the language is PHP.
Answer 0 (score: 0)
In PHP, you can do the following:
<?php
$string = "\n
\t\t\t\t\tÁrea útil\n
\t\t\t\t\t\n
\t\t\t\t\t\t\n
\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t150 m²\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n
\t\t\t\t\t\n
\t\t\t\t";
// Get rid of the tabs
$string = preg_replace( '/(\t)/m', '', $string );
// Split on new lines
$array = preg_split( '/[\r\n]/m', $string );
// Loop the array and get rid of empty strings
foreach( $array as $k=>$v )
{
    if( $v === '' )
    {
        unset( $array[ $k ] );
    }
}
// Re-index the array
$array = array_values( $array );
var_dump( $array );
Which outputs:
array(2) {
  [0]=>
  string(11) "Área útil"
  [1]=>
  string(7) "150 m²"
}
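As a possible shortcut for the steps above, here is a minimal sketch that assumes the pieces are separated only by tabs, carriage returns, and newlines. It uses `preg_split` with the `PREG_SPLIT_NO_EMPTY` flag so that empty pieces are dropped during the split itself. The shortened sample string and the `$parts` variable name are illustrative and not part of the original post.

```php
<?php
// Illustrative input modelled on the question's text block
// (shortened; tabs and newlines written as escape sequences).
$string = "\n\t\t\t\t\tÁrea útil\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t150 m²\t\t\t\n\t\t\t\t\t\n\t\t\t\t";

// Split on any run of tabs, carriage returns, or newlines.
// PREG_SPLIT_NO_EMPTY discards empty pieces (for example at the very
// start and end), so no cleanup loop or re-indexing is required.
$parts = preg_split('/[\t\r\n]+/', $string, -1, PREG_SPLIT_NO_EMPTY);

var_dump($parts);
```

Ordinary spaces are not in the character class, so the space inside each value is kept. If the input could contain lines made up only of spaces, those would survive this split and would need an extra filtering step.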