I have a badly formatted text file that I want to convert to CSV.
Here is an example:
100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO
148013 NA/1-2014-146194 CAVALLOTTI SNC LODI GEN 3 2014 3:37PM DANNEGGIATO -10% 0 0 2 0 NO
160032 NA/1-2014-158129 PAOLO GORINI SNC LODI MAG 6 2014 11:51AM DANNEGGIATO -10% 2 0 2 0 NO
54900 NA/1-2014-158070 STRADA VECCHIA CREMONESE SNC LODI MAG 6 2014 9:53AM DANNEGGIATO +10% 10 0 10 0 NO
100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO
147959 NA/1-2014-146140 DOSSENA SNC LODI GEN 3 2014 10:45AM DANNEGGIATO -10% 200 0 200 0 NO
It roughly has this form:
[number] [id] [awfully formatted street] ['LODI'] [timestamp] [damaged or not] [percentage] [squaremeters] [squaremeters] [squaremeters] [squaremeters] [asbestos-crumbled or not]
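My question is how to extract the third part, [awfully formatted street]. Basically it is the string after [id] and before the string 'LODI' (but that 'LODI' must come before [timestamp]).
Should I explode each line on spaces, then walk the array backwards, past [timestamp] and past 'LODI', and join the values back until I reach [id], i.e. array[1]? Or is there a smarter (more elegant) way to do this, maybe with preg_match()?
Thanks for any hints!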
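A minimal sketch of the backward walk described above, assuming the city is always the literal token 'LODI' and that no field after it is also 'LODI'. The function name extractStreet() is hypothetical. Searching from the end means the first 'LODI' found is the city, so a street that itself contains the word LODI stays intact.

```php
<?php
// Hypothetical helper: returns the street, or null if no 'LODI' token follows the id.
// Assumes the city is the literal token 'LODI' and nothing after it is 'LODI'.
function extractStreet(string $line): ?string
{
    // Split on runs of whitespace
    $tokens = preg_split('/\s+/', trim($line));
    // Walk backwards: the last 'LODI' token is the city field
    for ($i = count($tokens) - 1; $i >= 2; $i--) {
        if ($tokens[$i] === 'LODI') {
            // Join everything between the id (index 1) and the city
            return implode(' ', array_slice($tokens, 2, $i - 2));
        }
    }
    return null;
}

echo extractStreet('100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO'), PHP_EOL;
// prints "VIA DEI PESCATORI 2/A"
```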
Answer 0 (score: 0)
This should extract the address from a single row:
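<?php
$row = "100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO";
// Split the row on runs of whitespace
$row_array = preg_split('/\s+/', $row);
// Drop the leading [number] and [id]
array_shift($row_array);
array_shift($row_array);
// Drop the 12 trailing fields: LODI, the 4 timestamp tokens, the status,
// the percentage, the 4 areas and the asbestos flag
for ($i = 0; $i < 12; $i++) {
    array_pop($row_array);
}
// What remains is the street
$address = implode(" ", $row_array);
echo $address; // VIA DEI PESCATORI 2/A
?>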
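The same shift/pop logic can be written more compactly with array_slice() and a negative length. This is a sketch under the same assumption as the loop: every row has exactly 12 tokens after the street (LODI, 4 timestamp tokens, status, percentage, 4 areas, flag).

```php
<?php
$row = "148013 NA/1-2014-146194 CAVALLOTTI SNC LODI GEN 3 2014 3:37PM DANNEGGIATO -10% 0 0 2 0 NO";
// Split on runs of whitespace
$tokens = preg_split('/\s+/', trim($row));
// Skip the first 2 tokens (number, id) and leave off the last 12
$address = implode(' ', array_slice($tokens, 2, -12));
echo $address, PHP_EOL; // CAVALLOTTI SNC
```

If a row ever has a different number of trailing tokens (for example a timestamp written differently), this silently returns the wrong slice, so it only suits files whose layout is strictly fixed.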
Answer 1 (score: 0)
I don't think explode() will do here. I suggest using a regexp. For instance, if you read the .txt file as a single string (where the data lines are separated by \n):
$f = fopen($fname = "file.txt", "rt");
$str = fread($f, filesize($fname));
fclose($f);
Then use preg_match_all() like this:
// [-+]? so that percentages such as +10% also match
$re = "/^(\\d+)\\s*(.*)(LODI)\\s*(.+(?:AM|PM))\\s*(\\w+)\\s+([-+]?\\d{1,3}%)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\w+)$/m";
preg_match_all($re, $str, $matches,PREG_SET_ORDER );
echo "<pre>\n";
print_r($matches);
echo "</pre>\n";
The output looks like this:
Array
(
[0] => Array
(
[0] => 100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO
[1] => 100910
[2] => NA/1-2013-99636 VIA DEI PESCATORI 2/A
[3] => LODI
[4] => APR 8 2013 4:24PM
[5] => DANNEGGIATO
[6] => -10%
[7] => 200
[8] => 2700
[9] => 0
[10] => 0
[11] => NO
)
[1] => Array
(
[0] => 148013 NA/1-2014-146194 CAVALLOTTI SNC LODI GEN 3 2014 3:37PM DANNEGGIATO -10% 0 0 2 0 NO
[1] => 148013
[2] => NA/1-2014-146194 CAVALLOTTI SNC
[3] => LODI
[4] => GEN 3 2014 3:37PM
[5] => DANNEGGIATO
[6] => -10%
[7] => 0
[8] => 0
[9] => 2
[10] => 0
[11] => NO
)
..........// And so on
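I used the text you provided in this example. As you can see, the output gives you the data as a list of arrays, so you can do whatever you want with it. $matches[$i][0] stores the whole match, so just skip it and use $matches[$i][1] ... $matches[$i][11] as your data. Note that $matches[$i][2] holds the id together with the street.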
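If you want the street on its own, one possible variant (my own assumption, not the pattern from this answer) gives the id a separate group, makes the street capture lazy, and only accepts a 'LODI' that is immediately followed by something shaped like the timestamp. It assumes timestamps always look like "APR 8 2013 4:24PM".

```php
<?php
$str = "100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO\n"
     . "54900 NA/1-2014-158070 STRADA VECCHIA CREMONESE SNC LODI MAG 6 2014 9:53AM DANNEGGIATO +10% 10 0 10 0 NO\n";

// Groups: 1 number, 2 id, 3 street, 4 timestamp, 5 status, 6 percentage,
// 7-10 areas, 11 asbestos flag. The lazy (.*?) stops at the first ' LODI '
// that is followed by a month/day/year/time timestamp.
$re = '/^(\d+)\s+(\S+)\s+(.*?)\s+LODI\s+([A-Z]{3}\s+\d{1,2}\s+\d{4}\s+\d{1,2}:\d{2}(?:AM|PM))\s+(\w+)\s+([-+]?\d{1,3}%)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\w+)\s*$/m';
preg_match_all($re, $str, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[3], PHP_EOL; // street only
}
```

The trailing \s* before $ also tolerates Windows \r\n line endings.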
Answer 2 (score: 0)
<?php
// In practice you would read the file line by line; here one line is hard-coded
$line = '148013 NA/1-2014-146194 CAVALLOTTI SNC LODI GEN 3 2014 3:37PM DANNEGGIATO -10% 0 0 2 0 NO';
// Start by separating the string on LODI
$lodi_split = explode('LODI', $line);
// Now split the first part into an array on spaces
$bits = explode(' ', trim($lodi_split[0]));
$address = '';
// Start reading from element 2 to lose the first 2 fields (number and id)
for ($i = 2; $i < count($bits); $i++) {
    $address .= $bits[$i] . ' ';
}
echo trim($address) . PHP_EOL;
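To get all the way to the CSV asked for in the question, here is a sketch that splits each line at the last " LODI " (via strrpos(), so a street containing the word LODI is not cut short) and writes records with fputcsv(). The helper name lineToRecord() is hypothetical, and it assumes exactly 11 tokens follow the city; lines that don't fit are skipped. An in-memory stream stands in for the output file.

```php
<?php
// Hypothetical helper: turns one line into 12 CSV fields, or null if it doesn't fit.
function lineToRecord(string $line): ?array
{
    $line = trim($line);
    $pos = strrpos($line, ' LODI '); // last occurrence = the city field
    if ($pos === false) {
        return null;
    }
    // Head: number, id, street (limit 3 keeps the street's spaces together)
    $head = preg_split('/\s+/', substr($line, 0, $pos), 3);
    // Tail: 4 timestamp tokens, status, percentage, 4 areas, asbestos flag
    $tail = preg_split('/\s+/', substr($line, $pos + strlen(' LODI ')));
    if (count($head) !== 3 || count($tail) !== 11) {
        return null;
    }
    $timestamp = implode(' ', array_slice($tail, 0, 4));
    return array_merge($head, ['LODI', $timestamp], array_slice($tail, 4));
}

$lines = [
    '160032 NA/1-2014-158129 PAOLO GORINI SNC LODI MAG 6 2014 11:51AM DANNEGGIATO -10% 2 0 2 0 NO',
    '147959 NA/1-2014-146140 DOSSENA SNC LODI GEN 3 2014 10:45AM DANNEGGIATO -10% 200 0 200 0 NO',
];

$out = fopen('php://memory', 'w+');
foreach ($lines as $line) {
    $record = lineToRecord($line);
    if ($record !== null) {
        // Pass the escape character explicitly to avoid relying on its default
        fputcsv($out, $record, ',', '"', '\\');
    }
}
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
echo $csv;
```

With a real file you would fill $lines with file('file.txt', FILE_IGNORE_NEW_LINES) and open an output file instead of php://memory.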