Extract information from a text file using PHP

Asked: 2015-03-18 11:50:34

Tags: php html html-table text-files fopen

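Question:

I want to use PHP to extract information from a text file that has the following structure:

  • Date (in YYYY-MM-DD format)
  • Title
  • Text: value
  • Text: value
  • Text: value

Input: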

2015-03-18
 Store A
Text 1: 5,00 USD
Text 2: 2015-03-18
Text 3: 2015-03-12
 Store B
Text 1: 10,00 USD
Text 2: 2015-03-18
Text 3: 2015-03-12
 Store C
Text 1: 15,00 USD
Text 2: 2015-03-18
Text 3: 2015-03-12
2015-03-19
 Store D
Text 1: 20,00 USD
Text 2: 2015-03-18
Text 3: 2015-03-12

PHP code (so far):

<?php
    // Creates array to store data from textfile
    $data       = array();

    // Opens text file
    $text_file  = fopen('data.txt', 'r');

    // Loops through each line
    while ($line = fgets($text_file))
    {
        // Checks whether line is a date
        if (preg_match("/^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[0-1])$/", trim($line)))
        {
            $data[$line] = array();
        }
        else
        {
            $data[] = trim($line);
        }
    }

    // Removes first array key
    $data = array_slice($data, 1);

    // Prints out full array
    echo "<xmp>" . print_r($data, true) . "</xmp>";
 ?>

HTML code:

<table border="1">
  <tr>
    <th>Date</th>
    <th>Store</th>
    <th>Text 1</th>
    <th>Text 2</th>
    <th>Text 3</th>
  </tr>
  <tr>
    <td>2015-03-18</td>
    <td>Store A</td>
    <td>5,00 USD</td>
    <td>2015-03-18</td>
    <td>2015-03-12</td>
  </tr>
  <tr>
    <td></td>
    <td>Store B</td>
    <td>10,00 USD</td>
    <td>2015-03-18</td>
    <td>2015-03-12</td>
  </tr>
  <tr>
    <td></td>
    <td>Store C</td>
    <td>15,00 USD</td>
    <td>2015-03-18</td>
    <td>2015-03-12</td>
  </tr>
  <tr>
    <td>2015-03-19</td>
    <td>Store D</td>
    <td>20,00 USD</td>
    <td>2015-03-18</td>
    <td>2015-03-12</td>
  </tr>
</table>

Desired output: the HTML table above, as rendered in the browser.

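Questions:

  1. What is an appropriate way to extract and store the different values?
  2. What is an appropriate way to print the information as in the output example?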

1 answer:

Answer 0 (score: 1)

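I treat the records in the source file as 'groups':

  • Date group - indicated by a line that contains only a date
  • Store group - contains:
      • the store name
      • the price
      • a group of dates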
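As a rough illustration of how these groups could be stored (the first question), here is a minimal sketch that collects them into a nested array. It is not the code this answer uses; the function name `parseStoreGroups()` and the array shape are assumptions made for the example, and it treats any line without a colon as a store name.

```php
<?php
// Hypothetical helper (not part of the answer's code). Builds:
// [date => [ ['store' => name, 'values' => [label => value, ...]], ... ]]
function parseStoreGroups(array $lines)
{
    $groups = array();
    $currentDate = null;   // date group currently being filled
    $currentStore = null;  // index of the store currently being filled

    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        if (preg_match('/^\d{4}-\d{2}-\d{2}$/', $line)) {
            // a line holding only a date starts a new date group
            $currentDate = $line;
            if (!isset($groups[$currentDate])) {
                $groups[$currentDate] = array();
            }
            $currentStore = null;
        } elseif ($currentDate !== null && strpos($line, ':') === false) {
            // assumption: a line without a colon is a store name
            $groups[$currentDate][] = array('store' => $line, 'values' => array());
            $currentStore = count($groups[$currentDate]) - 1;
        } elseif ($currentStore !== null) {
            // "label: value" line belonging to the current store
            list($label, $value) = explode(':', $line, 2);
            $groups[$currentDate][$currentStore]['values'][trim($label)] = trim($value);
        }
    }
    return $groups;
}

$sample = array(
    '2015-03-18', ' Store A', 'Text 1: 5,00 USD', 'Text 2: 2015-03-18', 'Text 3: 2015-03-12',
    '2015-03-19', ' Store D', 'Text 1: 20,00 USD', 'Text 2: 2015-03-18', 'Text 3: 2015-03-12',
);
$groups = parseStoreGroups($sample);
```

With a real file, `file('data.txt', FILE_IGNORE_NEW_LINES)` would supply the `$lines` array.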

Added requirement: only print store groups dated today or later. I call this the 'cutoff_date' in the code.
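The cutoff check can be a plain string comparison, because zero-padded YYYY-MM-DD dates sort the same way as strings and as dates. A small sketch, assuming well-formed dates; the helper name `isOnOrAfterCutoff()` and the example cutoff value are made up for illustration:

```php
<?php
// Hypothetical helper: true when a date group should be shown.
// Assumes both arguments are well-formed 'Y-m-d' strings.
function isOnOrAfterCutoff($groupDate, $cutoffDate)
{
    // Zero-padded ISO dates order identically as strings and as dates.
    return strcmp($groupDate, $cutoffDate) >= 0;
}

$cutoff = '2015-03-19'; // example value only
$dates = array('2015-03-18', '2015-03-19', '2015-04-01');

// Keep only the dates on or after the cutoff.
$shown = array_values(array_filter($dates, function ($d) use ($cutoff) {
    return isOnOrAfterCutoff($d, $cutoff);
}));
```

If the file could contain dates in other formats, comparing `DateTime` objects would be the safer choice.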

I use the 'read-ahead' technique, so there is always a record waiting to be processed.
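To show the idea in isolation, here is a stripped-down sketch of read-ahead over an in-memory list of lines. It is only an illustration: `groupWithReadAhead()` is a hypothetical name, and it assumes the first line is a date header.

```php
<?php
// Hypothetical illustration of read-ahead grouping (not the answer's code).
// A line holding only a YYYY-MM-DD date opens a group; other lines are members.
function groupWithReadAhead(array $lines)
{
    $i = 0;
    // returns the next line, or null once the input is exhausted
    $next = function () use ($lines, &$i) {
        return isset($lines[$i]) ? $lines[$i++] : null;
    };
    $isHeader = function ($line) {
        return preg_match('/^\d{4}-\d{2}-\d{2}$/', $line) === 1;
    };

    $groups = array();
    $current = $next(); // read-ahead: fetch the first record before the loop

    while ($current !== null) {
        $header = $current;  // the record in hand starts a group
        $current = $next();  // read ahead

        $members = array();
        // consume members until the record in hand is the next header or EOF
        while ($current !== null && !$isHeader($current)) {
            $members[] = $current;
            $current = $next();
        }
        $groups[$header] = $members;
    }
    return $groups;
}

$grouped = groupWithReadAhead(array(
    '2015-03-18', 'Store A', 'Store B',
    '2015-03-19', 'Store D',
));
```

Because the next record is always already in hand, each loop can decide whether it belongs to the current group without pushing anything back onto the input.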

I provide functions that help 'identify things'. Using them makes the control logic easier to follow.

Code:

<?php // https://stackoverflow.com/questions/29121286/extract-information-from-text-file-using-php

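/**
 * We only show store entries dated on or after a certain date.
 * I call this the 'cutoff_date'.
 *
 * It defaults to today's date. For the 2015 sample data, set an earlier
 * cutoff (e.g. '2015-03-18'), otherwise every date group is skipped.
 */
$now = new DateTime();
$CUTOFF_DATE = $now->format('Y-m-d');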

// output stored in here
$outHtml = '<table border="1">
  <tr>
    <th>Date</th>
    <th>Store</th>
    <th>Text 1</th>
    <th>Text 2</th>
    <th>Text 3</th>
  </tr>';


// source - we use 'read-ahead' as it makes life easier
$sourceFile = fopen(__DIR__ . '/Q29121286.txt', 'rb');

$currentLine = readNextLine($sourceFile); // read-ahead

while (!empty($currentLine)) { // process until eof...

    // start of a date group...
    $currentGroupDate = $currentLine; // ignore this group if less than CUTOFF_DATE
    $currentLine = readNextLine($sourceFile); // read ahead

    while (!empty($currentGroupDate) && $currentGroupDate < $CUTOFF_DATE) { // find next date_group record
        while (!empty($currentLine) && datePosition($currentLine) !== 0) { // read to end of current group
            $currentLine = readNextLine($sourceFile);
        }
        $currentGroupDate = $currentLine;
        $currentLine = readNextLine($sourceFile); // read ahead
    }

    $htmlCurrentDate = $currentGroupDate; // only print the date once

    $html = '';
    // display all the rows for this 'date group' -- look for next 'date'
    while (!empty($currentLine) && datePosition($currentLine) !== 0) {

        $html = '<tr>';

        $html .= '<td>'. $htmlCurrentDate .'</td>';
        $htmlCurrentDate = ''; // only display the date once

        $html .= '<td>'. $currentLine .'</td>'; // store
        $currentLine = readNextLine($sourceFile);

        // process the price
        $lineParts = explode(':', $currentLine); // need the price...
        $html .= '<td>'. $lineParts[1] .'</td>';
        $currentLine = readNextLine($sourceFile);

        // now process the group of dates - look for a line
        // that starts with 'text' and must contain a date
        while (   !empty($currentLine)
                && isTextLine($currentLine)
                && datePosition($currentLine) >= 1) {

            $lineParts = explode(':', $currentLine); // need the date...
            $html .= '<td>'. $lineParts[1] .'</td>';
            $currentLine = readNextLine($sourceFile); // read next
        }

        // end of this group...
        $html .= '</tr>';

        $outHtml .= $html;

    } // end of 'dateGroup'
} // end of data file...

$outHtml .= '</table>';
fclose($sourceFile);


// display output
echo $outHtml;
exit;

/**
 * These routines hide the low-level processing;
 */

/**
 * Return the offset of the first YYYY-MM-DD date in the line, or -1 if none is found.
 *
 * @param string $line
 * @return int
 */
function datePosition($line)
{
    $pos = -1;
    if (preg_match('/\d{4}-\d{2}-\d{2}/', $line, $matches, PREG_OFFSET_CAPTURE) === 1) {
        $pos = $matches[0][1]; // [0] is the whole match, [1] is its byte offset
    }
    return $pos;
}

/**
 * Return whether the line is a text line, i.e. starts with 'text' (case-insensitive).
 *
 * @param string $text
 * @return bool
 */
function isTextLine($text)
{
    return strpos(strtolower($text), 'text') === 0;
}

/**
 * Return the trimmed line, or an empty string at EOF.
 * Added 'fudge' to not read past the EOF - ;-/
 *
 * @param resource $handle
 * @return string
 */
function readNextLine($handle)
{
    static $isEOF = false;

    if ($isEOF) {
        return '';
    }

    $line = fgets($handle);
    if ($line !== false) {
        $line = trim($line);
    }
    else {
        $isEOF = true;
        $line = '';
    }
    return $line;
}

Raw output for the supplied file:

| Date       | Store   | Text 1    | Text 2     | Text 3     |
|------------|---------|-----------|------------|------------|
| 2015-03-18 | Store A | 5,00 USD  | 2015-03-18 | 2015-03-12 |
|            | Store B | 10,00 USD | 2015-03-18 | 2015-03-12 |
|            | Store C | 15,00 USD | 2015-03-18 | 2015-03-12 |
| 2015-03-19 | Store D | 20,00 USD | 2015-03-18 | 2015-03-12 |
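If the data is first collected into an array rather than printed while reading, the table can be produced in a separate step. The following is only a sketch under that assumption: `renderStoreTable()` and the array shape (`date => list of ['store' => ..., 'values' => [...]]`) are hypothetical and do not appear in the code above. Unlike the code above, it escapes each cell with `htmlspecialchars()`.

```php
<?php
// Hypothetical renderer; the array shape is an assumption made for this sketch.
function renderStoreTable(array $groups)
{
    $html = "<table border=\"1\">\n"
          . "  <tr><th>Date</th><th>Store</th><th>Text 1</th><th>Text 2</th><th>Text 3</th></tr>\n";

    foreach ($groups as $date => $stores) {
        $dateCell = $date; // shown on the first row of each date group only
        foreach ($stores as $store) {
            $cells = array_merge(
                array($dateCell, $store['store']),
                array_values($store['values'])
            );
            $html .= '  <tr>';
            foreach ($cells as $cell) {
                // escape so characters such as & or < cannot break the markup
                $html .= '<td>' . htmlspecialchars($cell, ENT_QUOTES, 'UTF-8') . '</td>';
            }
            $html .= "</tr>\n";
            $dateCell = ''; // blank the date for the remaining rows of the group
        }
    }
    return $html . "</table>\n";
}

$html = renderStoreTable(array(
    '2015-03-18' => array(
        array('store' => 'Store A', 'values' => array('5,00 USD', '2015-03-18', '2015-03-12')),
        array('store' => 'Store B & Co', 'values' => array('10,00 USD', '2015-03-18', '2015-03-12')),
    ),
));
```

Keeping parsing and rendering apart makes it easier to reuse the parsed data, for example to filter by the cutoff date before printing.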