PHP: read the first two lines of each file into variables and loop through subfolders

Time: 2011-12-17 13:18:44

Tags: php regex arrays markdown

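I am trying to do the following with PHP...

  1. Read a directory
  2. Find all .md and .markdown files
  3. Read the first two lines of each of these Markdown files
  4. If line 1 contains "Title: Title for the file", add it to an array
  5. If line 2 contains "Description: Short description", add it to the array
  6. If a subdirectory is found, repeat steps 1-5 for it
  7. There should now be a nice list/array
  8. Print this list/array to the screen so that it looks like this: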

    Directory 1 Name

    <a href="LINK TO MARKDOWN FILE 1"> TITLE from line 1 of Markdown FILE 1</a> <br>
    Description from Markdown FILE 1 line 2

    <a href="LINK TO MARKDOWN FILE 2"> TITLE from line 1 of Markdown FILE 2</a> <br>
    Description from Markdown FILE 2 line 2

    <a href="LINK TO MARKDOWN FILE 3"> TITLE from line 1 of Markdown FILE 3</a> <br>
    Description from Markdown FILE 3 line 2

    Directory 2 Name

    <a href="LINK TO MARKDOWN FILE 1"> TITLE from line 1 of Markdown FILE 1</a> <br>
    Description from Markdown FILE 1 line 2

    <a href="LINK TO MARKDOWN FILE 2"> TITLE from line 1 of Markdown FILE 2</a> <br>
    Description from Markdown FILE 2 line 2

    <a href="LINK TO MARKDOWN FILE 3"> TITLE from line 1 of Markdown FILE 3</a> <br>
    Description from Markdown FILE 3 line 2

    etc..........

Code so far

    function getFilesFromDir($dir)
    {
        $files = array();
        // scan the directory passed into the function
        if ($handle = opendir($dir)) {
            while (false !== ($file = readdir($handle))) {
    
                // If file is .md or .markdown continue
                if (preg_match('/\.(md|markdown)$/', $file)) {
    
                    // Grab first 2 lines of Markdown file
                    $content = file($dir . '/' . $file);
                    $title = $content[0];
                    $description = $content[1];
    
                    // If first 2 lines of Markdown file have a 
                    // "Title: file title" and "Description: file description" lines we then
                    // add these key/value pairs to the array for meta data
    
                    // Match Title line
                    $pattern = '/^(Title|Description):(.+)/';
                    if (preg_match($pattern, $title, $matched)) {
                        $title = trim($matched[2]);
                    }
    
                    // match Description line 
                    if (preg_match($pattern, $description, $matched)) {
                        $description = trim($matched[2]);
                    }
    
                    // Add .md and .markdown files and folder path to array
                    // Add captured Title and Description to array as well
                    $files[$dir][] = array("filepath" => $dir . '/' . $file,
                                           "title" => $title,
                                           "description" => $description
                                        );
    
                }
            }
            closedir($handle);
        }
    
        return $files;
    }
    

Usage

    $dir = 'mdfiles';
    $fileArray = getFilesFromDir($dir);
    

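Help needed

So far the code works; it just needs the ability to do the same thing on subdirectories. Also, it currently takes the first two lines and then runs the regex twice. Could that be done differently?

I think there must be a better way, so that the regex I use to match the title and description only has to run once?

Can someone help me modify this code so that it detects and descends into subdirectories, and improve the way it reads the first 2 lines of a Markdown file to get the title and description, if they exist?

I also need help printing the array to the screen so that it does not just dump the data. I know how to do that part, but the output has to be broken up so that the folder name appears at the top of each set, as in my demo output above.

I appreciate any help.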

2 Answers:

Answer 0 (score: 2)

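To iterate over files recursively, RecursiveDirectoryIterator is quite handy (related: PHP recursive directory path). It already gives easy access to SplFileInfo objects (whose openFile() method returns an SplFileObject), which looks useful in your case because you want to read the file contents.

Additionally, it is possible to parse the first two lines of the file with a single regular expression. Compiled patterns are cached when you execute them repeatedly, so running a regex per file should be fine. The benefit of one pattern is more structured code; the downside is a more complex pattern. The configuration might look like this: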

#
# configuration
#

$path = 'md';
$fileFilter = '~\.(md|markdown)$~';
$pattern = '~^(?:Title: (.*))?(?:(?:\r\n|\n)(?:Description: (.*)))?~u';

I added the u modifier (PCRE_UTF8) in case the Markdown files are actually UTF-8 encoded.
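To see what the single pattern captures, here is a minimal sketch that applies it to a made-up two-line string. The sample text is hypothetical; only `$pattern` comes from the configuration above.

```php
<?php
// Same pattern as in the configuration above.
$pattern = '~^(?:Title: (.*))?(?:(?:\r\n|\n)(?:Description: (.*)))?~u';

// Hypothetical sample: the first two lines of a Markdown file.
$sample = "Title: Getting started\nDescription: A short introduction";

preg_match($pattern, $sample, $matches);

// Group 1 holds the title, group 2 the description. A group that did not
// take part in the match may be missing from $matches, hence the isset().
$title       = isset($matches[1]) ? $matches[1] : '';
$description = isset($matches[2]) ? $matches[2] : '';

echo $title, "\n";
echo $description, "\n";
```

Note that both groups are optional and the pattern is anchored at the start, so the description is only picked up when line 1 is a `Title: ` line or is empty. If line 1 is any other text, the pattern matches the empty string and both values stay empty.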

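The processing part of the code then runs a recursive directory iterator over $path, skips files that do not match $fileFilter, parses the first two lines of each file (provided the file is readable and has at least one line), and stores the result in a hash table/array keyed by directory, $result: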

#
# main
#

# init result array (the nice one)
$result = array();

# recursive iterator for files
$iterator = new RecursiveIteratorIterator(
               new RecursiveDirectoryIterator($path, FilesystemIterator::KEY_AS_PATHNAME | FilesystemIterator::CURRENT_AS_FILEINFO), 
               RecursiveIteratorIterator::SELF_FIRST);

foreach($iterator as $path => $info)
{
    # filter out files that don't match
    if (!preg_match($fileFilter, $path)) continue;

    # get first two lines
    try
    {
        for
        (
            $maxLines = 2,
            $lines = '',
            $file = $info->openFile()
            ; 
            !$file->eof() && $maxLines--
            ; 
            $lines .= $file->fgets()
        );
        $lines = rtrim($lines, "\n");

        if (!strlen($lines)) # skip empty files 
            continue;
    }
    catch (RuntimeException $e)
    {
        continue; # files which are not readable are skipped.
    }

    # parse md file
    $r = preg_match($pattern, $lines, $matches);
    if (FALSE === $r)
    {
        throw new Exception('Regular expression failed.');
    }
    list(, $title, $description) = $matches + array('', '', '');

    # grow result array (trim() drops a stray "\r" left behind by CRLF line endings)
    $result[dirname($path)][] = array($path, trim($title), trim($description));
}
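The `$matches + array('', '', '')` expression in the loop above relies on PHP's array union operator to supply defaults when a capture group is missing. A small sketch with a hypothetical `$matches` value, as it might look for a file that has a title but no description:

```php
<?php
// Hypothetical match result: full match plus the title group only.
$matches = array('Title: Only a title', 'Only a title');

// The + operator keeps every key of the left operand and only adds keys
// that are missing from it, so index 2 falls back to ''.
list(, $title, $description) = $matches + array('', '', '');

echo $title, "\n";
echo '[', $description, "]\n";
```

This avoids undefined-offset notices without a separate isset() check per group.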

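What is left is the output. Because the result array is already grouped by directory, this is fairly straightforward: iterate over the directories first, then over the files within each one: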

#
# output
#

$dirCounter = 0;
foreach ($result as $name => $dirs)
{
    printf("Directory %d %s\n", ++$dirCounter, basename($name));
    foreach ($dirs as $entry)
    {
        list($path, $title, $description) = $entry;
        printf("<a href='%s'>%s from line 1 of Markdown %s</a> <br>\n%s\n\n", 
                htmlspecialchars($path), 
                htmlspecialchars($title),               
                htmlspecialchars(basename($path)),
                htmlspecialchars($description)
              );
    }
}

Answer 1 (score: 1)

This should work:

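if (preg_match('/\.(md|markdown)$/', $file)) {
   // ...
} elseif ($file !== '.' && $file !== '..' && is_dir($dir . '/' . $file)) {
    // readdir() returns bare names, so test the full path and skip . and ..
    $files = array_merge($files, getFilesFromDir($dir . '/' . $file));
}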
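Put together with the question's function, the recursion might look like the following sketch. It is not the asker's exact code: it reads only two lines with fgets() instead of loading the whole file with file(), leaves title and description empty when the prefix is missing, and merges subdirectory results with the + operator so that directory keys are preserved as-is.

```php
<?php
/**
 * Sketch: collect Title/Description metadata from .md/.markdown files,
 * descending into subdirectories. The result is keyed by directory path.
 */
function getFilesFromDir($dir)
{
    $files = array();
    $handle = opendir($dir);
    if ($handle === false) {
        return $files; // unreadable directory: nothing to report
    }

    while (false !== ($file = readdir($handle))) {
        if ($file === '.' || $file === '..') {
            continue; // skip the self and parent entries
        }
        $path = $dir . '/' . $file;

        if (is_dir($path)) {
            // Subdirectory results use their own paths as keys, so a union is enough.
            $files = $files + getFilesFromDir($path);
        } elseif (preg_match('/\.(md|markdown)$/', $file)) {
            $title = '';
            $description = '';
            $fh = fopen($path, 'r');
            if ($fh !== false) {
                // Read at most two lines instead of the whole file.
                $line1 = fgets($fh);
                $line2 = fgets($fh);
                fclose($fh);
                if ($line1 !== false && preg_match('/^Title:(.+)/', $line1, $m)) {
                    $title = trim($m[1]);
                }
                if ($line2 !== false && preg_match('/^Description:(.+)/', $line2, $m)) {
                    $description = trim($m[1]);
                }
            }
            $files[$dir][] = array(
                'filepath'    => $path,
                'title'       => $title,
                'description' => $description,
            );
        }
    }
    closedir($handle);

    return $files;
}
```

Calling `getFilesFromDir('mdfiles')` as in the question would then return one entry per directory that contains at least one Markdown file.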

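Running the regex twice is not that bad, and it is probably better than trying to hack something together that spans both lines. However, you could use preg_replace to get the same result: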
$title = trim(preg_replace('/^Title:(.+)/', '$1', $content[0]));
$description = trim(preg_replace('/^Description:(.+)/', '$1', $content[1]));
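One thing worth knowing about the preg_replace variant: when the prefix is missing, the line is returned unchanged, so the title falls back to the raw first line. A short sketch with two hypothetical input lines:

```php
<?php
// Hypothetical first lines, as file() would return them (newline included).
$withPrefix    = "Title: My page\n";
$withoutPrefix = "# My page\n";

// With the prefix: only the captured text survives, then trim() cleans it up.
$a = trim(preg_replace('/^Title:(.+)/', '$1', $withPrefix));

// Without the prefix: nothing matches, so the line comes back as-is.
$b = trim(preg_replace('/^Title:(.+)/', '$1', $withoutPrefix));

echo $a, "\n"; // My page
echo $b, "\n"; // # My page
```

Whether that fallback is desirable depends on how files without metadata should be listed.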

To output the array as in your example, use this:

foreach ($fileArray as $directory => $files) {
    echo $directory . "\n\n";

    foreach ($files as $fileData) {
        echo '<a href="' . $fileData['filepath'] . '">' . $fileData['title'] . "</a><br />\n";
        echo $fileData['description'] . "\n\n";
    }
}
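Since titles, descriptions and paths come from files on disk, it may be safer to escape them before printing. The following is a sketch of the same loop wrapped in a function with a hypothetical name, `renderFileList`, that returns the HTML as a string and shows only the last path segment as the directory heading:

```php
<?php
/**
 * Sketch: render the directory-keyed array as HTML, escaping all
 * file-derived text. The function name is illustrative only.
 */
function renderFileList(array $fileArray)
{
    $html = '';
    foreach ($fileArray as $directory => $files) {
        // Directory heading: last path segment only.
        $html .= htmlspecialchars(basename((string) $directory)) . "<br>\n";

        foreach ($files as $fileData) {
            $html .= sprintf(
                "<a href=\"%s\">%s</a><br>\n%s<br>\n",
                htmlspecialchars($fileData['filepath']),
                htmlspecialchars($fileData['title']),
                htmlspecialchars($fileData['description'])
            );
        }
    }
    return $html;
}
```

Usage would be `echo renderFileList($fileArray);` with the array returned by getFilesFromDir().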