Question

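I am trying to do the following, but my code does not match even a good case. A sample input file and the full code are given below. Why doesn't the code match the sample input file, and how can I fix it?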

1. Based on the argument (whic

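2. Check whether each file contains the following 3 lines of copyright information. These 3 lines do not have to be the first 3 lines of the file: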

 Copyright (c) 2012 Company, Inc. 
 All Rights Reserved.
 Company Confidential and Proprietary.

Sample input file:

File1.txt

/*==========================================================================
 *
 *  @file:     Compiler.h
 *
 *  @brief:    This file 
 *
 *
 *  @author:   david
 *
 *  Copyright (c) 2012 Company, Inc. 
 *  All Rights Reserved.
 *  Company Confidential and Proprietary
 *
 *=========================================================================*/
#ifndef __COMPILER_ABSTRACT_H
#define __COMPILER_ABSTRACT_H

Code:

import os
import sys
userstring="Copyright (c) 2012 Company, Inc.\nAll Rights Reserved.\nCompany Confidential and Proprietary."
if len(sys.argv) < 2:
    sys.exit('Usage: python.py <build directory>')
print len(sys.argv)
print sys.argv[1]
for r,d,f in os.walk(sys.argv[1]):
    for files in f:
        with open(os.path.join(r, files), "r") as file:
            if ''.join(file.readlines()[:3]).strip() != userstring:
                print files

Answer 1

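Check what ''.join(file.readlines()[:3]).strip() gives you. You will notice that the * characters at the start of each line are still there, even though they are not in userstring, and that [:3] gives you only the first 3 lines of the file, which in the sample file are definitely not the lines you want.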
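To see this concretely, the sketch below evaluates the question's expression on an in-memory copy of File1.txt. It is written for Python 3 (the question uses Python 2), and SAMPLE_LINES is a hypothetical stand-in for the file with the "====" banner lines shortened.

```python
# Hypothetical in-memory copy of File1.txt from the question;
# the "====" banner lines are shortened for readability.
SAMPLE_LINES = [
    "/*=========\n",
    " *\n",
    " *  @file:     Compiler.h\n",
    " *\n",
    " *  @brief:    This file \n",
    " *\n",
    " *\n",
    " *  @author:   david\n",
    " *\n",
    " *  Copyright (c) 2012 Company, Inc. \n",
    " *  All Rights Reserved.\n",
    " *  Company Confidential and Proprietary\n",
    " *\n",
    " *=========*/\n",
    "#ifndef __COMPILER_ABSTRACT_H\n",
    "#define __COMPILER_ABSTRACT_H\n",
]

userstring = ("Copyright (c) 2012 Company, Inc.\n"
              "All Rights Reserved.\n"
              "Company Confidential and Proprietary.")

# The question's test: join the first three lines and strip the result.
first_three = ''.join(SAMPLE_LINES[:3]).strip()
print(repr(first_three))          # the banner and @file lines, not the copyright block
print(first_three == userstring)  # False

# Even the actual copyright lines differ: they keep their " *  " prefixes,
# the first one has a trailing space, and the last one has no final ".".
block = ''.join(SAMPLE_LINES[9:12]).strip()
print(block == userstring)        # False
```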

One possible solution is to check each line on its own, like this:

userlines = userstring.split('\n')  # Separate the expected text into lines
with open(os.path.join(r, files), "r") as file:
    match = 0  # Number of consecutive expected lines matched so far
    for line in file:
        if userlines[match] in line:  # Is the expected line at index `match` contained in this line?
            match += 1  # Next time, check the following expected line
        elif match > 0:  # No match: reset, but let this line start a new match
            match = 1 if userlines[0] in line else 0
        if match >= len(userlines):  # All expected lines matched consecutively
            break
    if match == len(userlines):  # You found a match
        print files

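The idea behind this is that what you are looking for does not match exactly, because of blank lines, *, dots, spaces, and so on. I used the in operator to handle this more or less, but once you work line by line you can come up with something more flexible, especially when you are processing files.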
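Packaged as a self-contained function, the line-by-line loop can be tried directly on the sample header. This is a Python 3 sketch; the function name has_consecutive_lines is hypothetical, and a mismatching line is allowed to start a new match. Note that with userstring exactly as in the question it reports no match on the sample, because the expected third line ends with "." while the file's line ends in "Proprietary"; dropping the dot makes it match.

```python
def has_consecutive_lines(lines, userstring):
    """Return True if each line of `userstring` is contained in one line of
    `lines`, on consecutive lines (hypothetical helper, sketch only)."""
    userlines = userstring.split('\n')
    match = 0  # number of expected lines matched so far
    for line in lines:
        if userlines[match] in line:
            match += 1
        elif match > 0:
            # Reset; the current line may itself begin a new match.
            match = 1 if userlines[0] in line else 0
        if match == len(userlines):
            return True
    return False


# The relevant part of File1.txt from the question.
sample = [
    " *  @author:   david\n",
    " *\n",
    " *  Copyright (c) 2012 Company, Inc. \n",
    " *  All Rights Reserved.\n",
    " *  Company Confidential and Proprietary\n",
    " *\n",
]

with_dot = ("Copyright (c) 2012 Company, Inc.\n"
            "All Rights Reserved.\n"
            "Company Confidential and Proprietary.")
without_dot = with_dot.rstrip('.')

print(has_consecutive_lines(sample, with_dot))     # False: "Proprietary." is not in the line
print(has_consecutive_lines(sample, without_dot))  # True
```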

Update:

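For more advanced parsing of each line, you could use regular expressions from the re module, but that may not be practical in your use case, since you generally want to match fixed strings rather than patterns. So, to ignore the last character, you can try to strip (ignore) any spaces, dots, or asterisks at the beginning or end of each line.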

For example:

>>> a = '   This is a string.   '
>>> a.strip()
'This is a string.' # removes the whitespace by default
>>> a.strip('.')
'   This is a string.   ' # removes only dots; none are at the ends, so nothing changes
>>> a.strip('. ')
'This is a string' # removes dots and spaces
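For comparison, a regular-expression version of the same idea, which the answer mentions as a possible but less practical option, might look like the following Python 3 sketch. The helper name build_copyright_pattern is hypothetical, and it assumes the only decoration around each line is spaces, tabs, dots, and asterisks.

```python
import re


def build_copyright_pattern(userstring):
    """Compile a regex that finds the lines of `userstring` on consecutive
    lines, tolerating comment decoration (hypothetical helper)."""
    # Escape each expected line after dropping trailing dots/spaces,
    # so "Proprietary." in userstring also matches "Proprietary" in the file.
    parts = [re.escape(s.strip(' .')) for s in userstring.split('\n')]
    # Between two expected lines allow: trailing spaces/dots, one line break,
    # then a leading prefix of spaces, tabs and '*' (assumed decoration).
    separator = r'[ .]*\r?\n[ \t*]*'
    return re.compile(separator.join(parts))


userstring = ("Copyright (c) 2012 Company, Inc.\n"
              "All Rights Reserved.\n"
              "Company Confidential and Proprietary.")

# The relevant part of File1.txt from the question.
sample = (" *  @author:   david\n"
          " *\n"
          " *  Copyright (c) 2012 Company, Inc. \n"
          " *  All Rights Reserved.\n"
          " *  Company Confidential and Proprietary\n"
          " *\n")

pattern = build_copyright_pattern(userstring)
print(bool(pattern.search(sample)))  # True
# An extra line between the copyright lines breaks the match.
broken = sample.replace(" *  All Rights", " *\n *  All Rights")
print(bool(pattern.search(broken)))  # False
```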

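To make the file lines match userstring, I suggest processing both strings the same way (i.e. stripping spaces, dots, and asterisks from both), unless you are sure about what userstring contains. With these modifications, you should have something like:

userlines = [s.strip('\n\r .*') for s in userstring.split('\n')]
# ...
if userlines[match] == line.strip('\n\r .*'):
# ...

Once you work line by line, there are many useful string methods available, such as startswith, endswith, strip, count, ... Just type help(str) in the interpreter to get the full list.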
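Putting the pieces together, a complete script might look like the Python 3 sketch below: both sides are normalized the same way, the lines must appear consecutively anywhere in the file, and os.walk visits the directory given on the command line. The names (normalize, has_copyright, files_missing_copyright, STRIP_CHARS) are hypothetical; it assumes comment decoration consists only of spaces, tabs, dots, and '*', prints full paths rather than bare file names, and opens files with errors='replace' so binary files in a build directory do not raise decoding errors.

```python
import os
import sys

USERSTRING = ("Copyright (c) 2012 Company, Inc.\n"
              "All Rights Reserved.\n"
              "Company Confidential and Proprietary.")

# Characters stripped from both ends of every line before comparing
# (assumption: comment decoration is only spaces, tabs, dots and '*').
STRIP_CHARS = '\n\r\t .*'


def normalize(line):
    """Strip decoration from both ends of a line."""
    return line.strip(STRIP_CHARS)


def has_copyright(path, userlines):
    """Return True if `userlines` appear on consecutive lines of the file."""
    match = 0
    with open(path, 'r', errors='replace') as f:
        for line in f:
            text = normalize(line)
            if text == userlines[match]:
                match += 1
            elif match > 0:
                # Reset, letting this line start a new match.
                match = 1 if text == userlines[0] else 0
            if match == len(userlines):
                return True
    return False


def files_missing_copyright(root, userstring=USERSTRING):
    """Return the paths under `root` that lack the copyright block."""
    userlines = [normalize(s) for s in userstring.split('\n')]
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not has_copyright(path, userlines):
                missing.append(path)
    return missing


def main(argv):
    if len(argv) < 2:
        print('Usage: python script.py <build directory>', file=sys.stderr)
        return
    for path in files_missing_copyright(argv[1]):
        print(path)


if __name__ == '__main__':
    main(sys.argv)
```

Comparing normalized lines with == is stricter than the in operator: it will not accept a copyright line embedded in longer text, which avoids accidental partial matches.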

Parsing user string data in files present in a directory
