Python re.search inside a for loop gives false positives. How do I fix it?

Posted: 2019-01-30 01:53:33

Tags: python regex for-loop if-statement

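I'm writing code that updates a website automatically. While using it to identify tags for the database and label pages correctly, I ran into a bug I don't know how to fix.
I wrote a for loop that iterates over the lines of each .php file and then used if statements to find the tags. But judging by the output, my if statements fire twice.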

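First, I checked whether my regex gives false positives. I searched manually in a text editor with the same regex as in the code, and it found only one line.
Then I checked how re.compile and re.search work, but I don't see anything I did wrong there.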
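
For reference, `re.search` scans the whole string and returns a `Match` object for the first hit, or `None` if there is none. A minimal check, using the two tag lines from the scanned page as sample strings:

```python
import re

# The two patterns from the question
flagD = re.compile(r'<!--desc.')
flagS = re.compile(r'<!--subject.')

desc_line = "<!--desc Today I attempted to learn django, but a lot went wrong and I couldn't do it.-->\n"
subject_line = "<!--subject:Java-->\n"

# search() returns a Match object for the first hit anywhere in the string, or None
print(flagD.search(desc_line) is not None)     # True
print(flagD.search(subject_line) is not None)  # False
print(flagS.search(subject_line) is not None)  # True

# One search() call reports one hit even if the tag occurs twice on a line;
# findall() returns every non-overlapping occurrence
doubled = "<!--desc a--><!--desc b-->"
print(len(flagD.findall(doubled)))  # 2
```

Because the loop calls `search()` exactly once per pattern per line, two "found desc" messages under a single "found .php" mean that two separate elements of `lines` matched, not that one line matched twice.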

Here is part of the code.

        mydb = mysql.connector.connect(
        [Personal information redacted]
        )
        mycursor = mydb.cursor()
        local = input('Select directory.')
        for paths, dirs, files in os.walk(local):
            for f in files:
                print(f)
                if(splitext(f)[1] == ".php"):
                    print("found .php")
                    opened = open(local + f, 'r')
                    lines = opened.readlines()
                    date = splitext(f)[0]
                    flagD = re.compile(r'<!--desc.')
                    flagS = re.compile(r'<!--subject.')
                    flagE = re.compile(r'-->')
                    desc = None
                    subject = None
                    for l in lines:
                        if(flagD.search(l) != None):
                            print("found desc")
                            desc = re.sub(flagD, "",l)
                            descF = re.sub(flagE,"",desc)
                        if(flagS.search(l) != None):
                            print("found subj")
                            subject = re.sub(flagS, "",l)
                            subjectF = re.sub(flagE,"",subject)
                    if(desc == None or subject == None):
                        continue
                    sql = "INSERT INTO arquivos (quando, descricao, assunto, file) VALUES (%s, %s, %s, %s)"
                    val = (date, descF, subjectF, f)
                    mycursor.execute(sql, val)
                    mydb.commit()  
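
A minimal sketch of the same line-by-line scan using capturing groups instead of two `re.sub` calls. Stripping the markers with `re.sub` keeps everything else on the line, including the trailing newline that `readlines()` preserves, so `descF` and `subjectF` end with `\n`; a group captures only the text between the marker and `-->`. `extract_tags` is a hypothetical helper name, and the sample lines are copied from the page in the edit:

```python
import re

# Capture only the text between the marker and the closing "-->"
desc_re = re.compile(r'<!--desc.(.*?)-->')
subject_re = re.compile(r'<!--subject.(.*?)-->')


def extract_tags(lines):
    """Return (desc, subject) from an iterable of lines; None where a tag is missing.

    Hypothetical helper: keeps the first match of each tag and ignores later ones.
    """
    desc = subject = None
    for line in lines:
        if desc is None:
            m = desc_re.search(line)
            if m:
                desc = m.group(1).strip()
        if subject is None:
            m = subject_re.search(line)
            if m:
                subject = m.group(1).strip()
    return desc, subject


sample = [
    "<!doctype html>\n",
    "<!--desc Today I attempted to learn django, but a lot went wrong and I couldn't do it.-->\n",
    "<!--subject:Java-->\n",
]
print(extract_tags(sample))
```

Keeping only the first match of each tag also means a second, unexpected match further down the file cannot overwrite the value that was already found.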

This is the output:

2018-11-15.php
found .php
2018-11-16.php
found .php
2018-11-26.php
found .php
2019-01-13.php
found .php
2019-01-15.php
found .php
2019-01-16.php
found .php
2019-01-17.php
found .php
2019-01-22.php
found .php
found desc
found subj
2019-01-24.php
found .php
found desc
found desc
found subj
found subj
BdUpdate.php
found .php
BdUpdate1.php
found .php
Comentarios.php
found .php
FINAL.php
found .php
Foot.inc
Formulario.php
found .php
FormularioCompleto.php
found .php
Head.inc
index.php
found .php
index1.php
found .php
Java.php
found .php
Layout Base - Copy.php
found .php
Layout Base.php
found .php
Php_Test.ste
Phyton.php
found .php
SalvandoDB.php
found .php
sidenav.inc
Side_Menu.php
found .php
Thema.php
found .php
Translations.php
found .php
Web.php
found .php
2019-01-13.php
found .php

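As you can see, print("found desc") and print("found subj") are each called twice under a single print("found .php").
That means something in my code is producing a false positive, but that should be impossible, because I tested this regex in other software. This is completely undesired, and it ends up storing the rest of the code as an entry in my database.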
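
One way to find out which lines actually match is to print the line number and the `repr()` of each matching line. A minimal diagnostic sketch: `report_matches` is a hypothetical helper, the demo file is made up, and UTF-8 encoding is assumed (the page declares `charset="utf-8"`). Opening with `newline=""` returns line endings untranslated, so `repr()` also shows a stray `\r`:

```python
import re
import tempfile


def report_matches(path, pattern):
    """Return (line_number, repr(line)) for every line of `path` that `pattern` matches."""
    hits = []
    # newline="" still splits on \n, \r and \r\n but keeps the original endings
    with open(path, encoding="utf-8", newline="") as fh:
        for number, line in enumerate(fh, start=1):
            if pattern.search(line):
                # repr() makes invisible characters such as '\r' or '\t' visible
                hits.append((number, repr(line)))
    return hits


# Demo on a throwaway file with a duplicated tag; in practice, pass the real .php path
with tempfile.NamedTemporaryFile("w", suffix=".php", delete=False, encoding="utf-8") as tmp:
    tmp.write("<!--desc first-->\n<p>body</p>\n<!--desc second-->\n")
    demo_path = tmp.name

for number, text in report_matches(demo_path, re.compile(r'<!--desc.')):
    print(number, text)
```
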
Edit: Sorry for the delay. Here is the .php being scanned that produces the unexpected result.

<!doctype html>
<!--desc Today I attempted to learn django, but a lot went wrong and I couldn't do it.-->
<!--subject:Java-->
<html>
<head>
<meta charset="utf-8">
<title>Training Diary</title>
<?php
// Establecer la zona horaria predeterminada a usar. Disponible desde PHP 5.1
date_default_timezone_set('Asia/Tokyo');
$pasta=date("F");
echo '<link rel = "stylesheet" type = "text/css" href = "';
echo "$pasta";
echo '/estilo.css"/>';
?>
<link href="January/estilo.css" rel="stylesheet" type="text/css">
</head>
<body>
<table width="100%" align="center" cellpadding="0" cellspacing="0" summary="Around Table">
<tr>
<td width="100%" height="100%" valign="top">
<!--HEADER -->
<table width="100%" border="0" cellspacing="0" cellpadding="0" summary="Header">
<tr>
<td id="claro"><img src="img/spc.png" width="140" height="40" alt="space_Header">
</td>
<td width="100%" rowspan="2" align="center" valign="middle" id="claro">
<div id="banner"></div>
</td>
</tr>
<tr>
<td id="escuro"><img src="img/spc.png" width="140" height="20" alt="space_Header">
</td>
</tr>
</table>
</td>

</tr>   
<table width="100%" border="0" cellspacing="0" cellpadding="0" summary="Meio">
<tr>
<td height="100%" valign="top" id="escuro">
<base target="_top">
<div align="center" id="Side">
<table border="0" width="100%" cellspacing="1" cellpadding="0">
<tr>
<!--MENU MENU MENU MENU MENU MENU MENU MENU-->
<?php
$sql = "SELECT * FROM menu";
$con=mysqli_connect("localhost","root","","bdcomentarios");
$executar=mysqli_query($con, $sql);
while( $exibir = mysqli_fetch_array($executar)){
    echo '<td align="center" bordercolor="#2A628F" id="claro">';
    echo '<a href="';
    echo $exibir['assunto'];
    echo '.php" id="Side">';
    echo $exibir['assunto'];
    echo '</a>';
    echo '</td></tr><tr>';
}
mysqli_close($con)
?>
<!--FIM MENU FIM MENU FIM MENU FIM MENU FIM-->
</td>
</tr>
</table>
</div>
<img src="img/spc.png" width="140" height="1" alt="space_Meio">
</td>
<td width="100%" height="100%">
<table width="90%" border="0" cellspacing="0" cellpadding="0">
  <tr>
    <td align="center">
<h2> Creative ways to iterate 

</h2><br><p>Today I fixed 2 things.

<br>The first one, is that the methods of the classes that implements Pieces, was calling the Pieces' method,

<br>instead of their own. I took the method declaration, removed it and replaced it with an abstract method so the entire

<br>code does not glitch due to the absence of Move().

<br>Then, I noticed that the evaluation on notBlocked() on diagonal moves was wrong.

<br>Before, I was using nested for loops to iterate through the blocks it will move through.

<br>But as you may have noticed, that means that it will evaluate a square area instead of a diagonal line.

<br>So, I made a single for loop inside nested if statements that determine which angle it is moving on,

<br>Example: if(positionX > destinationX){if(positionY > destinationY)for(int i....) (this means it is moving down-left because both values are going down.)

<br>then made it return the piece on each square, and I expressed it with subtracting or adding the current loop number to the original position.

<br>Meaning, if you are on the second loop, and you want to see if there is a piece at 2 squares below AND left, it is x minus loop No. and y minus loop No.

<br>And by making different ways of iterating, I succeeded in correctly evaluating the bishop (and the queen's) movement.

<br>Now there is only 3 more unexpected returns to fix.

<br>More coming soon. <br><br><br>

    <h2>Visitors Comments, Thanks!</h2>
    <table width="50%" border="0" align="center" cellpadding="0" cellspacing="0"><tr><td>
<form action="<?php echo htmlspecialchars($_SERVER["PHP_SELF"]); ?>" method="post" id"postcomments">
Name:(Show)<br>
<input type="CHAR" name="nome">
<br><br>E-Mail:(Hide)<br>
<input type="text" name= "email">
<br><br>Message:(Show)<br>
<textarea name="comentario"></textarea>
<br><br>
<INPUT TYPE="hidden" NAME="pagina" VALUE="<!--DATE-->">
<input type="submit" name="submit" value="Enviar">
<input type="reset" value="Limpar">
</form>
<hr>
</td></tr></table>
<?php
if(isset($_POST['submit'])){
$nome = "";
$email = "";
$comentario = "";
$pagina ="";
//keep the variables
if(isset($_POST["nome"]))
     $nome = $_POST["nome"];
if(isset($_POST["email"]))
     $email = $_POST["email"];
if(isset($_POST["comentario"]))
     $comentario = $_POST["comentario"];
if(isset($_POST["pagina"]))
     $pagina = $_POST["pagina"];

//current date
$date = date_default_timezone_set('Asia/Tokyo');
$data = date("Y/m/d");
$con=mysqli_connect("localhost","root","","bdcomentarios");

//EU COLOQUEI
if(isset($_POST["nome"],$_POST["pagina"], $_POST["email"], $_POST["comentario"], $_POST["data"]));

// Check connection
if (mysqli_connect_errno())
{
    echo "Failed to connect to MySQL: " . mysqli_connect_error();
}


$sql_insert="INSERT INTO tbcomentarios (data, nome, email, comentario, pagina) 
VALUES('$data', '$nome', '$email', '$comentario', '$pagina')";

//check the insert into DB
if (mysqli_query($con,$sql_insert)) {
echo '<script type="text/JavaScript">
alert("Sua mensagem foi gravada com sucesso. Obrigado");
location.href="<!--DATE-->.php";
</script>';
}
else {
     echo "Error: " . $sql . "<br>" . mysqli_error($con);
}



$sql = "SELECT * FROM tbcomentarios WHERE pagina like '%<!--DATE-->%' ORDER BY id desc";
$executar=mysqli_query($con, $sql);
while( $exibir = mysqli_fetch_array($executar)){
    echo $exibir['data'];
    echo "<br><b>Name:</b>";
    echo $exibir['nome'];
    echo "<br>";
    echo "<b>E-mail:</b>*********";
    echo "<br><b>Comment:</b><br>";
    echo $exibir['comentario'];
    echo "<br><hr>";
}
}
?>
<?php
$sql = "SELECT * FROM tbcomentarios WHERE pagina like '%<!--DATE-->%' ORDER BY id desc";
$con=mysqli_connect("localhost","root","","bdcomentarios");
$executar=mysqli_query($con, $sql);
while( $exibir = mysqli_fetch_array($executar)){
    echo '<table width="50%" border="0" align="center" cellpadding="0" cellspacing="0"><tr><td>';
    echo $exibir['data'];
    echo "<br><b>Name:</b>";
    echo $exibir['nome'];
    echo "<br>";
    echo "<b>E-mail:</b>*********";
    echo "<br><b>Comment:</b><br>";
    echo $exibir['comentario'];
    echo "<br><hr>";
    echo '</td></tr></table>';
}
?></td>
  </tr>
</table>

</td>
</tr>
</table>
<!--Foot -->
<table width="100%" border="0" cellspacing="0" cellpadding="0" summary="Foot">
<tr>
<td id="escuro""><img src="img/spc.png" width="140" height="30" alt="space_Foot">
</td>
<td width="100%" valign="bottom" id="escuro">
<div id="Foot">
<table align="center" cellpadding="3" cellspacing="1" summary="Foot Menu">
<tr>
<!--FOOT MENU FOOT MENU FOOT MENU FOOT MENU-->
<?php
$sql = "SELECT * FROM menu";
$con=mysqli_connect("localhost","root","","bdcomentarios");
$executar=mysqli_query($con, $sql);
while( $exibir = mysqli_fetch_array($executar)){
    echo '<td align="center" valign="bottom"><a href="';
    echo $exibir['assunto'];
    echo '.php">';
    echo $exibir['assunto'];
    echo '</a>';
    echo '</td>';
}
mysqli_close($con);
?>
</tr>
</table>
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>

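Also, when I say I checked with "other software", I mean that I opened Notepad++ and ran a regex search with exactly the same regex, and it worked as expected.
I hope this information helps you find the problem.
Thanks in advance.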
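
One difference from the Notepad++ check: Notepad++ searched a single open file, whereas the script walks the whole directory tree. `os.walk` also descends into subdirectories (the output lists `2019-01-13.php` twice), but the loop opens `local + f`, which always points into the top-level directory and yields a wrong path if `local` does not end with a separator. A minimal sketch that builds each path with `os.path.join(paths, f)` instead; the helper `php_files` and the subdirectory name `old` are made up for the demo:

```python
import os
import tempfile


def php_files(root):
    """Yield the full path of every .php file under root, subdirectories included."""
    for paths, dirs, files in os.walk(root):
        for f in files:
            if os.path.splitext(f)[1] == ".php":
                # Join with the directory os.walk is currently in, not with root
                yield os.path.join(paths, f)


# Demo tree: the same file name at the top level and in a subdirectory
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "old"))
for folder in (root, os.path.join(root, "old")):
    with open(os.path.join(folder, "2019-01-13.php"), "w") as fh:
        fh.write("<p>demo</p>\n")

for path in sorted(php_files(root)):
    print(path)
```
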

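PS. Most of my questions have been closed or locked without anyone explaining why. I have edited my past questions to comply with the guidelines, but they were quickly buried. Please stop.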

1 answer:

Answer (score: 0):

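Instead of trying to parse each file one line at a time, you can extract both values with a single multiline regular expression. For example, the following shows how to do this for a single test file: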

from os.path import splitext
import re

f = 'test.php'
re_desc_subject = re.compile(r'<!--desc.([^\r\n]*?)-->.*?<!--subject.([^\r\n]*?)-->', re.M | re.S | re.I)

with open(f) as f_input:
    data = f_input.read()

desc_subject = re_desc_subject.search(data)

if desc_subject:
    desc, subject = desc_subject.groups()
    print(desc, subject)
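
The combined pattern can be checked without touching the file system by running it on the first lines of the page from the question, held in a string. Thanks to `re.S`, the `.*?` between the two comments can cross the line break, while `[^\r\n]*?` keeps each captured value on its own line:

```python
import re

re_desc_subject = re.compile(
    r'<!--desc.([^\r\n]*?)-->.*?<!--subject.([^\r\n]*?)-->', re.M | re.S | re.I)

# Opening lines of the page from the question
page = (
    "<!doctype html>\n"
    "<!--desc Today I attempted to learn django, but a lot went wrong and I couldn't do it.-->\n"
    "<!--subject:Java-->\n"
    "<html>\n"
)

match = re_desc_subject.search(page)
desc, subject = match.groups()
print(desc)     # Today I attempted to learn django, but a lot went wrong and I couldn't do it.
print(subject)  # Java
```
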

So in your code, it would work as follows:

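import os
import re
from os.path import splitext

import mysql.connector

# re.S lets ".*?" cross the line break between the two comments; re.I makes the match case-insensitive
re_desc_subject = re.compile(r'<!--desc.([^\r\n]*?)-->.*?<!--subject.([^\r\n]*?)-->', re.M | re.S | re.I)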

mydb = mysql.connector.connect(
[Personal information redacted]
)
mycursor = mydb.cursor()

local = input('Select directory.')

for paths, dirs, files in os.walk(local):
    for f in files:
        print(f)

        if splitext(f)[1] == ".php":
            print("found .php")

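            # Use the directory os.walk is currently in, so files in subdirectories are opened from the right place
            with open(os.path.join(paths, f)) as f_input: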
                data = f_input.read()

            desc_subject = re_desc_subject.search(data)

            if desc_subject:
                desc, subject = desc_subject.groups()
                date = splitext(f)[0]  # the file name (e.g. 2019-01-24) is the date, as in the question
                print("Found desc and subject")

                sql = "INSERT INTO arquivos (quando, descricao, assunto, file) VALUES (%s, %s, %s, %s)"
                val = (date, desc, subject, f)
                mycursor.execute(sql, val)
                mydb.commit()
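
A variant sketch that keeps the directory scan separate from the database work, so it can be tested without MySQL. `collect_rows` is a hypothetical helper name, the demo files are made up, and UTF-8 encoding is assumed:

```python
import os
import re
import tempfile
from os.path import splitext

# re.M is left out: it only affects ^ and $, which this pattern does not use
re_desc_subject = re.compile(
    r'<!--desc.([^\r\n]*?)-->.*?<!--subject.([^\r\n]*?)-->', re.S | re.I)


def collect_rows(root):
    """Return (date, desc, subject, filename) for every tagged .php file under root."""
    rows = []
    for paths, dirs, files in os.walk(root):
        for f in files:
            if splitext(f)[1] != ".php":
                continue
            with open(os.path.join(paths, f), encoding="utf-8") as f_input:
                match = re_desc_subject.search(f_input.read())
            if match:
                desc, subject = match.groups()
                # As in the question, the file name (e.g. 2019-01-24) is the date
                rows.append((splitext(f)[0], desc.strip(), subject.strip(), f))
    return rows


# Demo tree with made-up content: one tagged page and one untagged page
root = tempfile.mkdtemp()
with open(os.path.join(root, "2019-01-24.php"), "w", encoding="utf-8") as fh:
    fh.write("<!doctype html>\n<!--desc Learned django-->\n<!--subject:Java-->\n")
with open(os.path.join(root, "index.php"), "w", encoding="utf-8") as fh:
    fh.write("<html></html>\n")

print(collect_rows(root))  # [('2019-01-24', 'Learned django', 'Java', '2019-01-24.php')]
```

With the connection and the `sql` string above, the insert step could then be a single `mycursor.executemany(sql, collect_rows(local))` followed by one `mydb.commit()`.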