Using Python

Time: 2017-06-25 17:47:50

Tags: python mysql xml csv pubdate

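I have a question about inserting the pubDate tag into my MySQL table. I am trying to store the tags title, link and pubDate in the table, and the problem is with the last one (pubDate).

Let me explain the code:

  1. The first step reads the RSS page and writes an XML file.

  2. The second step generates a CSV file with only 3 tags (title, link and pubDate). Note: in this code I need to use item.findtext('pubDate'), because item.find('pubDate').text raises an error, although the file is generated correctly in both cases.

  3. The last step stores the CSV file into a table in MySQL.

  4. In this step I get the following error: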

    Connected to pydev debugger (build 171.4694.38)
    Traceback (most recent call last):
    File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
    File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
    File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
    File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 78, in <module>
    main()
    File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 72, in main
    testdb()
    File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull", line 56, in testdb
    (r[1:] for r in csv_data.itertuples()))
    File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 654, in executemany
    return self.execute(stmt)
    File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 551, in execute
    self._handle_result(self._connection.cmd_query(stmt))
    File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 490, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
    File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 395, in _handle_result
    raise errors.get_exception(packet)
    mysql.connector.errors.ProgrammingError: 1054 (42S22): Unknown column 'nan' in 'field list'
    
    Process finished with exit code 1
    

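    I think the problem is with pubDate, because if I run the program in two parts:

    First part:

    Create the XML and CSV, but with the pubDate lookup changed to item.find('pubDate').text. The XML and CSV files are generated successfully, but the code shows an error about pubDate.

    Second part:

    Insert the CSV file created in the first part into MySQL. The program runs successfully, with no errors. I checked my database and the data was loaded.

    With this option, however, I cannot run both parts in the same file, because the error stops execution before the database insert runs.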
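The difference between item.find('pubDate').text and item.findtext('pubDate') described above can be reproduced in isolation. This is a minimal sketch, assuming the failure comes from an item that has no pubDate child at all; the sample feed and the name extract_rows are made up for illustration and are not taken from the CNN feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed: the first <item> has no <pubDate> child.
SAMPLE_RSS = """<rss><channel>
  <item><title>No date</title><link>http://example.com/a</link></item>
  <item><title>Dated</title><link>http://example.com/b</link>
        <pubDate>Mon, 26 Jun 2017 08:49:00 GMT</pubDate></item>
</channel></rss>"""


def extract_rows(xml_text):
    """Return (title, link, pubDate) tuples; a missing tag becomes None."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall('./channel/item'):
        # findtext() returns None for a missing child instead of raising.
        rows.append((item.findtext('title'),
                     item.findtext('link'),
                     item.findtext('pubDate')))
    return rows


rows = extract_rows(SAMPLE_RSS)

# find() returns None for the missing child, so .text raises AttributeError.
first_item = ET.fromstring(SAMPLE_RSS).find('./channel/item')
try:
    first_item.find('pubDate').text
    find_text_failed = False
except AttributeError:
    find_text_failed = True
```

If this is what happens with the real feed, the None written by findtext becomes an empty field in the CSV, which would explain why the file is still generated.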

    So the error is actually in this code:

    # Python code that creates an XML file and a CSV file and inserts the data into a MySQL DB.
    # Import the modules needed to run this script
    import csv
    import MySQLdb
    import requests
    import xml.etree.ElementTree as ET
    import mysql.connector
    import pandas as pd
    
    
    def loadRSS():
        # Set the URL of the CNN RSS feed
        url = 'http://rss.cnn.com/rss/edition.xml'
    
        # Create a response object by fetching the URL declared above
        resp = requests.get(url)
    
        # Save the response content to a file named cnn.xml
        with open('cnn.xml', 'wb') as f:
            f.write(resp.content)
    
    
    def loadcsv():
        tree = ET.parse("cnn.xml")
        root = tree.getroot()
    
        d = open('cnn.csv', 'w')
    
        csvwriter = csv.writer(d)
    
        count = 0
    
        head = ['title', 'link', 'pubDate']
    
        csvwriter.writerow(head)
    
        for item in root.findall('./channel/item'):
            row = []
            title_name = item.find('title').text
            row.append(title_name)
            link_name = item.find('link').text
            row.append(link_name)
            pubDate_name = item.findtext('pubDate')
            row.append(pubDate_name)
            csvwriter.writerow(row)
        d.close()
    
    def testdb():
        cnx = mysql.connector.connect(user='root', password='password', host='localhost', database='cnn')
        cursor = cnx.cursor()
        csv_data = pd.read_csv('cnn.csv')
    
        for row in csv_data.iterrows():
            cursor.executemany(
                "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
                (r[1:] for r in csv_data.itertuples()))
    
        cnx.commit()
        cursor.close()
        cnx.close()
    
        #connection = MySQLdb.Connect(host='localhost', user='root', passwd='password', db='cnn')
        #cursor = connection.cursor()
        #query = "LOAD DATA INFILE 'cnn.csv' INTO TABLE noticias(title, link, pubdate)"
        #cursor.execute(query)
        #connection.commit()
    
    def main():
        # Run the steps defined in the program.
        loadRSS()
        loadcsv()
        testdb()
    
    
    
    if __name__ == "__main__":
        # call the main function
        main()
    

    Does anyone know what causes this error?

    Update: I added a line:

    print(csv_data.head())
    

    Adding the output requested in the comments, the debugger result is:

    Connected to pydev debugger (build 171.4694.38)
                                                   title  \
    0  Bloodied and broken: The battle against ISIS i...   
    1                            The human cost of ISIS    
    2                  $1B deal to prop up UK government   
    3               Netanyahu freezes Western Wall plans   
    4  Only a 'couple of hundred' ISIS fighters left ...   
    
                                                    link  \
    0                              http://cnn.it/2sbE6fp   
    1  http://www.cnn.com/videos/world/2017/06/25/phi...   
    2  http://www.cnn.com/2017/06/26/europe/theresa-m...   
    3  http://www.cnn.com/2017/06/26/middleeast/weste...   
    4  http://www.cnn.com/2017/06/26/middleeast/coupl...   
    
                                date  
    0                            NaN  
    1  Mon, 26 Jun 2017 08:49:00 GMT  
    2  Mon, 26 Jun 2017 11:59:24 GMT  
    3  Mon, 26 Jun 2017 13:09:30 GMT  
    4  Mon, 26 Jun 2017 13:16:21 GMT  
    Traceback (most recent call last):
      File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
        globals = debugger.run(setup['file'], None, None, is_module)
      File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 80, in <module>
        main()
      File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 74, in main
        testdb()
      File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 58, in testdb
        (r[1:] for r in csv_data.itertuples()))
      File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 654, in executemany
        return self.execute(stmt)
      File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 551, in execute
        self._handle_result(self._connection.cmd_query(stmt))
      File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 490, in cmd_query
        result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
      File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\connection.py", line 395, in _handle_result
        raise errors.get_exception(packet)
    mysql.connector.errors.ProgrammingError: 1054 (42S22): Unknown column 'nan' in 'field list'
    
    Process finished with exit code 1
    

    Update 27/06/2017:

    I changed the testdb part, and it now looks like this:

    def testdb():
        cnx = mysql.connector.connect(user='root', password='password', host='localhost', database='cnn')
        cursor = cnx.cursor()
    
        with open('cnn.csv') as fh:
            cursor.executemany(
                "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
                [tuple(row) for row in csv.reader(fh)]
            )
    
        cnx.commit()
        cursor.close()
        cnx.close()
    

    When I debug the program, the error is:

    Connected to pydev debugger (build 171.4694.38)
    Traceback (most recent call last):
      File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 75, in __call__
        return bytes(self.params[index])
    IndexError: tuple index out of range
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1591, in <module>
        globals = debugger.run(setup['file'], None, None, is_module)
      File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\pydevd.py", line 1018, in run
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 79, in <module>
        main()
      File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 73, in main
        testdb()
      File "C:/Users/SoriyAntony/PycharmProjects/cnnwithcvsanddb/cnnfull.py", line 56, in testdb
        [tuple(row) for row in csv.reader(fh)]
      File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 652, in executemany
        stmt = self._batch_insert(operation, seq_params)
      File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 594, in _batch_insert
        tmp = RE_PY_PARAM.sub(psub, tmp)
      File "C:\Users\SoriyAntony\AppData\Local\Programs\Python\Python36-32\lib\site-packages\mysql\connector\cursor.py", line 78, in __call__
        "Not enough parameters for the SQL statement")
    mysql.connector.errors.ProgrammingError: Not enough parameters for the SQL statement
    
    Process finished with exit code 1
    

    I don't know whether I forgot to add something.

1 answer:

Answer 0 (score: 0)

  

Comment: ...but now the error is

Related to the first error: IndexError: tuple index out of range
The CSV data must be malformed; check it before passing it to MySQL:

import csv
records = []
with open('test/cnn.csv') as fh:
    for row in csv.reader(fh):
        _tuple = tuple(row)
        if len(_tuple) == 3:
            records.append(_tuple)
        else:
            print('[FAIL]: Tuple length not 3, found {} in {}'.format(len(_tuple), _tuple))

cursor.executemany("INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)", records)
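
One plausible reason some rows do not have three fields (an assumption, since the CSV file itself is not shown): the question writes cnn.csv with open('cnn.csv', 'w') on Windows. Without newline='' the csv module writes an extra blank line after every row, and csv.reader returns each blank line as an empty list. The header row is also read back as if it were data. Below is a sketch of writing and reading with newline='' and skipping the header; write_rows, read_records and the sample rows are hypothetical.

```python
import csv
import os
import tempfile

HEADER = ['title', 'link', 'pubDate']


def write_rows(path, rows):
    # newline='' lets the csv module control line endings, so no blank
    # lines appear between rows on Windows.
    with open(path, 'w', newline='', encoding='utf-8') as fh:
        writer = csv.writer(fh)
        writer.writerow(HEADER)
        writer.writerows(rows)


def read_records(path):
    with open(path, newline='', encoding='utf-8') as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row so it is not inserted
        # keep only rows that match the three placeholders in the INSERT
        return [tuple(row) for row in reader if len(row) == 3]


# Hypothetical sample data; None is written by csv.writer as an empty field.
path = os.path.join(tempfile.mkdtemp(), 'cnn_demo.csv')
write_rows(path, [
    ('A', 'http://example.com/a', None),
    ('B', 'http://example.com/b', 'Mon, 26 Jun 2017 08:49:00 GMT'),
])
records = read_records(path)
```

The resulting records list can then be passed to cursor.executemany as in the snippet above.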
  

Comment: Error: Not all parameters were used in the SQL statement. According to the documentation:

from datetime import date

data = [
  ('Jane', date(2005, 2, 12)),
  ('Joe', date(2006, 5, 23)),
  ('John', date(2010, 10, 3)),
]
stmt = "INSERT INTO employees (first_name, hire_date) VALUES (%s, %s)"
# General form: cursor.executemany(operation, seq_of_params)
cursor.executemany(stmt, data)
     

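seq_of_params must be a list of tuples.

Therefore you do not need a for loop to iterate over the CSV rows; you must pass the entire CSV data as a list of tuples. Second, use the csv module instead of pandas. Change to: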

import csv
with open('cnn.csv') as fh:
    cursor.executemany(
        "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
        [tuple(row) for row in csv.reader(fh)]
    )

Tested with Python: 3.4.2
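
To illustrate the shape executemany expects, here is a runnable sketch that uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server). Note that sqlite3 uses ? placeholders where mysql.connector uses %s; the table layout and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: one tuple per record, one element per placeholder.
records = [
    ('A', 'http://example.com/a', None),
    ('B', 'http://example.com/b', 'Mon, 26 Jun 2017 08:49:00 GMT'),
]

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE noticias (title TEXT, link TEXT, pubDate TEXT)")

# A single executemany call handles every row; no outer for loop is needed.
conn.executemany(
    "INSERT INTO noticias(title, link, pubDate) VALUES (?, ?, ?)",
    records)
conn.commit()

stored = conn.execute(
    "SELECT title, link, pubDate FROM noticias ORDER BY rowid").fetchall()
```

A None in a tuple is stored as NULL, which is why the first row comes back with None as its pubDate.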

  

Question: Does anyone know what causes this error?

Unknown column 'nan' in 'field list'

This part of the code is wrong. You are iterating over csv_data twice.

for row in csv_data.iterrows():
    cursor.executemany(
        "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
        (r[1:] for r in csv_data.itertuples()))

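I can't tell whether this causes the error above, but you should change it to the following and try again to check whether the error persists: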

for row in csv_data.iterrows():
    # iterrows() yields (index, Series) pairs; row[1] holds the column values
    cursor.execute(
        "INSERT INTO noticias(title, link, pubDate) VALUES(%s, %s, %s)",
        tuple(row[1]))
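
As for where nan may come from: the print(csv_data.head()) output in the question shows NaN in the date column of the first row, and pandas reads an empty CSV field as the float NaN. The traceback suggests that this value ends up in the SQL text as a bare nan, which MySQL then treats as a column name. A minimal sketch of one way around it, assuming the pubDate column accepts NULL: replace NaN with None in each row before it reaches the driver. The helper name nan_to_none and the sample rows are hypothetical.

```python
import math


def nan_to_none(row):
    """Return the row as a tuple with float NaN values replaced by None."""
    return tuple(
        None if isinstance(value, float) and math.isnan(value) else value
        for value in row)


# Hypothetical rows shaped like the DataFrame output in the question.
rows = [
    ('Bloodied and broken', 'http://cnn.it/2sbE6fp', float('nan')),
    ('The human cost of ISIS', 'http://www.cnn.com/videos/world',
     'Mon, 26 Jun 2017 08:49:00 GMT'),
]
cleaned = [nan_to_none(r) for r in rows]
```

With the DataFrame from the question, the same helper could be applied as [nan_to_none(r) for r in csv_data.itertuples(index=False, name=None)], and the resulting list passed to a single executemany call.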