Question

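I am processing images and converting each one into about 400 data values. I want to store each of these values in its own column. My MySQL table has columns like this: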

MYID, WIDTH, HEIGHT,P1,P2,P3.....P400.

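I could easily save the values to a CSV file, but since the processing runs over about 3 million files, I thought I would write the output directly to the MySQL table instead of creating multiple CSV files.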

This is what I have written so far:

for (i, imagePath) in enumerate(imagePaths):
    filename = imagePath[imagePath.rfind("/") + 1:]
    image = cv2.imread(imagePath)
    rows, cols, channels = image.shape
    if not image is None:
        features = detail.describe(image)
        features = [str(x) for x in features]
        fileparam = [filename,cols,rows]
        sqldata = fileparam+features
        var_string = ', '.join('?' * len(sqldata))
        query_string = 'INSERT INTO lastoneweeknew VALUES (%s)' % var_string
        y.execute(query_string, sqldata)

If I print sqldata, it looks like this:

['120546506.jpg',650, 420, '0.0', '0.010269055',........., '0.8539078']

The MySQL table has the following data types:

+----------+----------------+------+-----+---------+----------------+
| Field    | Type           | Null | Key | Default | Extra          |
+----------+----------------+------+-----+---------+----------------+
| image_id | int(11)        | NO   | PRI | NULL    | auto_increment |
| MYID     | int(10)        | YES  |     | NULL    |                |
| WIDTH    | decimal(6,2)   | YES  | MUL | NULL    |                |
| HEIGHT   | decimal(6,2)   | YES  | MUL | NULL    |                |
| P1       | decimal(22,20) | YES  |     | NULL    |                |
| P2       | decimal(22,20) | YES  |     | NULL    |                |

When I insert the data into the MySQL table, I get the following error:

TypeError: not all arguments converted during string formatting

However, when I write the output to a CSV file and use R to load the CSV data into MySQL, the insert works without trouble.

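I assumed the problem was that the row and column values are integers while the rest appear as text in the output, so I converted them to text: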

row = str(rows)
col = str(cols)

But I still get the same error.

Answer 1

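About your error: Python raises this TypeError when % formatting is given more arguments than the string has conversion specifiers. MySQLdb substitutes parameters at %s markers, not at ?, so (assuming y is a MySQLdb cursor) the query built from '?' has nothing for the values to bind to. The int values are not the cause, since %s formats ints as well, which is why converting them to str did not help.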
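
The error can be reproduced without a database. The following is a minimal sketch, assuming the driver interpolates parameters with Python's % operator, roughly as MySQLdb does. build_insert is a hypothetical helper, and naming the columns explicitly is an added assumption so that the auto_increment image_id column can be left out.

```python
def build_insert(table, columns, placeholder="%s"):
    """Return an INSERT statement with one placeholder per named column."""
    column_list = ", ".join(columns)
    marks = ", ".join([placeholder] * len(columns))
    return "INSERT INTO {} ({}) VALUES ({})".format(table, column_list, marks)

n_features = 2  # shortened for illustration; the question has about 400
columns = ["MYID", "WIDTH", "HEIGHT"] + ["P%d" % n for n in range(1, n_features + 1)]
sqldata = ["120546506.jpg", 650, 420, "0.0", "0.010269055"]

# What the question builds: '?' markers, which contain no % specifier.
qmark_query = build_insert("lastoneweeknew", columns, placeholder="?")

# A %-based driver interpolates roughly like this, so the values have
# nothing to bind to and Python raises TypeError.
try:
    qmark_query % tuple(sqldata)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With %s markers there is one specifier per value.
format_query = build_insert("lastoneweeknew", columns)
# y.execute(format_query, sqldata)  # y is the cursor from the question
```

The values are still passed as the second argument to execute, so the driver does the quoting; only the placeholder text in the query changes.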

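It looks like you are trying to build a data frame and upload it to a MySQL database. Fortunately this is a common task, and the pandas library can do all of it for you. Create a list of dictionaries in which each dictionary maps ColumnName to Value, turn the list into a DataFrame, and write it to the table: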

import cv2
import pandas as pd
from sqlalchemy import create_engine

def handlePaths(imagePaths):
    imageDataList = []
    for imagePath in imagePaths:
        filename = imagePath[imagePath.rfind("/") + 1:]
        image = cv2.imread(imagePath)
        if image is None:  # cv2.imread returns None for unreadable files
            continue
        rows, cols, channels = image.shape
        features = detail.describe(image)  # detail is the descriptor object from the question
        imageData = {"MYID": filename, "WIDTH": cols, "HEIGHT": rows}
        # Add P1..Pn iteratively instead of writing out all 400 keys
        for n, value in enumerate(features, start=1):
            imageData["P%d" % n] = float(value)
        imageDataList.append(imageData)
    imageDataFrame = pd.DataFrame(imageDataList)
    # Current pandas needs a SQLAlchemy engine for MySQL; the URL below is a placeholder
    engine = create_engine("mysql+mysqldb://user:password@localhost/dbname")
    # 'append' keeps the existing table; 'replace' would drop and recreate it
    imageDataFrame.to_sql(con=engine, name='lastoneweeknew', if_exists='append', index=False)
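
With about 3 million files, collecting every row in one list before uploading may use a lot of memory. One option, sketched here as an assumption rather than as part of the original answer, is to split the paths into batches and call handlePaths once per batch. batched is a hypothetical helper and the batch size of 10000 is arbitrary.

```python
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# Hypothetical usage with the names from the answer:
# for pathBatch in batched(imagePaths, 10000):
#     handlePaths(pathBatch)
```

For this to work, each to_sql call has to use if_exists='append'; with 'replace', every batch would overwrite the rows written by the previous one.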

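I would guess this is a very CPU-intensive process, so you can hand one image to each CPU to make it run faster. By uploading each entry individually, you let the database handle the race conditions.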

import cv2
import pandas as pd
from sqlalchemy import create_engine
import multiprocessing

def analyzeImages(imagePaths):  # imagePaths is a list of image paths
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    pool.map(handleSinglePath, imagePaths)
    pool.close()  # close() must be called before join()
    pool.join()

def handleSinglePath(imagePath):
    filename = imagePath[imagePath.rfind("/") + 1:]
    image = cv2.imread(imagePath)
    if image is None:  # skip files OpenCV cannot read
        return
    rows, cols, channels = image.shape
    features = detail.describe(image)  # detail is the descriptor object from the question
    imageData = {"MYID": filename, "WIDTH": cols, "HEIGHT": rows}
    # Add P1..Pn iteratively instead of writing out all 400 keys
    for n, value in enumerate(features, start=1):
        imageData["P%d" % n] = float(value)
    imageDataFrame = pd.DataFrame([imageData])  # a list with one dict gives one row
    # Current pandas needs a SQLAlchemy engine for MySQL; the URL below is a placeholder
    engine = create_engine("mysql+mysqldb://user:password@localhost/dbname")
    # 'append' adds this row; 'replace' would drop the table on every image
    imageDataFrame.to_sql(con=engine, name='lastoneweeknew', if_exists='append', index=False)
    engine.dispose()
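
A variation on the per-image upload, sketched under assumptions: the workers only compute the rows, and the parent process collects them and does the writing, so only one database connection is needed. describe_path and collect_rows are hypothetical names, and the worker body is a stand-in for the cv2.imread and detail.describe calls.

```python
import multiprocessing

def describe_path(image_path):
    """Hypothetical worker: return one row dict, or None to skip the file.

    A real version would read the image and compute its features here.
    """
    filename = image_path[image_path.rfind("/") + 1:]
    if not filename:
        return None
    return {"MYID": filename}

def collect_rows(image_paths, processes=None, chunksize=64):
    """Run describe_path in worker processes and gather the non-empty rows."""
    with multiprocessing.Pool(processes) as pool:
        # imap_unordered yields results as they finish; order is not preserved
        results = pool.imap_unordered(describe_path, image_paths, chunksize)
        return [row for row in results if row is not None]

# Hypothetical usage, inside an `if __name__ == "__main__":` guard:
#     rows = collect_rows(imagePaths)
#     pd.DataFrame(rows).to_sql(...)
```

Whether this is faster than letting each worker insert its own row depends on how expensive the feature extraction is compared with the database round trip, so it is worth measuring both.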

Inserting array elements into a MySQL database with Python
