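MariaDB: converting strings to int while stripping spaces from numbers during CSV import

Time: 2017-06-13 12:16:38

Tags: mysql csv replace import mariadb

I have a "big" CSV file (about 1 GB of data, 3M rows) to import into a MariaDB table.

The problem is that almost every field in every row is treated as a string, so I have to convert "1 337" (a string) to 1337 (an integer).

Here is the script I use to import into the table: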

LOAD DATA LOW_PRIORITY LOCAL
    INFILE 'data.txt'
    INTO TABLE `test`.`test_import`
    CHARACTER SET utf8
    FIELDS TERMINATED BY ';'
    OPTIONALLY ENCLOSED BY '"'
    ESCAPED BY '"'
    LINES TERMINATED BY '\r\n'
    (`id`,
        `data`,
        @NumberOne,
        @NumberTwo,
        @NumberThree,
        @NumberFour)
        SET `Number One` = REPLACE(@NumberOne, ' ', ''),
            `Number Two` = REPLACE(@NumberOne, ' ', ''),
            `Number Three` = REPLACE(@NumberOne, ' ', ''),
            `Number Four` = REPLACE(@NumberOne, ' ', '');

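With this script, numbers up to 999 import without any problem. But from 1000 onward (written as "1 000" in my CSV), all I get is a warning (Truncated incorrect INTEGER value: '1 000') and the value 1 in my database.

The "funny" thing is that when I try this: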

SET `Number One` = REPLACE(@NumberOne, '1', 'k'),
                `Number Two` = REPLACE(@NumberOne, '1', 'k'),
                `Number Three` = REPLACE(@NumberOne, '1', 'k'),
                `Number Four` = REPLACE(@NumberOne, '1', 'k')

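-> REPLACE() works, and "1 000" becomes "k 000".

So, how can I use REPLACE() to remove the spaces inside the numbers? Or, how can I make CAST()/CONVERT() work properly on strings like "1 337"?

More information.

Here is a fresh test table: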

CREATE OR REPLACE TABLE test_spaces_extr (
    `Identifier`   tinytext,
    `First name`   tinytext,
    `Last name`    tinytext,
    `Number One`   int unsigned,
    `Number Two`   int unsigned,
    `Number Three` int unsigned,
    `Number Four`  int unsigned,
    `Number Five`  int unsigned,
    `Number Six`   int unsigned,
    `Number Seven` int unsigned
);

Here is the script that imports the CSV:

LOAD DATA LOW_PRIORITY LOCAL
    INFILE 'some_data.txt'
    INTO TABLE `test`.`test_spaces_extr`
    CHARACTER SET utf8
    FIELDS TERMINATED BY ';'
    OPTIONALLY ENCLOSED BY '"'
    ESCAPED BY '"'
    LINES TERMINATED BY '\r\n'
    (`Identifier`,
        `First name`,
        `Last name`,
        @NumberOne,
        @NumberTwo,
        @NumberThree,
        @NumberFour,
        @NumberFive,
        @NumberSix,
        @NumberSeven)
        SET `Number One` = REPLACE(@NumberOne, ' ', ''),
            `Number Two` = REPLACE(@NumberTwo, ' ', ''),
            `Number Three` = REPLACE(@NumberThree, ' ', ''),
            `Number Four` = REPLACE(@NumberFour, ' ', ''),
            `Number Five` = REPLACE(@NumberFive, ' ', ''),
            `Number Six` = REPLACE(@NumberSix, ' ', ''),
            `Number Seven` = REPLACE(@NumberSeven, ' ', '');

Here is the entire content of some_data.txt:

"3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"

(One line, yes.)

Here is the result:

"Identifier"    "First name"    "Last name" "Number One"    "Number Two"    "Number Three"  "Number Four"   "Number Five"   "Number Six"    "Number Seven"
"3efa639b3a"    "Censored"  "Censored"  "7896"  "3468"  "3854"  "5000"  "1234"  "9654"  "0"

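Indeed, the "Number" fields do become integers here. All of them, except the last one ("Number Seven" -> "0").

It's getting weirder and weirder...

1 answer:

Answer 0 (score: 1)

I can't reproduce the problem: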

MariaDB [(none)]> SELECT VERSION();
Field   1:  `VERSION()`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  utf8_general_ci (33)
Length:     72
Max_length: 24
Decimals:   31
Flags:      NOT_NULL 


+-----------------+
| VERSION()       |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT CAST(REPLACE('1 337', ' ', '') AS UNSIGNED);
Field   1:  `CAST(REPLACE('1 337', ' ', '') AS UNSIGNED)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     5
Max_length: 4
Decimals:   0
Flags:      NOT_NULL UNSIGNED BINARY NUM 


+---------------------------------------------+
| CAST(REPLACE('1 337', ' ', '') AS UNSIGNED) |
+---------------------------------------------+
|                                        1337 |
+---------------------------------------------+
1 row in set (0.00 sec)
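
For context, here is a short sketch of how MariaDB converts a string to an integer, which may explain the two values reported in the question: the conversion reads the leading digits and stops at the first character that is not part of a number. The literals below are illustrative only and are not taken from the real file.

```sql
-- Without REPLACE(), conversion stops at the space.
SELECT CAST('1 337' AS UNSIGNED);   -- 1, with a "Truncated incorrect INTEGER value" warning

-- A value that starts with a non-digit (here a stray double quote) converts to 0.
SELECT CAST('"1337"' AS UNSIGNED);  -- 0, also with a warning
```

So a stored 1 suggests the separator was never removed, while a stored 0 suggests the field began with something other than a digit.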

UPDATE

File /path/to/data.csv: "3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"

MariaDB [_]> SELECT VERSION();
+-----------------+
| VERSION()       |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)

MariaDB [_]> DROP TABLE IF EXISTS `test_spaces_extr`;
Query OK, 0 rows affected (0.07 sec)

MariaDB [_]> CREATE OR REPLACE TABLE `test_spaces_extr` (
    ->     `Identifier`   tinytext,
    ->     `First name`   tinytext,
    ->     `Last name`    tinytext,
    ->     `Number One`   int unsigned,
    ->     `Number Two`   int unsigned,
    ->     `Number Three` int unsigned,
    ->     `Number Four`  int unsigned,
    ->     `Number Five`  int unsigned,
    ->     `Number Six`   int unsigned,
    ->     `Number Seven` int unsigned
    -> );
Query OK, 0 rows affected (0.00 sec)

MariaDB [_]> LOAD DATA LOW_PRIORITY LOCAL INFILE '/path/to/data.csv'
    ->   INTO TABLE `test_spaces_extr`
    ->   CHARACTER SET utf8
    ->   FIELDS TERMINATED BY ';'
    ->   OPTIONALLY ENCLOSED BY '"'
    ->   ESCAPED BY '"'
    ->   LINES TERMINATED BY '\r\n'
    ->   (
    ->     `Identifier`,
    ->     `First name`,
    ->     `Last name`,
    ->     @`NumberOne`,
    ->     @`NumberTwo`,
    ->     @`NumberThree`,
    ->     @`NumberFour`,
    ->     @`NumberFive`,
    ->     @`NumberSix`,
    ->     @`NumberSeven`
    ->   )
    ->   SET
    ->   `Number One` = REPLACE(@`NumberOne`, ' ', ''),
    ->   `Number Two` = REPLACE(@`NumberTwo`, ' ', ''),
    ->   `Number Three` = REPLACE(@`NumberThree`, ' ', ''),
    ->   `Number Four` = REPLACE(@`NumberFour`, ' ', ''),
    ->   `Number Five` = REPLACE(@`NumberFive`, ' ', ''),
    ->   `Number Six` = REPLACE(@`NumberSix`, ' ', ''),
    ->   `Number Seven` = REPLACE(@`NumberSeven`, ' ', '');
Query OK, 1 row affected (0.00 sec)                  
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

MariaDB [_]> SELECT
    ->   `Identifier`,
    ->   `First name`,
    ->   `Last name`,
    ->   `Number One`,
    ->   `Number Two`,
    ->   `Number Three`,
    ->   `Number Four`,
    ->   `Number Five`,
    ->   `Number Six`,
    ->   `Number Seven`
    -> FROM
    ->   `test_spaces_extr`;
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| Identifier | First name | Last name | Number One | Number Two | Number Three | Number Four | Number Five | Number Six | Number Seven |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| 3efa639b3a | Censored   | Censored  |       7896 |       3468 |         3854 |        5000 |        1234 |       9654 |         1337 |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
1 row in set (0.00 sec)
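
Since the statement works here, the difference is probably in the file itself. One way to check, sketched under assumptions: the user variables should still hold the strings read from the last row once LOAD DATA has finished, so their raw bytes can be dumped with HEX(). An ordinary space shows up as 20; a UTF-8 no-break space would show up as C2A0, and a leftover quote or carriage return as 22 or 0D. The no-break space is only a guess at what the file might contain.

```sql
-- Run in the same session, right after the LOAD DATA statement.
SELECT @NumberOne, HEX(@NumberOne), @NumberSeven, HEX(@NumberSeven);

-- Reference value: with an ordinary space (byte 20) the dump looks like this.
SELECT HEX('1 337');  -- 3120333337

-- Assumption: the separator is a UTF-8 no-break space (bytes C2A0).
-- Stripping that byte sequence in addition to the ordinary space:
SELECT CAST(
         REPLACE(REPLACE(UNHEX('31C2A0333337'), ' ', ''), UNHEX('C2A0'), '')
         AS UNSIGNED);
```

If the dump does show C2A0, the same nested REPLACE() can go into each assignment of the SET clause in place of the single REPLACE().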