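I have a "large" CSV file (about 1 GB of data, 3M rows) to import into a MariaDB table.
The catch is that almost every field of every row is treated as a string, so I have to convert "1 337" (a string) into 1337 (an integer). Here is the script I use to import the table: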
LOAD DATA LOW_PRIORITY LOCAL
INFILE 'data.txt'
INTO TABLE `test`.`test_import`
CHARACTER SET utf8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\r\n'
(`id`,
`data`,
@NumberOne,
@NumberTwo,
@NumberThree,
@NumberFour)
SET `Number One` = REPLACE(@NumberOne, ' ', ''),
`Number Two` = REPLACE(@NumberTwo, ' ', ''),
`Number Three` = REPLACE(@NumberThree, ' ', ''),
`Number Four` = REPLACE(@NumberFour, ' ', '');
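With this script, numbers up to 999 are imported without problems. But from 1000 upwards (written as "1 000" in my CSV), all I get is a warning (Truncated incorrect INTEGER value: '1 000') and the value 1 in my database.
The "funny" thing is that when I try this: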
SET `Number One` = REPLACE(@NumberOne, '1', 'k'),
`Number Two` = REPLACE(@NumberTwo, '1', 'k'),
`Number Three` = REPLACE(@NumberThree, '1', 'k'),
`Number Four` = REPLACE(@NumberFour, '1', 'k')
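-> REPLACE() works, and "1 000" becomes "k 000".
So how can I use REPLACE() to remove the spaces in the numbers? Or how can I make CAST()/CONVERT() work properly on a string like "1 337"?
More information.
Here is a fresh test table: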
CREATE OR REPLACE TABLE test_spaces_extr (
`Identifier` tinytext,
`First name` tinytext,
`Last name` tinytext,
`Number One` int unsigned,
`Number Two` int unsigned,
`Number Three` int unsigned,
`Number Four` int unsigned,
`Number Five` int unsigned,
`Number Six` int unsigned,
`Number Seven` int unsigned
);
Here is the script that imports the CSV:
LOAD DATA LOW_PRIORITY LOCAL
INFILE 'some_data.txt'
INTO TABLE `test`.`test_spaces_extr`
CHARACTER SET utf8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\r\n'
(`Identifier`,
`First name`,
`Last name`,
@NumberOne,
@NumberTwo,
@NumberThree,
@NumberFour,
@NumberFive,
@NumberSix,
@NumberSeven)
SET `Number One` = REPLACE(@NumberOne, ' ', ''),
`Number Two` = REPLACE(@NumberTwo, ' ', ''),
`Number Three` = REPLACE(@NumberThree, ' ', ''),
`Number Four` = REPLACE(@NumberFour, ' ', ''),
`Number Five` = REPLACE(@NumberFive, ' ', ''),
`Number Six` = REPLACE(@NumberSix, ' ', ''),
`Number Seven` = REPLACE(@NumberSeven, ' ', '');
Here is the entire content of some_data.txt:
"3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"
(One line, yes.)
Here is the result:
"Identifier" "First name" "Last name" "Number One" "Number Two" "Number Three" "Number Four" "Number Five" "Number Six" "Number Seven"
"3efa639b3a" "Censored" "Censored" "7896" "3468" "3854" "5000" "1234" "9654" "0"
The "number" fields do become integers here. All of them, that is, except the last one ("Number Seven" -> "0").
It is getting weirder and weirder...
Answer 0 (score: 1)
I can't reproduce the problem:
MariaDB [(none)]> SELECT VERSION();
Field 1: `VERSION()`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: VAR_STRING
Collation: utf8_general_ci (33)
Length: 72
Max_length: 24
Decimals: 31
Flags: NOT_NULL
+-----------------+
| VERSION() |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)
MariaDB [(none)]> SELECT CAST(REPLACE('1 337', ' ', '') AS UNSIGNED);
Field 1: `CAST(REPLACE('1 337', ' ', '') AS UNSIGNED)`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 5
Max_length: 4
Decimals: 0
Flags: NOT_NULL UNSIGNED BINARY NUM
+---------------------------------------------+
| CAST(REPLACE('1 337', ' ', '') AS UNSIGNED) |
+---------------------------------------------+
| 1337 |
+---------------------------------------------+
1 row in set (0.00 sec)
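Since the expression above works on a literal with an ordinary space, one thing worth checking on the real data is which character actually separates the digit groups. This is only a guess, not something the question confirms: number formatters often emit a no-break space (U+00A0) instead of U+0020, and REPLACE(..., ' ', '') would leave that untouched. A minimal Python sketch for inspecting a field copied from the file (the helper name describe_separators is hypothetical):

```python
import unicodedata


def describe_separators(field: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every non-digit character in a field."""
    return [
        # The "UNKNOWN" default avoids a ValueError for unnamed characters such as '\r'.
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in field
        if not ch.isdigit()
    ]


# An ordinary space versus a no-break space: they look the same on screen.
print(describe_separators("1 337"))
print(describe_separators("1\u00a0337"))
```

If the second kind of character shows up, the string passed to REPLACE() has to contain that same character rather than a plain space.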
UPDATE
File /path/to/data.csv:
"3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"
MariaDB [_]> SELECT VERSION();
+-----------------+
| VERSION() |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)
MariaDB [_]> DROP TABLE IF EXISTS `test_spaces_extr`;
Query OK, 0 rows affected (0.07 sec)
MariaDB [_]> CREATE OR REPLACE TABLE `test_spaces_extr` (
-> `Identifier` tinytext,
-> `First name` tinytext,
-> `Last name` tinytext,
-> `Number One` int unsigned,
-> `Number Two` int unsigned,
-> `Number Three` int unsigned,
-> `Number Four` int unsigned,
-> `Number Five` int unsigned,
-> `Number Six` int unsigned,
-> `Number Seven` int unsigned
-> );
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> LOAD DATA LOW_PRIORITY LOCAL INFILE '/path/to/data.csv'
-> INTO TABLE `test_spaces_extr`
-> CHARACTER SET utf8
-> FIELDS TERMINATED BY ';'
-> OPTIONALLY ENCLOSED BY '"'
-> ESCAPED BY '"'
-> LINES TERMINATED BY '\r\n'
-> (
-> `Identifier`,
-> `First name`,
-> `Last name`,
-> @`NumberOne`,
-> @`NumberTwo`,
-> @`NumberThree`,
-> @`NumberFour`,
-> @`NumberFive`,
-> @`NumberSix`,
-> @`NumberSeven`
-> )
-> SET
-> `Number One` = REPLACE(@`NumberOne`, ' ', ''),
-> `Number Two` = REPLACE(@`NumberTwo`, ' ', ''),
-> `Number Three` = REPLACE(@`NumberThree`, ' ', ''),
-> `Number Four` = REPLACE(@`NumberFour`, ' ', ''),
-> `Number Five` = REPLACE(@`NumberFive`, ' ', ''),
-> `Number Six` = REPLACE(@`NumberSix`, ' ', ''),
-> `Number Seven` = REPLACE(@`NumberSeven`, ' ', '');
Query OK, 1 row affected (0.00 sec)
Records: 1 Deleted: 0 Skipped: 0 Warnings: 0
MariaDB [_]> SELECT
-> `Identifier`,
-> `First name`,
-> `Last name`,
-> `Number One`,
-> `Number Two`,
-> `Number Three`,
-> `Number Four`,
-> `Number Five`,
-> `Number Six`,
-> `Number Seven`
-> FROM
-> `test_spaces_extr`;
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| Identifier | First name | Last name | Number One | Number Two | Number Three | Number Four | Number Five | Number Six | Number Seven |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| 3efa639b3a | Censored | Censored | 7896 | 3468 | 3854 | 5000 | 1234 | 9654 | 1337 |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
1 row in set (0.00 sec)
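If the import keeps misbehaving on the real file, an alternative is to clean the numeric fields before LOAD DATA ever sees them. The following is a rough sketch under assumptions of mine, not part of the setup above: the file uses ';' as separator and '"' as quote character (as in the scripts), the numeric columns start at index 3, and the helper names to_int and clean_row are hypothetical.

```python
import csv
import io
import re

# In Python 3 str patterns, \s matches Unicode whitespace, including
# U+00A0 (no-break space) and U+202F (narrow no-break space).
_WHITESPACE = re.compile(r"\s+")


def to_int(field: str) -> int:
    """Drop every whitespace character, then parse; raises ValueError on non-numeric input."""
    return int(_WHITESPACE.sub("", field))


def clean_row(row: list[str], numeric_from: int = 3) -> list[str]:
    """Leave the leading text columns alone and normalise the numeric ones."""
    return row[:numeric_from] + [str(to_int(v)) for v in row[numeric_from:]]


sample = (
    '"3efa639b3a";"Censored";"Censored";'
    '"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"\r\n'
)
for row in csv.reader(io.StringIO(sample), delimiter=";", quotechar='"'):
    print(clean_row(row))
```

Because the csv module strips the quotes and the line terminator itself, the last column is handled like all the others regardless of whether the line ends in '\r\n' or '\n'.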