So I have a file of Twitter data that looks like this:
Robert_Aderholt^&^&^2013-06-12 18:32:02^&^&^RT @financialcmte: In 2012, the Obama Admin published 1,172 new regulations totaling 79,000 pages. 57 were expected to have costs of at...
Robert_Aderholt^&^&^2013-06-12 13:42:09^&^&^The Administration's idea of a 'recovery' is 4 million fewer private sector jobs than the average post WWII recovery http://t.co/gSVW0Q8MYK
Robert_Aderholt^&^&^2013-06-11 13:51:17^&^&^As manufacturing jobs continue to decrease, its time to open new markets #4Jobs http://t.co/X2Mswr1i43
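(The string ^&^&^ is the delimiter; I chose it because it is unlikely to appear in any tweet.)
The file is 90663 lines long (I checked by running "wc -l tweets_parsed-6-12.csv").
However, when I load it into a table, I only get a table with 40456 entries: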
mysql> source ../code/tweets2tables.sql;
Query OK, 0 rows affected (0.03 sec)
Query OK, 0 rows affected (0.08 sec)
Query OK, 40456 rows affected, 2962 warnings (0.81 sec)
Records: 40456 Deleted: 0 Skipped: 0 Warnings: 2962
mysql> SELECT COUNT(*) FROM tweets;
+----------+
| COUNT(*) |
+----------+
|    40456 |
+----------+
1 row in set (0.02 sec)
Why? I removed every line that does not contain ^&^&^, so I don't think there is any funny business in the data, but I could be wrong.
My loading code is:
DROP TABLE IF EXISTS tweets;
CREATE TABLE tweets (
twitter_id VARCHAR(20),
post_date DATETIME,
body VARCHAR(140)
);
LOAD DATA
LOCAL INFILE 'tweets_parsed-6-12.csv'
INTO TABLE tweets
FIELDS TERMINATED BY '^&^&^'
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(twitter_id, post_date, body);
Answer 0 (score: 1)
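The rows that were not loaded probably contain " characters. If you specify that fields are enclosed by ", then any quote inside a field should be escaped by doubling it, like this: "". Adding the OPTIONALLY keyword before ENCLOSED may help.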
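One way to test this guess before changing the load statement is to scan the file for lines the loader is likely to misread. The following is only a rough sketch in Python, not part of the original setup: the function name find_suspect_lines and the sample user some_user are made up for illustration. It assumes that a field beginning with a double quote is read as an enclosed value, so the loader keeps reading (possibly across line breaks) until it finds a closing quote followed by a terminator, which would merge several tweets into one row.

```python
DELIM = "^&^&^"  # field delimiter used in the tweets file


def find_suspect_lines(lines, delim=DELIM, quote='"'):
    """Return (line_number, reason) pairs for lines that may load incorrectly."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delim)
        if len(fields) != 3:
            # The table expects exactly twitter_id, post_date, body.
            suspects.append((number, "expected 3 fields, found %d" % len(fields)))
        elif any(field.startswith(quote) for field in fields):
            # Assumption: a leading quote makes the loader treat the field as enclosed.
            suspects.append((number, "a field starts with a double quote"))
    return suspects


# Two illustrative lines; the second one is hypothetical, not from the real file.
sample = [
    "Robert_Aderholt^&^&^2013-06-12 18:32:02^&^&^RT @financialcmte: In 2012 ...\n",
    'some_user^&^&^2013-06-12 13:42:09^&^&^"Quoted opening" and then more text\n',
]
print(find_suspect_lines(sample))
```

To run it on the real data, open tweets_parsed-6-12.csv and pass the file object as the lines argument. If many lines are flagged, that supports the quoting explanation; since the file does not quote its fields at all, leaving out the ENCLOSED BY clause is another thing worth trying.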