So I downloaded this comments file from Twitter, but the problem is that post_by, comment_id and comment_by are such huge numbers that I can't load this CSV file into a MySQL table.
post_by post_text post_published comment_id comment_by is_reply comment_message likes
917fb960da76cc0db0c173d6f01b990a9ac01ed2 Blah blah 2015-01-30T23:19:39+0000 836574313068368_836574629735003 917fb960da76cc0db0c173d6f01b990a9ac01ed2 0 blah…. 1
917fb960da76cc0db0c173d6f01b990a9ac01ed2 Blah blah 2015-01-30T23:19:39+0000 836574313068368_836574946401638 917fb960da76cc0db0c173d6f01b990a9ac01ed2 0 blah…. 1
917fb960da76cc0db0c173d6f01b990a9ac01ed2 Blah blah 2015-01-30T23:19:39+0000 836574313068368_836575449734921 917fb960da76cc0db0c173d6f01b990a9ac01ed2 1 blah…. 1
917fb960da76cc0db0c173d6f01b990a9ac01ed2 Blah blah 2015-01-30T23:19:39+0000 836574313068368_836577599734706 bf59a99b63d9211fcb0a3a1dc1a23cfb8b4cd4d9 1 blah…. 1
917fb960da76cc0db0c173d6f01b990a9ac01ed2 Blah blah 2015-01-30T23:19:39+0000 836574313068368_836578463067953 917fb960da76cc0db0c173d6f01b990a9ac01ed2 1 blah…. 0
917fb960da76cc0db0c173d6f01b990a9ac01ed2 Blah blah 2015-01-30T23:19:39+0000 836574313068368_836575076401625 bf59a99b63d9211fcb0a3a1dc1a23cfb8b4cd4d9 0 blah…. 0
917fb960da76cc0db0c173d6f01b990a9ac01ed2 Blah blah 2015-01-30T23:19:39+0000 836574313068368_836576289734837 5c6bf068fe86fd029268a16875185f715bb7bda1 0 blah…. 0
917fb960da76cc0db0c173d6f01b990a9ac01ed2 Blah blah 2015-01-30T23:19:39+0000 836574313068368_836577079734758 917fb960da76cc0db0c173d6f01b990a9ac01ed2 1 blah…. 1
917fb960da76cc0db0c173d6f01b990a9ac01ed2 Blah blah 2015-01-30T23:19:39+0000 836574313068368_836577333068066 5c6bf068fe86fd029268a16875185f715bb7bda1 1 blah…. 0
917fb960da76cc0db0c173d6f01b990a9ac01ed2 Blah blah 2015-01-30T23:19:39+0000 836574313068368_836576926401440 5c6bf068fe86fd029268a16875185f715bb7bda1 0 blah…. 0
917fb960da76cc0db0c173d6f01b990a9ac01ed2 Blah blah 2015-01-30T23:19:39+0000 836574313068368_836580769734389 917fb960da76cc0db0c173d6f01b990a9ac01ed2 0 blah…. 3
The code I'm using throws an error because the date/timestamp is being read in as the primary key, and I have no clue why this happens.
mysql> CREATE TABLE test (
-> id INT(50) UNSIGNED ,
-> post_id INT(50) UNSIGNED ,
-> post_by INT(50) UNSIGNED,
-> post_text LONGTEXT,
-> post_published TIMESTAMP,
-> comment_id INT(50) PRIMARY KEY,
-> comment_by INT(50),
-> is_reply INT (2),
-> comment_message LONGTEXT,
-> comment_published TIMESTAMP,
-> comment_like_count INT(10)
-> );
Query OK, 0 rows affected (0.12 sec)
mysql> load data infile 'd:/test/twitter.csv' into table test
-> fields terminated by ',' lines terminated by '\n'
-> (id, post_id, post_by, post_published, comment_id, comment_by,
-> is_reply, comment_message, comment_published, comment_like_count);
ERROR 1062 (23000): Duplicate entry '2015' for key 'PRIMARY'
mysql> drop table test;
Query OK, 0 rows affected (0.02 sec)
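The duplicate-key error can probably be explained from the statements above alone. LOAD DATA assigns the fields of each line to the listed columns strictly by position. The table defines eleven columns, but the column list names only ten and leaves out post_text, so every field after post_by lands one column early. If the file's fields follow the table's column order, the fifth field (the post_published timestamp) ends up in comment_id. Converting that string to an integer keeps only its leading digits, which can be checked directly in the mysql client:

```sql
-- A string that starts with digits converts to that leading number; the rest
-- is dropped with a truncation warning rather than an error.
SELECT CAST('2015-01-30T23:19:39+0000' AS SIGNED) AS as_int;  -- 2015
SHOW WARNINGS;

-- The 40-character hex ids suffer the same fate: only the leading digits survive,
-- so the stored value no longer identifies the user.
SELECT CAST('917fb960da76cc0db0c173d6f01b990a9ac01ed2' AS UNSIGNED) AS as_int;  -- 917
```

Since every row's timestamp starts with 2015, the second row repeats the key 2015 and the load stops. Listing exactly one column name (or @variable) per field in the file, in file order, avoids the shift.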
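Other problems I'm facing:
1.) How do I handle the separator between the two parts of these IDs (the "_" in comment_id)?
2.) How do I store these huge IDs?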
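Both questions come down to picking column types that match what the values actually are. In the sample, post_by and comment_by are 40-character hexadecimal strings (they look like SHA-1 hashes), i.e. 160-bit values, so no MySQL integer type can hold them: BIGINT UNSIGNED tops out at 18446744073709551615 (64 bits), and the (50) in INT(50) is only a display width that does not enlarge INT's range. They are simplest to keep as CHAR(40). comment_id, by contrast, is two decimal numbers joined by "_". Each part has 15 digits and fits in BIGINT UNSIGNED, and the part before "_" is identical for every comment on the post in the sample, so it appears to be the post's id. The sketch below assumes the file has exactly the eight comma-separated columns of the sample plus a header row, that text fields containing commas are wrapped in double quotes, and that every timestamp ends in +0000 (UTC). The table name comments and the columns post_id and comment_num are hypothetical.

```sql
-- Hypothetical table; hashes stay text, comment_id is split into two BIGINTs.
CREATE TABLE comments (
  post_by         CHAR(40)        NOT NULL,  -- 40 hex chars, not a number
  post_text       LONGTEXT,
  post_published  DATETIME,
  post_id         BIGINT UNSIGNED NOT NULL,  -- comment_id part before '_' (assumed post id)
  comment_num     BIGINT UNSIGNED NOT NULL,  -- comment_id part after '_'
  comment_by      CHAR(40)        NOT NULL,
  is_reply        TINYINT(1),
  comment_message LONGTEXT,
  likes           INT UNSIGNED,
  PRIMARY KEY (post_id, comment_num)
);

-- One entry per field in the file, in file order; @variables receive the raw
-- text so SET can convert it. IGNORE 1 LINES skips the assumed header row.
LOAD DATA INFILE 'd:/test/twitter.csv'
INTO TABLE comments
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(post_by, post_text, @published, @comment_id, comment_by,
 is_reply, comment_message, likes)
SET post_published = STR_TO_DATE(@published, '%Y-%m-%dT%H:%i:%s+0000'),  -- literal T and +0000
    post_id        = SUBSTRING_INDEX(@comment_id, '_', 1),   -- text before the first '_'
    comment_num    = SUBSTRING_INDEX(@comment_id, '_', -1);  -- text after the last '_'
```

If the two halves are never queried separately, keeping comment_id as a single VARCHAR(64) PRIMARY KEY also works. For a file written on Windows, '\r\n' may be needed as the line terminator.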