I'm learning SQL and am experimenting with some sample data to understand various concepts.
Here are the first few rows of the Yahoo! Search Marketing Advertiser Bid-Impression-Click data on competing Keywords dataset:
1 08bade48-1081-488f-b459-6c75d75312ae 2 2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a 100.0 2.0 0.0
29 08bade48-1081-488f-b459-6c75d75312ae 3 769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a 100.0 1.0 0.0
29 08bade48-1081-488f-b459-6c75d75312ae 2 769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a 100.0 1.0 0.0
11 08bade48-1081-488f-b459-6c75d75312ae 1 769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a 100.0 2.0 0.0
76 08bade48-1081-488f-b459-6c75d75312ae 2 769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a 100.0 1.0 0.0
48 08bade48-1081-488f-b459-6c75d75312ae 3 2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a 100.0 2.0 0.0
97 08bade48-1081-488f-b459-6c75d75312ae 2 2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a 100.0 1.0 0.0
123 08bade48-1081-488f-b459-6c75d75312ae 5 769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a 100.0 1.0 0.0
119 08bade48-1081-488f-b459-6c75d75312ae 3 2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a 100.0 1.0 0.0
73 08bade48-1081-488f-b459-6c75d75312ae 1 2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a 100.0 1.0 0.0
Note that this data is only available from Yahoo! upon request.
Here is the data description:
(1) "ydata-ysm-keyphrase-bid-imp-click-v1_0.gz" contains the following fields:
0 day
1 anonymized account_id
2 rank
3 anonymized keyphrase (expressed as list of anonymized keywords)
4 avg bid
5 impressions
6 clicks
Primary key of the data is a combination of fields date, account_id, rank and keyphrase. Average bid, impressions and
clicks information is aggregated over the primary key.
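I'm trying to load this data from the flat file into a MySQL table using the load data infile statement.
In general I think I understand how to specify the data types for the columns, but I'm not sure about field 3, anonymized keyphrase (expressed as list of anonymized keywords):
Question 1: Should I specify the keywords as separate varchar columns, or is there a data type that lets them be stored together as a "list" type?
Here is the query I currently have for creating the table to hold this data.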
# create a new database
create database webscopedata;
show databases;
use webscopedata;
# create the table
drop table if exists bidders;
create table bidders (
daynum int,
account_id varchar(40),
`rank` int, # quoted because RANK is a reserved word as of MySQL 8.0.2
keyphrase1 varchar(100),
keyphrase2 varchar(100),
keyphrase3 varchar(100),
keyphrase4 varchar(100),
keyphrase5 varchar(100),
avg_bid double,
impressions double,
clicks double);
Now I try to run the query
load data infile "ydata-ysm-keyphrase-bid-imp-click-v1_0" into table bidders fields terminated by "\t";
That is, when I specify a tab as the field terminator, I get the error:
Error Code: 1261. Row 1 doesn't contain data for all columns
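This leads me to believe that the fields are not being separated correctly when I specify a tab delimiter. So I tried specifying multiple delimiters with the query: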
load data infile "ydata-ysm-keyphrase-bid-imp-click-v1_0" into table bidders fields terminated by "' '\t";
Here I'm trying to use multiple field terminators, but this doesn't seem to work either, and I get an error:
Error Code: 1265. Data truncated for column 'daynum' at row 1
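Question 2: How do I parse this data when it uses multiple delimiters?
Answer 0 (score: 0)
You can reformat the data file with a Perl regular expression and then import it using MySQL's load data feature.
Use the following Perl script to reformat the data file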
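#!/usr/bin/perl
use strict;
use warnings;
# Replace every run of whitespace with a double quote so that '"' can serve
# as the single field terminator for LOAD DATA INFILE.
open(my $in, "<", "ydata-ysm-keyphrase-bid-imp-click-v1_0") or die "Cannot open input file: $!";
open(my $out, ">", "DataFile") or die "Cannot open DataFile: $!";
while (my $line = <$in>) {
    $line =~ s/\s+/"/g;  # each whitespace run, including the newline, becomes "
    $line =~ s/"$//;     # drop the trailing " left where the newline was
    print $out $line, "\n";
}
close $in;
close $out or die "Cannot close DataFile: $!";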
Then execute the following statement in MySQL
load data infile "pathToYourDataFile/DataFile" into table bidders fields terminated by '"';
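Note that loading the reformatted file only works if every keyphrase has exactly five keywords, because each keyword becomes a field of its own. Also, fields terminated by takes one string (possibly several characters long), not a set of alternative delimiters, which is why the "' '\t" attempt in the question did not help. If the file really is tab-separated with seven fields, the 1261 error would be explained by the table having eleven columns. As an alternative to the Perl step, the split could be done in MySQL itself. The following is only a sketch under assumptions the dataset description does not confirm: fields are separated by tabs, keywords inside a keyphrase by single spaces, and @kp is a user variable introduced here for illustration.

```sql
-- Sketch: assumes tab-separated fields and space-separated keywords.
-- @kp is a user variable that receives the raw keyphrase of each row.
load data infile 'ydata-ysm-keyphrase-bid-imp-click-v1_0'
into table bidders
fields terminated by '\t'
(daynum, account_id, `rank`, @kp, avg_bid, impressions, clicks)
set
  keyphrase1 = substring_index(@kp, ' ', 1),
  -- keyword N exists only if the keyphrase contains at least N-1 spaces;
  -- otherwise the column is left NULL
  keyphrase2 = if(char_length(@kp) - char_length(replace(@kp, ' ', '')) >= 1,
                  substring_index(substring_index(@kp, ' ', 2), ' ', -1), null),
  keyphrase3 = if(char_length(@kp) - char_length(replace(@kp, ' ', '')) >= 2,
                  substring_index(substring_index(@kp, ' ', 3), ' ', -1), null),
  keyphrase4 = if(char_length(@kp) - char_length(replace(@kp, ' ', '')) >= 3,
                  substring_index(substring_index(@kp, ' ', 4), ' ', -1), null),
  keyphrase5 = if(char_length(@kp) - char_length(replace(@kp, ' ', '')) >= 4,
                  substring_index(substring_index(@kp, ' ', 5), ' ', -1), null);
```

With this approach, keywords beyond the fifth are silently dropped, so it is worth checking the longest keyphrase in the file before relying on it.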
Hopefully this solves the problem of importing the data into MySQL.
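Regarding Question 1: MySQL has no general list column type, so one simple option is to keep the whole keyphrase in a single varchar column, which also matches the seven fields in the data description. The sketch below assumes the file is tab-separated with the keywords inside a keyphrase separated by spaces; the table name bidders_by_keyphrase and the varchar(255) width are hypothetical choices made for illustration.

```sql
-- Hypothetical table: one column per field in the data description.
create table bidders_by_keyphrase (
  daynum      int,
  account_id  varchar(40),
  `rank`      int,            -- quoted: RANK is reserved in MySQL 8.0.2+
  keyphrase   varchar(255),   -- space-separated keywords kept as one string
  avg_bid     double,
  impressions double,
  clicks      double
);

-- Assumes tab-separated fields; no preprocessing of the file is needed.
load data infile 'ydata-ysm-keyphrase-bid-imp-click-v1_0'
into table bidders_by_keyphrase
fields terminated by '\t'
(daynum, account_id, `rank`, keyphrase, avg_bid, impressions, clicks);

-- Example lookup: rows whose keyphrase contains a given keyword.
-- find_in_set expects a comma-separated list, hence the replace().
select daynum, account_id, `rank`, avg_bid
from bidders_by_keyphrase
where find_in_set('2affa525151b6c51', replace(keyphrase, ' ', ',')) > 0;
```

Keeping the keyphrase whole avoids having to guess a maximum number of keywords, at the cost of string operations whenever individual keywords are queried.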