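Loading data from a flat file into a MySQL table

Date: 2015-01-17 11:26:27

Tags: mysql sql

I am learning SQL and am trying out some sample data to understand various concepts.

Here are the first few rows of the Yahoo! Search Marketing Advertiser Bid-Impression-Click data on competing Keywords dataset: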

1   08bade48-1081-488f-b459-6c75d75312ae    2   2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a    100.0   2.0 0.0
29  08bade48-1081-488f-b459-6c75d75312ae    3   769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a    100.0   1.0 0.0
29  08bade48-1081-488f-b459-6c75d75312ae    2   769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a    100.0   1.0 0.0
11  08bade48-1081-488f-b459-6c75d75312ae    1   769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a    100.0   2.0 0.0
76  08bade48-1081-488f-b459-6c75d75312ae    2   769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a    100.0   1.0 0.0
48  08bade48-1081-488f-b459-6c75d75312ae    3   2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a    100.0   2.0 0.0
97  08bade48-1081-488f-b459-6c75d75312ae    2   2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a    100.0   1.0 0.0
123 08bade48-1081-488f-b459-6c75d75312ae    5   769ed4a87b5010f4 3d4b990abb0867c8 cd74a8342d25d090 ab9f74ae002e80ff af26d27737af376a    100.0   1.0 0.0
119 08bade48-1081-488f-b459-6c75d75312ae    3   2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a    100.0   1.0 0.0
73  08bade48-1081-488f-b459-6c75d75312ae    1   2affa525151b6c51 79021a2e2c836c1a 327e089362aac70c fca90e7f73f3c8ef af26d27737af376a    100.0   1.0 0.0

Please note that this data is available from Yahoo! only on request.

Here is the description of the data:

(1) "ydata-ysm-keyphrase-bid-imp-click-v1_0.gz" contains the following fields:

    0 day
    1 anonymized account_id
    2 rank
    3 anonymized keyphrase (expressed as list of anonymized keywords)
    4 avg bid
    5 impressions
    6 clicks

    Primary key of the data is a combination of fields date, account_id, rank and keyphrase. Average bid, impressions and
    clicks information is aggregated over the primary key.

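I am trying to load this data from the flat file into a MySQL table using a load data infile statement.

In general, I think I understand how to specify the data types of the columns, but I am not sure about field 3, "anonymized keyphrase (expressed as list of anonymized keywords)".
Question 1: Should I specify the keywords as separate varchar columns, or is there a data type that lets them be stored together as a "list" type?


Here is the query I have so far to create the table that will hold this data.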

# create a new database
create database webscopedata;
show databases;
use webscopedata;

# create the table
drop table bidders;
create table bidders (
    daynum int, 
    account_id varchar(40), 
    rank int, 
    keyphrase1 varchar(100),
    keyphrase2 varchar(100),
    keyphrase3 varchar(100),
    keyphrase4 varchar(100),
    keyphrase5 varchar(100), 
    avg_bid double, 
    impressions double, 
    clicks double);

Now I try to run the query

load data infile "ydata-ysm-keyphrase-bid-imp-click-v1_0" into table bidders fields terminated by "\t";

That is, when I specify the tab delimiter, I get the error:

Error Code: 1261. Row 1 doesn't contain data for all columns

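This leads me to believe that the fields are not being separated correctly when I specify the tab delimiter. So I tried to specify multiple delimiters with the query: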

load data infile "ydata-ysm-keyphrase-bid-imp-click-v1_0" into table bidders fields terminated by "' '\t";

Here I am trying to use multiple field terminators, but this does not seem to work, and I get an error:
Error Code: 1265. Data truncated for column 'daynum' at row 1

Question 2: How do I specify multiple delimiters to parse this data?

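1 Answer:

Answer 0: (score: 0)

You can reformat the data file with a Perl regular expression and then import it using MySQL's load data feature.

Use the following Perl script to reformat the data file. It replaces every run of whitespace (tabs or spaces) with a single double quote, so that the file has exactly one delimiter: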

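#!/usr/bin/perl
use strict;
use warnings;

# Open the original data file for reading and the reformatted file for writing.
open(my $in, "<", "ydata-ysm-keyphrase-bid-imp-click-v1_0")
    or die "Cannot open input file: $!";
open(my $out, ">", "DataFile")
    or die "Cannot open DataFile: $!";

while (my $line = <$in>) {
    $line =~ s/\s+$//;     # strip the newline and any trailing whitespace
    $line =~ s/\s+/"/g;    # turn each run of tabs/spaces into one '"'
    print {$out} $line, "\n";
}

close $in;
close $out or die "Cannot close DataFile: $!";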

Then execute the following statement in mysql:

load data infile "pathToYourDataFile/DataFile" into table bidders fields terminated by '"';
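
As a possible alternative that skips the reformatting step: load data takes the fields terminated by value as one literal string, not as a set of alternative delimiters, so "multiple delimiters" cannot be expressed directly. The data description, however, lists only seven fields, with the keyphrase as a single field containing a list of keywords. If the original file really is tab-delimited between those seven fields, with spaces only inside the keyphrase (an assumption; the sample above does not show which whitespace is which), then the 1261 error would simply come from loading 7 fields into an 11-column table. Under that assumption, a minimal sketch looks like this. The table name bidders_raw, the keyphrase column, and its VARCHAR(255) length are illustrative choices, not part of the dataset.

```sql
-- Sketch only: assumes 7 tab-separated fields per line, with the
-- keywords inside the keyphrase field separated by single spaces.
-- bidders_raw is a hypothetical table name used for illustration.
DROP TABLE IF EXISTS bidders_raw;

CREATE TABLE bidders_raw (
    daynum      INT,
    account_id  VARCHAR(40),
    `rank`      INT,           -- backticks: RANK is a reserved word in MySQL 8.0
    keyphrase   VARCHAR(255),  -- whole keyword list kept as one string (length is a guess)
    avg_bid     DOUBLE,
    impressions DOUBLE,
    clicks      DOUBLE
);

-- One delimiter is enough because spaces stay inside the keyphrase field.
LOAD DATA INFILE 'ydata-ysm-keyphrase-bid-imp-click-v1_0'
INTO TABLE bidders_raw
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';

-- Extract a single keyword on demand, here the 2nd one.
-- If a keyphrase has fewer than 2 keywords, this returns its last keyword.
SELECT daynum,
       SUBSTRING_INDEX(SUBSTRING_INDEX(keyphrase, ' ', 2), ' ', -1) AS keyword2
FROM bidders_raw
LIMIT 10;
```

Keeping the keyphrase in one column also avoids depending on every row having exactly five keywords, which the ten sample rows suggest but the data description does not promise.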

Hopefully this solves the problem of importing the data into MySQL.