MySQL incorrectly imports fields that contain spaces

Time: 2019-05-05 20:42:47

Tags: mysql

I generated a CSV file with R, but when I import it into MySQL, the column whose fields contain spaces (i.e. phrases or sentences) comes out garbled.

This is the code I used to create the table:

CREATE TABLE testis_sQTL (
intron_cluster VARCHAR(40) NOT NULL,
chrom TINYINT(2) NOT NULL,
pheno_start INT(12) NOT NULL,
pheno_end INT(12) NOT NULL,
strand CHAR(1) NOT NULL,
variant_id VARCHAR(25) NOT NULL,
variant_chrom INT(2) NOT NULL,
var_start INT(12) NOT NULL,
var_end INT(12) NOT NULL,
p FLOAT(12) NOT NULL,
beta FLOAT(12) NOT NULL,
emp_p FLOAT(12) NOT NULL,
adj_p FLOAT(12) NOT NULL,
qval FLOAT(20) NOT NULL,
width INT(12) NOT NULL,
istrand CHAR(1) NOT NULL,
gene_id INT(9) NOT NULL,
symbol VARCHAR(12) NOT NULL,
gene_name VARCHAR(100) NOT NULL);

Here is how I try to load the data:

LOAD DATA LOCAL
  INFILE '/var/www/html/*****/FINAL/testis_sQTL.txt'
  INTO TABLE testis_sQTL
  CHARACTER SET 'utf8'
  FIELDS TERMINATED BY ','
  IGNORE 1 LINES;

This runs without raising any warnings. However, the gene_name column looks like this:

+--------------------------------------------------------+
| gene_name                                              |
+--------------------------------------------------------+
              |scription complex subunit 2
                     | synthetase 1
              |scription complex subunit 2
 |methenyltetrahydrofolate synthetase domain containing
                      |e domains 2
                                 |
                                          |
                             |
                                          |
                |d apoptosis inhibitor 1
                                 |
                |through (NMD candidate)
                                  |
                |through (NMD candidate)
                                  |
                                       |
                             |
                |eacetylase pseudogene 1
                                  |
                              |
                         |or 35
 |methenyltetrahydrofolate synthetase domain containing
                     |ontaining 146
            |RND transporter family member 1
          |th sequence similarity 186 member B
            |emal light intermediate chain 1
                                         |
                     |y 15 member 4
 |methenyltetrahydrofolate synthetase domain containing
                            |
+--------------------------------------------------------+

when it should look like this:

[*******@bfx FINAL]$ awk -F',' '{print $19}' testis_sQTL.txt
gene_name
CCR4-NOT transcription complex subunit 2
2'-5'-oligoadenylate synthetase 1
CCR4-NOT transcription complex subunit 2
methenyltetrahydrofolate synthetase domain containing
CUB and Sushi multiple domains 2
CSMD2 antisense RNA 1
neuromedin B
transcription factor Dp-2
neuromedin B
cytokine induced apoptosis inhibitor 1
histone deacetylase 7
UBE2F-SCLY readthrough (NMD candidate)
selenocysteine lyase
UBE2F-SCLY readthrough (NMD candidate)
selenocysteine lyase
microRNA 548h-2
arylacetamide deacetylase
arylacetamide deacetylase pseudogene 1
succinate receptor 1
serpin family B member 6
G protein-coupled receptor 35
methenyltetrahydrofolate synthetase domain containing
coiled-coil domain containing 146
dispatched RND transporter family member 1
family with sequence similarity 186 member B
dynein axonemal light intermediate chain 1
ADAMTS like 3
solute carrier family 15 member 4
methenyltetrahydrofolate synthetase domain containing
two pore segment channel 1

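I have no idea why this happens. Initially I thought it was related to how the file is delimited, so I switched the field delimiter from \t to a comma, but that did not seem to change anything. What confuses me most is that I get no errors at all.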
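One check that might narrow this down: the misaligned borders and the text overwritten from the left margin are what a terminal tends to show when each value ends in a carriage return, which would happen if the file had Windows-style \r\n line endings. That is only an assumption here, since nothing in the question confirms how the file was written. The sketch below uses a small hypothetical sample file (sample_crlf.txt, not the real testis_sQTL.txt) to show how the carriage returns could be counted and stripped:

```shell
# Hypothetical two-line sample standing in for the real CSV.
# The \r\n line endings are an assumption, not something confirmed in the question.
printf 'symbol,gene_name\r\nNMB,neuromedin B\r\n' > sample_crlf.txt

# Count carriage-return bytes; 0 would rule this explanation out.
cr_before=$(tr -cd '\r' < sample_crlf.txt | wc -c | tr -d ' ')
echo "CR bytes before cleaning: $cr_before"

# Write a copy with the carriage returns removed.
tr -d '\r' < sample_crlf.txt > sample_lf.txt

cr_after=$(tr -cd '\r' < sample_lf.txt | wc -c | tr -d ' ')
echo "CR bytes after cleaning: $cr_after"
```

If the real file did turn out to contain carriage returns, loading the cleaned copy, or adding a LINES TERMINATED BY '\r\n' clause to the LOAD DATA statement, would be the things to try next.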

Edit: here is part of the CSV:

[******@bfx FINAL]$ head testis_sQTL.txt
intron_cluster,chrom,pheno_start,pheno_end,strand,variant_id,variant_chrom,var_start,var_end,p,beta,emp_p,adj_p,qval,width,i.strand,gene_id,symbol,gene_name
12:70636673:70637092:clu_42156_NA,12,70636674,70637092,+,12_70636829_G_A_b37,12,70636829,70636829,3.06558e-18,-1.31573,0.000999001,2.3597e-14,4.17518937099935e-12,112000,+,4848,CNOT2,CCR4-NOT transcription complex subunit 2
12:113355505:113357194:clu_43113_NA,12,113355506,113357194,+,12_113361443_G_A_b37,12,113361443,113361443,1.84858e-15,-0.931698,0.000999001,2.45773e-13,3.74452720714286e-11,25252,+,4938,OAS1,2'-5'-oligoadenylate synthetase 1
12:70636673:70636846:clu_42156_NA,12,70636674,70636846,+,12_70438852_A_C_b37,12,70438852,70438852,3.99723e-15,1.17823,0.000999001,5.18063e-12,6.33582862902935e-10,112000,+,4848,CNOT2,CCR4-NOT transcription complex subunit 2
16:86581174:86581641:clu_50252_NA,16,86581175,86581641,+,16_86581191_G_A_b37,16,86581191,86581191,2.06227e-14,1.8007,0.000999001,3.59828e-11,3.84513478295858e-09,25060,-,64779,MTHFSD,methenyltetrahydrofolate synthetase domain containing
1:34336095:34336473:clu_30740_NA,1,34336096,34336473,+,1_34349815_C_A_b37,1,34349815,34349815,1.40127e-12,-0.863764,0.000999001,1.03633e-09,8.71569295343061e-08,651835,-,114784,CSMD2,CUB and Sushi multiple domains 2
1:34336095:34336473:clu_30740_NA,1,34336096,34336473,+,1_34349815_C_A_b37,1,34349815,34349815,1.40127e-12,-0.863764,0.000999001,1.03633e-09,8.71569295343061e-08,16503,+,402779,CSMD2-AS1,CSMD2 antisense RNA 1
15:85200773:85201227:clu_16999_NA,15,85200774,85201227,+,15_85388653_A_G_b37,15,85388653,85388653,2.80062e-12,-0.867156,0.000999001,2.12775e-09,1.6942631547619e-07,3443,-,4828,NMB,neuromedin B
3:141724386:141747421:clu_68161_NA,3,141724387,141747421,+,3_141752480_G_C_b37,3,141752480,141752480,5.08441e-12,-1.30272,0.000999001,3.42692e-09,2.6387771055145e-07,205117,-,7029,TFDP2,transcription factor Dp-2
15:85198640:85199878:clu_16998_NA,15,85198641,85199878,+,15_85403496_G_A_b37,15,85403496,85403496,5.69043e-12,-0.871396,0.000999001,1.8331e-08,1.25049179576933e-06,3443,-,4828,NMB,neuromedin B

0 answers:

No answers