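I'm trying to write a program that lets me compare SQL files with each other. As a first step I write the complete SQL files out to a text file. The text file is generated successfully, but it ends up containing block characters, as in the example below: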
SET ANSI_NULLS ONഀ
GOഀ
SET QUOTED_IDENTIFIER ONഀ
GOഀ
CREATE TABLE [dbo].[CDR](ഀ
Below is the code that generates the text file:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os  # imports packages

r = open('master_lines.txt', 'w')
directory = "E:\\"  # file directory, anonymized
master = directory + "master"
databases = ["\\1", "\\2", "\\3", "\\4"]
file_types = ["\\StoredProcedure", "\\Table", "\\UserDefinedFunction", "\\View"]
servers = []
server_number = []
master_lines = []
for file in os.listdir("E:\\"):  # adds server paths to an array
    servers.append(file)
for num in range(0, len(servers)):
    for file in os.listdir(directory + servers[num]):  # adds all the servers and paths to an array
        server_number.append(servers[num] + "\\" + file)
master = directory + server_number[server_number.index("master")]
master_var = master + databases[0]
tmp = master_var + file_types[1]
for file in os.listdir(tmp):
    # os.listdir returns bare file names, so join them with the directory
    with open(os.path.join(tmp, file)) as tmp_file:
        line = tmp_file.readlines()
        for num in range(0, len(line)):
            r.write(line[num])
r.close()  # close() must be called; a bare r.close does nothing
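I have tried changing the encoding to latin1 and utf-8; the current text file is the most successful result, since ascii and latin1 produced Chinese and Arabic characters respectively.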
Here is the SQL file as text:
/****** Object: Table [dbo].[CDR] Script Date: 2017-01-12 02:30:49 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[CDR](
[calldate] [datetime] NOT NULL,
[clid] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[src] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dst] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dcontext] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[channel] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dstchannel] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[lastapp] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[lastdata] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[duration] [int] NOT NULL,
[billsec] [int] NOT NULL,
[disposition] [varchar](45) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[amaflags] [int] NOT NULL,
[accountcode] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[userfield] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[uniqueid] [varchar](64) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[cdr_id] [int] NOT NULL,
[cost] [real] NOT NULL,
[cdr_tag] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[importID] [bigint] IDENTITY(-9223372036854775807,1) NOT NULL,
CONSTRAINT [PK_CDR_1] PRIMARY KEY CLUSTERED
(
[uniqueid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [ReadPartition]
) ON [ReadPartition]
GO
SET ANSI_PADDING ON
GO
/****** Object: Index [Idx_Dst_incl_uniqueId] Script Date: 2017-01-12 02:30:50 PM ******/
CREATE NONCLUSTERED INDEX [Idx_Dst_incl_uniqueId] ON [dbo].[CDR]
(
[dst] ASC
)
INCLUDE ( [uniqueid]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [ReadPartition]
GO
A hex dump of the start of the SQL file, to show what is going on (not part of the question above):
ff fe 2f 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00
20 00 4f 00 62 00 6a 00 65 00 63 00 74 00 3a 00
20 00 20 00 54 00 61 00 62 00 6c 00 65 00 20 00
5b 00 64 00 62 00 6f 00 5d 00 2e 00 5b 00 43 00
44 00 52 00 5d 00 20 00 20 00 20 00 20 00 53 00
63 00 72 00 69 00 70 00 74 00 20 00 44 00 61 00
74 00 65 00 3a 00 20 00 32 00 30 00 31 00 37 00
2d 00 30 00 31 00 2d 00 31 00 32 00 20 00 30 00
32 00 3a 00 33 00 30 00 3a 00 34 00 39 00 20 00
50 00 4d 00 20 00 2a 00 2a 00 2a 00 2a 00 2a 00
2a 00 2f 00 0d 00 0a 00 53 00 45 00 54 00 20 00
41 00 4e 00 53 00 49 00 5f 00 4e 00 55 00 4c 00
4c 00 53 00 20 00 4f 00 4e 00 0d 00 0a 00 47 00
4f 00 0d 00 0a 00 53 00 45 00 54 00 20 00 51 00
55 00 4f 00 54 00 45 00 44 00 5f 00 49 00 44 00
The same bytes in hexdump's ASCII column:
../.*.*.*.*.*.*.
.O.b.j.e.c.t.:.
. .T.a.b.l.e. .
[.d.b.o.]...[.C.
D.R.]. . . . .S.
c.r.i.p.t. .D.a.
t.e.:. .2.0.1.7.
-.0.1.-.1.2. .0.
2.:.3.0.:.4.9. .
P.M. .*.*.*.*.*.
*./.....S.E.T. .
A.N.S.I._.N.U.L.
L.S. .O.N.....G.
O.....S.E.T. .Q.
U.O.T.E.D._.I.D.
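For readers without a hexdump tool, the same two columns can be produced from Python. This is a minimal sketch; the function name hex_dump is illustrative and not part of any library. Every second byte being 00 is the signature of UTF-16 LE text, and the leading ff fe is its byte order mark.

```python
from typing import List


def hex_dump(data: bytes, width: int = 16) -> List[str]:
    """Format bytes as hex pairs followed by a printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes (including the 00 high bytes of UTF-16 LE) show as ".".
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        lines.append(f"{hex_part:<{width * 3 - 1}}  {text_part}")
    return lines
```

To dump the first 64 bytes of a file, read it in binary mode: with open(path, "rb") as f: print("\n".join(hex_dump(f.read(64)))).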
Answer (score: 1)
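Your problem is that the original file is encoded in UTF-16 with an initial byte order mark (BOM). This is normally transparent on Windows, because almost every file editor detects the encoding automatically from that initial BOM.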
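Python's open() does not sniff the BOM on its own, but a script can imitate what the editors do. A minimal sketch, assuming only the standard Unicode BOMs matter; codec_from_bom and sniff_encoding are illustrative names, and a file without a BOM falls back to an assumed default.

```python
import codecs
from typing import Optional

# Longest marks first: the UTF-32 LE BOM (ff fe 00 00) begins with the UTF-16 LE BOM (ff fe).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def codec_from_bom(head: bytes) -> Optional[str]:
    """Return a Python codec name implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            # These codecs use the BOM to pick the byte order and strip it when decoding.
            return name
    return None


def sniff_encoding(path: str, default: str = "utf-8") -> str:
    """Peek at the first four bytes of a file, the way an editor would."""
    with open(path, "rb") as f:
        return codec_from_bom(f.read(4)) or default
```

A file can then be opened with open(path, encoding=sniff_encoding(path)).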
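But the conversion is not automatic in a Python script! Read with an 8-bit codec, every character comes in as the character itself followed by a null. That is almost transparent except at line ends, because the nulls are simply written back and re-form normal UTF-16 characters. But the \n is no longer immediately preceded by the original \r (there is a null between them), so the two are no longer handled as a single \r\n pair. When you then write a \n in text mode on Windows, Python replaces it with the pair \r\n. That adds an extra byte, so the bytes no longer line up as the original two-byte UTF-16 characters, and that is what shows up as the blocks.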
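The mechanism can be reproduced in memory. This sketch assumes latin-1 as a stand-in for the 8-bit default codec (on Windows it is usually cp1252) and forces newline="\r\n" on the writer to mimic text mode on Windows; simulate_corruption is an illustrative name. Each original \r\n comes back as U+0A0D, U+0D00 (the ഀ seen in the question) and a plain \n.

```python
import io


def simulate_corruption(text: str) -> str:
    """Round-trip UTF-16 text through an 8-bit codec and view the result as UTF-16."""
    # A UTF-16 LE file with a BOM, like the SQL script in the question.
    raw = b"\xff\xfe" + text.encode("utf-16-le")
    # Read it the way the question's script does: an 8-bit codec with
    # universal newlines, so "\r" and "\n" (separated by nulls) each become "\n".
    decoded = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1").read()
    # Write it back in text mode as on Windows, where every "\n" becomes "\r\n".
    out = io.BytesIO()
    writer = io.TextIOWrapper(out, encoding="latin-1", newline="\r\n")
    writer.write(decoded)
    writer.flush()
    # An editor sees the ff fe BOM and decodes the whole file as UTF-16.
    return out.getvalue().decode("utf-16")
```

For example, simulate_corruption("ON\r\nGO") yields "ON\u0a0d\u0d00\nGO".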
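The fix is simple: declare the UTF-16 encoding when reading the file. Use 'utf-16' rather than 'utf_16_le', so that Python takes the byte order from the BOM and strips it instead of copying it into the output as a U+FEFF character:
for file in os.listdir(tmp):
    with open(os.path.join(tmp, file), encoding='utf-16') as tmp_file: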
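Alternatively, if you want to keep the UTF-16 encoding, you can also open the output file (master_lines.txt) with it; otherwise Python writes it in the platform's default encoding (the locale encoding, typically cp1252 on Windows). But my advice is to go back to an 8-bit encoding such as UTF-8 for the output file, to avoid further problems when you later want to process it.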
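A minimal sketch of the copy loop with both encodings made explicit. It assumes every regular file in the directory is a UTF-16 script with a BOM and writes UTF-8 by default; concat_sql_scripts is an illustrative name, not part of the original code.

```python
import os


def concat_sql_scripts(src_dir: str, out_path: str, out_encoding: str = "utf-8") -> int:
    """Append every file in src_dir to out_path; return the number of files copied."""
    count = 0
    with open(out_path, "w", encoding=out_encoding) as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories
            # "utf-16" reads the byte order from the BOM and strips it;
            # text mode then turns each real "\r\n" into "\n".
            with open(path, encoding="utf-16") as sql_file:
                out.write(sql_file.read())
            count += 1
    return count
```

For the question's layout this would be concat_sql_scripts(tmp, "master_lines.txt"), or concat_sql_scripts(tmp, "master_lines.txt", out_encoding="utf-16") to keep the output in UTF-16. The output path should lie outside src_dir, otherwise the output file itself would be listed and read.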