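I'm sure someone will know how to solve this. I have a pet project in which I'm trying to build a database from several txt files like the one below. Records are separated by a blank sixth line. Fields are delimited by runs of consecutive spaces, and each record ends after five lines. Some fields do contain spaces.
I have already tried DataStage and SPSS, but can't quite get the result I want. I also tried Altova MapForce, which got me close. My database of choice will probably be MySQL (given that this is a spare-time project).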
SUNCOR ET AL MEDHAT 9-17-15-4 0416613 ALBERTA CROWN 753.3M
100/09-17-015-04W4/00 S 543.4M W 167.6M MEDICINE HAT 656.8M
DEV (NC) MEDICINE HAT FISH SCALE ZONE
VERTICAL NEW PRODUCTION GAS
SUNCOR ENERGY INC. 09-17-015-04W4
CVE HOUSE 3-23-83-17 0416614 ALBERTA CROWN 536.17M
1AB/03-23-083-17W4/00 N 281.3M E 686.8M BONNYVILLE 283.7M
OV (C) HOUSE MCMURRAY FM
VERTICAL NEW OIL SAND EVALUATION CRUDE BITUMEN
CENOVUS ENERGY INC. 03-23-083-17W4
CVE GRANOR 11-27-82-18 0416615 ALBERTA CROWN 554.69M
1AA/11-27-082-18W4/00 S 756.7M E 677.6M BONNYVILLE 409.2M
OV (C) GRANOR GROSMONT FM
VERTICAL NEW OIL SAND EVALUATION CRUDE BITUMEN
CENOVUS ENERGY INC. 11-27-082-18W4
SUNCOR ET AL MEDHAT 4-17-15-4 0416616 ALBERTA CROWN 750.9M
100/04-17-015-04W4/00 N 320.1M E 317.1M MEDICINE HAT 646.4M
DEV (NC) MEDICINE HAT FISH SCALE ZONE
VERTICAL NEW PRODUCTION GAS
SUNCOR ENERGY INC.
04-17-015-04W4
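Some fields contain a single space, but never multiple consecutive spaces.
Answer 0 (score: 1)
OK, I'll rise to the challenge. I'm not sure what output you want, but I figure a CSV can be imported into any database. This is what I have: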
sed -E 's/[ ][ ]+/,/g' yourfile | awk 'BEGIN{ORS=""}/^$/{print "\n"}{print $0}'
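The "sed" part converts runs of two or more spaces into commas to separate the fields, and hopefully leaves single spaces alone. The "awk" part then sets the output record separator (ORS) to nothing, so "awk" doesn't output any newlines itself and I can control them. The /^$/ pattern looks for empty lines: when I hit one, I put a newline into the output myself; otherwise the input line is just printed. With the data you supplied, the output looks like this: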
SUNCOR ET AL MEDHAT 9-17-15-4,0416613,ALBERTA CROWN,753.3M,100/09-17-015-04W4/00,S,543.4M,W,167.6M,MEDICINE HAT,656.8M,DEV (NC),MEDICINE HAT,FISH SCALE ZONE,VERTICAL,NEW,PRODUCTION,GAS,SUNCOR ENERGY INC.,09-17-015-04W4,
CVE HOUSE 3-23-83-17,0416614,ALBERTA CROWN,536.17M,1AB/03-23-083-17W4/00,N,281.3M,E,686.8M,BONNYVILLE,283.7M,OV (C),HOUSE,MCMURRAY FM,VERTICAL,NEW,OIL SAND EVALUATION,CRUDE BITUMEN,CENOVUS ENERGY INC.,03-23-083-17W4,
CVE GRANOR 11-27-82-18,0416615,ALBERTA CROWN,554.69M,1AA/11-27-082-18W4/00,S,756.7M,E,677.6M,BONNYVILLE,409.2M,OV (C),GRANOR,GROSMONT FM,VERTICAL,NEW,OIL SAND EVALUATION,CRUDE BITUMEN,CENOVUS ENERGY INC.,11-27-082-18W4,
SUNCOR ET AL MEDHAT 4-17-15-4,0416616,ALBERTA CROWN,750.9M,100/04-17-015-04W4/00,N,320.1M,E,317.1M,MEDICINE HAT,646.4M,DEV (NC),MEDICINE HAT,FISH SCALE ZONE,VERTICAL,NEW,PRODUCTION,GAS,SUNCOR ENERGY INC.
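I guess the trailing comma at the end of each line can be cleaned up with another
sed -e "s/,$//"
added to the end of the original pipeline, if necessary.
I'll stop there, since I don't know whether I'm on the right track!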
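For anyone without sed and awk to hand, the same idea (fields split on runs of two or more spaces, records split on blank lines) can be sketched in Python. This is only a rough sketch: the function names `parse_records` and `to_csv` are made up for illustration, and the spacing in `sample` is an assumption, since the exact column widths of the real file are not known here.

```python
import csv
import io
import re


def parse_records(text):
    """Group lines into records (blank-line separated) and split each
    line into fields on runs of two or more spaces."""
    records = []
    current = []
    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the current record, if there is one.
            if current:
                records.append(current)
                current = []
            continue
        # Single spaces stay inside a field; 2+ spaces separate fields.
        current.extend(re.split(r" {2,}", line.strip()))
    if current:
        # The last record may not be followed by a blank line.
        records.append(current)
    return records


def to_csv(records):
    """Render the records as CSV text, one record per line."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(records)
    return out.getvalue()


# Assumed spacing: the real file's column widths may differ.
sample = (
    "CVE HOUSE 3-23-83-17    0416614  ALBERTA CROWN   536.17M\n"
    "OV (C)   HOUSE    MCMURRAY FM\n"
    "\n"
    "CVE GRANOR 11-27-82-18  0416615  ALBERTA CROWN   554.69M\n"
)
print(to_csv(parse_records(sample)), end="")
```

Because each line is stripped before splitting, this version produces no trailing commas, and the csv module would quote any field that happened to contain a comma.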
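Answer 1 (score: 1)
Since you are on Windows, I have rewritten the awk and sed stuff as something that runs on Windows without them. I had never written a line of VBScript in my life until today, so there may well be other, simpler ways: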
'###############################################################################
' File: process.vbs
' Author: Mark Setchell
'
' VBScript to process companies file.
'
' Use as follows:
' cscript /nologo process.vbs < file
'
' Or, to save to an output file, use as follows:
' cscript /nologo process.vbs < file > results.txt
'###############################################################################
Dim rxp, inp, needSep
Set rxp = New RegExp
rxp.Global = True
rxp.Multiline = False
' True once the current record has had at least one line written
needSep = False

Do While Not WScript.StdIn.AtEndOfStream
   inp = WScript.StdIn.ReadLine()

   ' Regular expression to match any upper case letter
   rxp.Pattern = "[A-Z]"

   ' If there are any letters on the input line
   If rxp.Test(inp) Then
      ' Replace runs of two or more spaces with a single comma
      rxp.Pattern = " {2,}"
      inp = rxp.Replace(inp, ",")
      ' Remove leading and trailing commas off line
      rxp.Pattern = "^,|,$"
      inp = rxp.Replace(inp, "")
      ' Put a comma between this line's fields and the previous line's
      If needSep Then WScript.StdOut.Write ","
      WScript.StdOut.Write inp
      needSep = True
   Else
      ' Blank input line: end the current record with a newline
      WScript.StdOut.WriteBlankLines(1)
      needSep = False
   End If
Loop
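Once the records are in CSV form (for example the results.txt written by the script above), getting them into a database is mostly mechanical. Here is a rough Python sketch of that last step. It uses the standard-library `sqlite3` module purely as a stand-in, since MySQL would need a separate connector library. The table name `wells`, the generic column names `f1`, `f2`, ... and the helper `load_csv` are all made up for illustration, and the default of 20 fields per record is an assumption based on the sample output.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_text, n_fields=20):
    """Create a table with generic TEXT columns (hypothetical names
    f1..fN) and insert one row per non-empty CSV line."""
    cols = ", ".join(f"f{i} TEXT" for i in range(1, n_fields + 1))
    conn.execute(f"CREATE TABLE IF NOT EXISTS wells ({cols})")
    placeholders = ", ".join("?" * n_fields)
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip empty lines
        # Pad short records with NULLs; cut off extras such as the
        # empty field produced by a trailing comma.
        rows.append((row + [None] * n_fields)[:n_fields])
    conn.executemany(f"INSERT INTO wells VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


# Tiny demonstration with three fields instead of twenty.
conn = sqlite3.connect(":memory:")
n = load_csv(conn, "a,b,c,\nd,e\n", n_fields=3)
print(n, conn.execute("SELECT * FROM wells").fetchall())
```

Padding and truncating silently is convenient for a first import, but it can hide malformed records, so it is probably worth checking the field counts before trusting the result.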