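I want to remove lines that contain strings, as well as empty lines, from a text file. The file looks like the sample below. As you can see, the header repeats throughout the file, and the number of data rows differs from block to block. I need to import the data into numpy as an array. Originally the file used commas as decimal separators; that much, at least, I was able to change.
I tried this, but it doesn't work at all: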
from types import StringType

z = open('D:\Desktop\cycle 1-20 20-50 kPa (dot).dat', 'r')
for line in z.readlines():
    for x in z:
        if type(z.readline(x)) is StringType:
            print line
z.close()
Sample data:
bla bla
cyclical stuff Time: 81.095947 Sec 2012-08-02 17:05:42
stored : 1 cycle stores for : 62 seg-cycle
Points : 4223
Servo_Hyd count Temps Servo_Air pressure Servo_Hyd load Servo_Hyd LVDT1 Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1 name1 name1 name1 name1 name1 name1
1 60.102783 0.020013755 89.109558 0.3552089 0.4015148 -0.33822596
1 60.107666 0.020006953 89.025749 0.35519764 0.4015218 -0.33821729
1 60.112549 0.02000189 88.886292 0.3551946 0.4015184 -0.33822691
1 60.117432 0.020007374 89.559196 0.35519707 0.40151948 -0.33823174
1 60.122314 0.019991774 89.741402 0.35519552 0.40151322 -0.33822927
1 60.127197 0.020003742 89.748924 0.35520011 0.40150556 -0.33822462
bla bla
cyclical stuff Time: 81.095947 Sec 2012-08-02 17:05:42
stored : 1 cycle stores for : 62 seg-cycle
Points : 4223
Servo_Hyd count Temps Servo_Air pressure Servo_Hyd load Servo_Hyd LVDT1 Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1 name1 name1 name1 name1 name1 name1
1 60.102783 0.020013755 89.109558 0.3552089 0.4015148 -0.33822596
1 60.107666 0.020006953 89.025749 0.35519764 0.4015218 -0.33821729
1 60.112549 0.02000189 88.886292 0.3551946 0.4015184 -0.33822691
1 60.117432 0.020007374 89.559196 0.35519707 0.40151948 -0.33823174
1 60.122314 0.019991774 89.741402 0.35519552 0.40151322 -0.33822927
1 60.127197 0.020003742 89.748924 0.35520011 0.40150556 -0.33822462
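Answer 0 (score: 4)
Python reads every element of a file as a string unless you convert it explicitly, so the type check in your approach is true for every line and cannot tell headers from data.
Your best option is probably to use a regular expression to filter out the lines that contain non-data characters.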
import re

f = open("datafile")
for line in f:
    # Skip any line containing a character other than digits, '-', '.', or whitespace
    if re.search(r"[^-0-9.\s]", line):
        continue
    # Skip empty lines
    if len(line.strip()) == 0:
        continue
    # Keep the rest
    print(line)
f.close()
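As an alternative to the regular expression, the same filtering can be sketched by simply trying to convert every field to a float and skipping lines where that fails. This is a minimal sketch, not the method from the answer above: the function name `numeric_rows` and the inline `sample` list are made up for illustration, and it assumes whitespace-separated columns with a dot as the decimal separator.

```python
def numeric_rows(lines):
    """Return rows of floats, skipping blank lines and lines with non-numeric fields."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            rows.append([float(field) for field in fields])
        except ValueError:
            continue  # header or other text line
    return rows


# Hypothetical in-memory sample; in practice pass an open file object instead.
sample = [
    "bla bla\n",
    "Points : 4223\n",
    "\n",
    "1 60.102783 0.020013755 89.109558 0.3552089 0.4015148 -0.33822596\n",
    "1 60.107666 0.020006953 89.025749 0.35519764 0.4015218 -0.33821729\n",
]
rows = numeric_rows(sample)
print(len(rows), rows[0][1])
```

Because every kept row is a list of floats of the same length, the result can be handed to `numpy.array(rows)` to get a 2-D array. Unlike the character-class regex, this also accepts values written in exponent notation such as `1e-3`.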
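Answer 1 (score: 0)
Why not use numpy.loadtxt? It has a very nice interface for cases like this.
See the documentation for numpy.loadtxt.
import numpy as np
yourArray = np.loadtxt('yourfilename.txt', skiprows=7)
Also, since the header (which would normally appear only once, at the top of a file) repeats, you can split the file into multiple files. You can do this with Python, or with the UNIX command csplit. Here is how to do it and what you get: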
oz123@:~/tmp> csplit -k data.txt '/^bla/' '{*}'
0
787
786
oz123@:~/tmp> ls xx*
xx00 xx01 xx02
oz123@:~/tmp> ls xx00
xx00
oz123@:~/tmp> cat xx00
oz123@:~/tmp> cat xx01
bla bla
cyclical stuff Time: 81.095947 Sec 2012-08-02 17:05:42
stored : 1 cycle stores for : 62 seg-cycle
Points : 4223
Servo_Hyd count Temps Servo_Air pressure Servo_Hyd load Servo_Hyd LVDT1 Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1 name1 name1 name1 name1 name1 name1
1 60.102783 0.020013755 89.109558 0.3552089 0.4015148 -0.33822596
1 60.107666 0.020006953 89.025749 0.35519764 0.4015218 -0.33821729
1 60.112549 0.02000189 88.886292 0.3551946 0.4015184 -0.33822691
1 60.117432 0.020007374 89.559196 0.35519707 0.40151948 -0.33823174
1 60.122314 0.019991774 89.741402 0.35519552 0.40151322 -0.33822927
1 60.127197 0.020003742 89.748924 0.35520011 0.40150556 -0.33822462
oz123@:~/tmp> cat xx02
bla bla
cyclical stuff Time: 81.095947 Sec 2012-08-02 17:05:42
stored : 1 cycle stores for : 62 seg-cycle
Points : 4223
Servo_Hyd count Temps Servo_Air pressure Servo_Hyd load Servo_Hyd LVDT1 Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1 name1 name1 name1 name1 name1 name1
1 60.102783 0.020013755 89.109558 0.3552089 0.4015148 -0.33822596
1 60.107666 0.020006953 89.025749 0.35519764 0.4015218 -0.33821729
1 60.112549 0.02000189 88.886292 0.3551946 0.4015184 -0.33822691
1 60.117432 0.020007374 89.559196 0.35519707 0.40151948 -0.33823174
1 60.122314 0.019991774 89.741402 0.35519552 0.40151322 -0.33822927
1 60.127197 0.020003742 89.748924 0.35520011 0.40150556 -0.33822462
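The same split can also be done in Python, as mentioned above. The following is a rough sketch under stated assumptions: the helper name `split_blocks` and the shortened `sample` lines are hypothetical, and it assumes, as in the csplit call, that every block starts with a line beginning with "bla".

```python
def split_blocks(lines, marker="bla"):
    """Group lines into blocks, starting a new block at each line that begins with marker."""
    blocks = []
    current = []
    for line in lines:
        # A marker line closes the block collected so far (if any) and opens a new one.
        if line.startswith(marker) and current:
            blocks.append(current)
            current = []
        current.append(line)
    if current:
        blocks.append(current)
    return blocks


# Hypothetical shortened sample; in practice pass an open file object instead.
sample = [
    "bla bla\n",
    "Points : 4223\n",
    "1 60.102783 0.020013755\n",
    "bla bla\n",
    "Points : 4223\n",
    "1 60.107666 0.020006953\n",
    "1 60.112549 0.02000189\n",
]
blocks = split_blocks(sample)
print(len(blocks), [len(block) for block in blocks])
```

Unlike csplit, this sketch does not produce an empty first piece when the file begins with the marker line. Each block still contains its header lines, so they have to be skipped or filtered before the data rows are converted to numbers.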