Question

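I need to parse a .txt file into a .csv file. The data to be parsed looks like the following three lines. The same pattern repeats until the end of the file.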

oklahoma-07  (rt66)
1 12345k 9876542, 4234234.5345345 -.000001234 0000.0 14135.4 0 9992
2 12345 101.8464 192.3456 00116622 202.9136 512.3361 12.543645782334

texas-15 (hwy35)
1 12345k 9876542, 4234234.5345345 -.000001234 0000.0 14135.4 0 9992
2 12345 101.8464 192.3456 00116622 202.9136 512.3361 12.543645782334

The fields above are separated by spaces.
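The first line of a record can have more than one space between fields, as in `oklahoma-07  (rt66)`. The second line also has a comma stuck to `9876542`. Here is a minimal sketch of splitting one line under those assumptions. Calling `str.split()` with no argument splits on any run of whitespace, and the trailing comma is then stripped. The helper name `tokenize` is hypothetical.

```python
def tokenize(line):
    """Split a line on runs of whitespace and drop a comma glued to the end of a field."""
    # str.split() with no argument treats consecutive spaces as one separator
    # and ignores leading/trailing whitespace, unlike str.split(" ").
    return [field.rstrip(",") for field in line.split()]


print("oklahoma-07  (rt66)".split(" "))  # ['oklahoma-07', '', '(rt66)']
print(tokenize("oklahoma-07  (rt66)"))   # ['oklahoma-07', '(rt66)']
print(tokenize("1 12345k 9876542, 4234234.5345345"))
# ['1', '12345k', '9876542', '4234234.5345345']
```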

Also, the source file comes from a website where I save the information. When I open it, the .txt file is displayed on screen at a URL like "http://www.example.com/listing.txt".
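The file is served over HTTP, so here is a minimal Python 3 sketch of fetching it with the standard library. It assumes the server returns UTF-8 text. The URL is the placeholder from the question, and `fetch_lines` is a hypothetical helper name.

```python
import urllib.request


def fetch_lines(url, encoding="utf-8"):
    """Download a text file and return its lines without trailing newlines.

    Assumes the response body is text in the given encoding.
    """
    with urllib.request.urlopen(url) as response:
        # urlopen returns bytes in Python 3, so decode before splitting
        text = response.read().decode(encoding)
    return text.splitlines()


# Example (requires network access):
# lines = fetch_lines("http://www.example.com/listing.txt")
```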

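There may be only 3 lines of data, or 90, or 144, but the data always comes in groups of three lines, followed by the next data set. The script just needs to parse the file through to the end.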

There are always two key characters:

"1" at the start of the second line, and "2" at the start of the third line of each data set.
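Every data set is a name line followed by lines that begin with `1` and `2`. Those markers give a way to group lines into records without depending on exact blank-line counts. The sketch below skips blank lines and starts a new record whenever a line does not begin with `1 ` or `2 `. It assumes a name line never itself starts with `1 ` or `2 `. The name `group_records` is hypothetical.

```python
def group_records(lines):
    """Group non-blank lines into [name_line, line_1, line_2] records.

    Assumes a line starting with "1 " or "2 " continues the current record,
    and any other non-blank line starts a new one.
    """
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank separator lines carry no data
        if line.startswith(("1 ", "2 ")) and records:
            records[-1].append(line)
        else:
            records.append([line])
    return records


sample = """oklahoma-07  (rt66)
1 12345k 9876542, 4234234.5345345 -.000001234 0000.0 14135.4 0 9992
2 12345 101.8464 192.3456 00116622 202.9136 512.3361 12.543645782334

texas-15 (hwy35)
1 12345k 9876542, 4234234.5345345 -.000001234 0000.0 14135.4 0 9992
2 12345 101.8464 192.3456 00116622 202.9136 512.3361 12.543645782334
"""

for record in group_records(sample.splitlines()):
    print(len(record), record[0])
# 3 oklahoma-07  (rt66)
# 3 texas-15 (hwy35)
```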

The output needs to look like this:

oklahoma-07, (rt66), 1, 12345k, 9876542, 4234234.5345345, -.000001234, 0000.0, 14135.4, 0, 9992, 2, 12345, 101.8464, 192.3456, 00116622, 202.9136, 512.3361, 12.543645782334

texas-15, (hwy35), 1, 12345k, 9876542, 4234234.5345345, -.000001234, 0000.0, 14135.4, 0, 9992, 2, 12345, 101.8464, 192.3456, 00116622, 202.9136, 512.3361, 12.543645782334

The delimiter should be a comma so that I can open the file in Excel. For simplicity, I used the same numbers for every data set.
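One way to produce rows like the ones above is the standard `csv` module. This sketch flattens a three-line record into a single list of fields and lets `csv.writer` handle any quoting. `csv.writer` separates fields with a bare comma rather than the comma-plus-space in the sample output, and Excel opens either form. `record_to_row` is a hypothetical helper. Splitting on whitespace and stripping commas assumes no field contains a space or a meaningful comma.

```python
import csv
import io


def record_to_row(record):
    """Flatten a 3-line record into one list of fields.

    Splits each line on runs of whitespace and strips the comma that
    trails a field such as "9876542,".
    """
    return [field.rstrip(",") for line in record for field in line.split()]


record = [
    "texas-15 (hwy35)",
    "1 12345k 9876542, 4234234.5345345 -.000001234 0000.0 14135.4 0 9992",
    "2 12345 101.8464 192.3456 00116622 202.9136 512.3361 12.543645782334",
]

buffer = io.StringIO()
csv.writer(buffer).writerow(record_to_row(record))
print(buffer.getvalue().strip())
# texas-15,(hwy35),1,12345k,9876542,4234234.5345345,-.000001234,0000.0,14135.4,0,9992,2,12345,101.8464,192.3456,00116622,202.9136,512.3361,12.543645782334
```

When Excel opens the file, it may display numeric-looking fields such as `00116622` without their leading zeros.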

Finally, I need to save the output as filename.csv in a specific location, e.g. C:/documents/stuff/.
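Here is a sketch of writing rows to a file in a chosen folder with `pathlib`. It uses the question's `C:/documents/stuff/` and `filename.csv` as defaults and creates the folder if it is missing. `save_rows` is a hypothetical helper. The file is opened with `newline=""`, as the `csv` module documentation recommends, so that Windows does not show a blank line between rows.

```python
import csv
from pathlib import Path


def save_rows(rows, folder="C:/documents/stuff/", filename="filename.csv"):
    """Write rows (lists of strings) to folder/filename as CSV and return the path."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    target = target_dir / filename
    # newline="" lets the csv module control line endings (avoids blank rows on Windows)
    with open(target, "w", newline="", encoding="utf-8") as outfile:
        csv.writer(outfile).writerows(rows)
    return target


# Example: save_rows([["texas-15", "(hwy35)", "1", "12345k"]])
```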

I'm new to Python. I've looked at a lot of different code examples, and they have left me confused.

Answer 1

If you're sure the data is always in this format, a simple approach is:

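comma_sep = []
this_line = []

# my_file is assumed to be an open file object; skip the blank separator lines
lines = [line.strip() for line in my_file.readlines() if line.strip()]

for i in range(len(lines)):
    this_line.append(lines[i])
    if i % 3 == 2:  # the third line of each record completes it
        comma_sep.append(" ".join(this_line))
        this_line = []

# str.replace returns a new string, so build a new list;
# split() collapses runs of spaces, and the stray comma in "9876542," is dropped
comma_sep = [", ".join(line.replace(",", "").split()) for line in comma_sep]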

I'm sure there's a cleaner way to do this.
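Here is a somewhat cleaner variant of the same every-three-lines idea. It drops blank lines first, then passes the same iterator to `zip` three times so that each step yields three consecutive lines. Like the answer above, it assumes every record has exactly three non-blank lines, and `zip` silently drops an incomplete trailing record. `to_csv_lines` is a hypothetical name.

```python
def to_csv_lines(lines):
    """Turn groups of three non-blank lines into comma-separated strings."""
    data = [line.strip() for line in lines if line.strip()]
    it = iter(data)
    rows = []
    # zip pulls from the same iterator three times, yielding consecutive triples
    for name, first, second in zip(it, it, it):
        fields = " ".join((name, first, second)).replace(",", "").split()
        rows.append(", ".join(fields))
    return rows


example = [
    "texas-15 (hwy35)",
    "1 12345k 9876542, 4234234.5345345",
    "2 12345 101.8464",
    "",
]
print(to_csv_lines(example))
# ['texas-15, (hwy35), 1, 12345k, 9876542, 4234234.5345345, 2, 12345, 101.8464']
```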

Also, I'd recommend reading the Python documentation for the basics, such as how to use urllib and how to work with files.

Answer 2

Here's one way to do it, including how to download the txt file and write the csv file. The chunks generator code comes from this answer.

import urllib.request

# Python 3: urlopen returns bytes, so decode the body before splitting it into lines
inputfile = urllib.request.urlopen('http://127.0.0.1:8000/data.txt')
lines = inputfile.read().decode('utf-8').splitlines()

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i+n]

out = []
# Each record is 3 data lines plus 1 blank separator line, hence chunks of 4
for record in chunks(lines, 4):
    s = ' '.join(record).replace(',', '')  # Join the record (list) into one string and remove the comma
    out.append(','.join(s.split()))  # Split on any whitespace and rejoin with commas

with open('data.csv', 'w') as outfile:
    for record in out:
        outfile.write("%s\n" % record)

Parsing a text file from a website into a .csv file