Question

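I have a text file containing tabular data. I need to automate writing a new text file that is comma-separated instead of space-separated, extracting a few columns from the existing data and reordering them.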

Here is a snippet of the first 4 lines of the original data:

Number of rows: 8542
 Algorithm  |Date   |Time   |Longitude  |Latitude   |Country    
 1  2000-01-03  215926.688  -0.262  35.813  Algeria 
 1  2000-01-03  215926.828  -0.284  35.817  Algeria

This is what I ultimately want:

Longitude,Latitude,Country,Date,Time
-0.262,35.813,Algeria,2000-01-03,215926.688

Any tips on how to approach this?

Answer 1

My guess is that the file is tab-separated, not space-separated.

If so, you could try something like this:

with open('some_tab_separated_file.txt', 'r') as input_file, \
        open('some_tab_separated_file.csv', 'w') as output_file:
    input_file.readline()  # skip the "Number of rows" line
    input_file.readline()  # skip the original header line
    output_file.write('Longitude,Latitude,Country,Date,Time\n')
    for line in input_file:
        if not line.strip():
            continue  # ignore blank lines
        (a, date, time, lon, lat, country) = line.strip().split('\t')
        output_file.write(','.join([lon, lat, country, date, time]) + '\n')

This code is untested; any bugs are left to you as an exercise.

Answer 2

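You can use the csv module with a reader that uses a ' ' delimiter to read your data, and a writer from the same module (with a comma delimiter) to produce the output.

In fact, the first example in the csv module documentation uses delimiter=' '.

You can use DictReader / DictWriter and specify the column order in their constructors (the fieldnames list, which differs between the reader and the writer if you want to reorder) to output the entries in the order you want.

(When generating the output, you will probably need to skip/ignore the first two lines.)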
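A minimal sketch of the DictReader / DictWriter approach, assuming the layout shown in the question (a row-count line, a pipe-separated header, then records padded with runs of spaces). A reader with delimiter=' ' turns each extra space into an empty field, so this sketch first collapses whitespace to single spaces; that pre-processing step, the function name `convert` and the `IN_FIELDS` / `OUT_FIELDS` lists are assumptions added for illustration. The restkey technique re-attaches multi-word country names, and extrasaction='ignore' lets the writer drop the Algorithm column.

```python
import csv
import io

# Hypothetical field lists: input order as in the question, output order as desired
IN_FIELDS = ["Algorithm", "Date", "Time", "Longitude", "Latitude", "Country"]
OUT_FIELDS = ["Longitude", "Latitude", "Country", "Date", "Time"]


def convert(src, dst):
    lines = iter(src)
    next(lines)  # skip "Number of rows: ..."
    next(lines)  # skip the original pipe-separated header line
    # Collapse runs of whitespace so delimiter=" " produces no empty fields
    cleaned = (" ".join(line.split()) for line in lines if line.strip())
    reader = csv.DictReader(cleaned, fieldnames=IN_FIELDS,
                            delimiter=" ", restkey="rest")
    # extrasaction="ignore" skips keys not in OUT_FIELDS (Algorithm, rest)
    writer = csv.DictWriter(dst, fieldnames=OUT_FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Extra words beyond the last field land in row["rest"]
        if row.get("rest"):
            row["Country"] += " " + " ".join(row["rest"])
        writer.writerow(row)


sample = """Number of rows: 8542
 Algorithm  |Date   |Time   |Longitude  |Latitude   |Country
 1  2000-01-03  215926.688  -0.262  35.813  Algeria
 1  2000-01-03  215926.828  -0.284  35.817  Algeria
"""
out = io.StringIO()
convert(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

For real files, the same function can be called with both files opened using newline='' as the csv documentation recommends, e.g. `with open('data.txt', newline='') as src, open('data.csv', 'w', newline='') as dst: convert(src, dst)` (file names are placeholders).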

Edit

Here is an example that handles multi-word country names:

import csv
import io

f = io.StringIO("""A B C
1 2 Costa Rica
3 4 Democratic Republic of the Congo
""")
r = csv.DictReader(f, delimiter=' ', restkey='rest')
for row in r:
    if row.get('rest'):
        # extra words beyond column C are collected under the restkey
        row['C'] += " %s" % (" ".join(row['rest']))
    print('A: %s, B: %s, C: %s' % (row['A'], row['B'], row['C']))

Use restkey= and append the dict entry stored under that key, whose value is the list of the remaining fields (here restkey='rest'). This prints:

A: 1, B: 2, C: Costa Rica
A: 3, B: 4, C: Democratic Republic of the Congo

Answer 3

str.split() with no arguments splits on runs of whitespace of any length. operator.itemgetter() accepts multiple arguments and returns a tuple.
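A minimal sketch combining the two, assuming the column layout from the question. The helper names `convert_line` and `convert_lines` are hypothetical. It passes maxsplit=5 to str.split (still with no separator), a small variation added here so that a multi-word country name stays in the last field; plain `line.split()` works the same way when every country name is a single word.

```python
from operator import itemgetter

HEADER = "Longitude,Latitude,Country,Date,Time"

# Positions after splitting: 0=Algorithm, 1=Date, 2=Time, 3=Longitude, 4=Latitude, 5=Country
pick = itemgetter(3, 4, 5, 1, 2)


def convert_line(line):
    # strip() drops leading/trailing whitespace, including the newline;
    # split(None, 5) treats any run of whitespace as one separator and
    # stops after 5 splits, so the country name stays in one field.
    fields = line.strip().split(None, 5)
    return ",".join(pick(fields))  # pick() returns a tuple in the new order


def convert_lines(lines):
    it = iter(lines)
    next(it, None)  # skip "Number of rows: ..."
    next(it, None)  # skip the original header line
    yield HEADER
    for line in it:
        if line.strip():  # ignore blank lines
            yield convert_line(line)


print(convert_line(" 1  2000-01-03  215926.688  -0.262  35.813  Algeria \n"))
# -0.262,35.813,Algeria,2000-01-03,215926.688
```

Writing the result is then a loop over `convert_lines(input_file)` that writes each string followed by '\n'.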

Answer 4

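I think the important point is that you have to use '\t' as the delimiter, as @Paulo Scardine said.

I'd just add that pandas is a very good library for working with columnar data.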

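>>> import pandas as pd
>>> src = 'path/to/file'
>>> dest = 'path/to/dest_csv'
>>> column_names = ['Algorithm', 'Date', 'Time', 'Longitude', 'Latitude', 'Country']

>>> # skiprows=2 skips the "Number of rows" line and the original header
>>> df = pd.read_csv(src, delimiter='\t', names=column_names, skiprows=2)

# Do something in pandas if you need to, e.g. reorder the columns:
>>> df = df[['Longitude', 'Latitude', 'Country', 'Date', 'Time']]

>>> df.to_csv(dest, index=False, sep=',')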

Convert a space-separated file to CSV
