Question

My script is as follows:

counter = 0
with open(output_file, 'a') as f_out:
    with codecs.open(data_file, 'r', encoding='utf8') as f:
        for line in f:
            counter += 1
            try:
                created_at = datetime.strptime(line[:first_colon], '%Y-%m-%d').strftime('%Y-%m-%d')
            except ValueError:
                log('Parse Error at line ' + str(counter))
                continue
            f_out.write(str(counter)+','+line+'\n')

When I check the corresponding line in the output file and the data file using

sed -n '#counterhere#p' data_file

I find that the lines do not match.

Any ideas about what is happening here?

Edit:

For example, in the data file we have:

2016-03-18,Content1
2016-03-#J,Content2
2016-03-20,Content3

So in the output file we have:

1,2016-03-18,Content1
3,2016-03-20,Content3

That way I can find the exact line in the data file with:

sed -n '3p' data_file

It should return "Content3", but it does not.

Everything works fine on small files. However, because I am running this on a large file, I am having a hard time debugging it.

Answer 1

Here is the working example I have been using:

import codecs
from datetime import datetime

output_file = 'out.csv'
data_file = 'data.csv'
first_colon = 10  # length of the 'YYYY-MM-DD' prefix; 9 would cut off the last digit of the day

counter = 0
with open(output_file, 'a') as f_out:
    with codecs.open(data_file, 'r', encoding='utf8') as f:
        for line in f:
            counter += 1
            try:
                created_at = datetime.strptime(line[:first_colon], '%Y-%m-%d').strftime('%Y-%m-%d')
            except ValueError:
                print('Parse Error at line ' + str(counter))
                continue
            f_out.write(str(counter)+','+line)

With this data.csv file:

2016-03-18,Content1
2016-03-#J,Content2
2016-03-20,Content3

it produces out.csv:

1,2016-03-18,Content1
3,2016-03-20,Content3

The line numbers match the correct source lines in data.csv, so they can be used to look up the information in the source file:

sed -n '3p' data.csv

gives me

2016-03-20,Content3
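
If the numbers still drift apart on the large file, one possible cause (an assumption, since the question does not show what the large file contains) is that readers returned by codecs.open split lines on every boundary that str.splitlines() recognizes, such as U+2028, U+0085, or form feed, while sed only counts '\n'. The sketch below uses a made-up three-record sample with U+2028 embedded in the second record to compare the two ways of counting; the helper names and the sample data are hypothetical.

```python
import codecs
import io
import os
import tempfile


def count_lines_codecs(path):
    # codecs readers break lines wherever str.splitlines() would,
    # which includes U+2028, U+0085, \x0b, \x0c and \x1c-\x1e.
    with codecs.open(path, 'r', encoding='utf8') as f:
        return sum(1 for _ in f)


def count_lines_newline_only(path):
    # newline='\n' makes io.open treat only '\n' as a line ending,
    # which is how sed numbers lines.
    with io.open(path, 'r', encoding='utf8', newline='\n') as f:
        return sum(1 for _ in f)


# Hypothetical sample: three '\n'-terminated records, the second one
# containing a U+2028 LINE SEPARATOR inside its content.
sample = (u'2016-03-18,Content1\n'
          u'2016-03-19,Con\u2028tent2\n'
          u'2016-03-20,Content3\n')

fd, sample_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write(sample.encode('utf8'))

codecs_count = count_lines_codecs(sample_path)
newline_count = count_lines_newline_only(sample_path)
os.remove(sample_path)

print('codecs.open lines:', codecs_count)
print("newline='\\n' lines:", newline_count)
```

On this sample the codecs reader reports one more line than the newline-only reader, so every counter after the embedded separator is off by one compared with sed. If that turns out to be the case for the real data, opening the file with io.open(data_file, 'r', encoding='utf8', newline='\n') should keep the counter in step with sed.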

Hope this helps move things along.
