Question

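Every day I have 20,000 files in a folder, and I want to apply some selection criteria to them, namely `if 'EXTRA' not in file:` and `if file.endswith(prev_date + '.csv') and file.startswith('IC'):`, where `prev_date = str('20190624')`. The value of `prev_date` may change.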
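The two filename checks can be combined into a single predicate that takes `prev_date` as a parameter, so a new date needs no code change. A minimal sketch; `is_wanted` and `find_input_files` are hypothetical helper names, and the criteria are copied from the question:

```python
import os


def is_wanted(filename, prev_date):
    """True for files named IC...<prev_date>.csv that do not contain 'EXTRA'."""
    return (
        'EXTRA' not in filename
        and filename.startswith('IC')
        and filename.endswith(prev_date + '.csv')
    )


def find_input_files(root, prev_date):
    """Walk `root` recursively and return the full paths of all matching files."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_wanted(name, prev_date):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)  # sorted so the merge order is reproducible


# Hypothetical usage with the question's folder:
# files = find_input_files('R:/Sam/simulator/', '20190624')
```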

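My problem is that some of the column headers differ. All of them contain `Index, Date`; these should be the first two columns, and usually they are followed by `TICKER, UNITS`. But this can vary: in the code below you can see the 3 exceptions I have coded, but then I found another 20 files with different headers, some of which share the same header and some of which don't.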
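One way to handle an open-ended set of headers is to build a single master column list: `Index` and `Date` first, then every other column in the order it is first seen across the files. A minimal sketch, assuming column names are compared exactly after stripping surrounding whitespace; `build_columns` is a hypothetical name:

```python
def build_columns(headers, fixed=('Index', 'Date')):
    """Merge several header rows into one ordered column list.

    The `fixed` columns always come first; any other column is appended
    the first time it appears in any header.
    """
    columns = list(fixed)
    seen = set(columns)
    for header in headers:
        for name in header:
            name = name.strip()
            if name not in seen:
                seen.add(name)
                columns.append(name)
    return columns


print(build_columns([
    ['Index', 'Date', 'Ticker', 'Units'],
    ['Index', 'Date', 'Ticker', 'Units', 'CCY', 'PRICE', 'NOTIONAL'],
    ['Index', 'Date', 'Ticker', 'CURRENT_WEIGHT'],
]))
# ['Index', 'Date', 'Ticker', 'Units', 'CCY', 'PRICE', 'NOTIONAL', 'CURRENT_WEIGHT']
```

Matching is case-sensitive, so `TICKER` and `Ticker` would end up as two separate columns; if the files differ only in case, normalizing each name (for example with `.upper()`) before comparing is one option.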

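How can I change the code below so that it goes through each file, reads the first row (the header) to get each column's name, and then writes each value into the matching column? Or, if a column is not yet in the column list, appends it at the end and then indexes on it?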
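The approach asked about here, aligning values by header name instead of by position, can be sketched with the standard-library `csv` module. This assumes every input file is comma-separated with its header on the first line. It makes two passes: the first reads only the header row of each file to build the master column list, and the second streams the data rows through `csv.DictWriter`, which writes each value under its named column and fills any column a file lacks with an empty string (`restval=''`). `merge_csvs` is a hypothetical name; the list of paths could come from the `os.walk` loop in the question.

```python
import csv


def merge_csvs(paths, out_path, fixed=('Index', 'Date')):
    """Merge CSV files whose headers differ, aligning values by column name."""
    # Pass 1: read only the header row of each file to build the column list.
    columns = list(fixed)
    for path in paths:
        with open(path, newline='') as f:
            header = next(csv.reader(f), [])  # an empty file has no header
        for name in header:
            name = name.strip()
            if name not in columns:
                columns.append(name)

    # Pass 2: stream every data row into the output, keyed by column name.
    with open(out_path, 'w', newline='') as out:
        writer = csv.DictWriter(out, fieldnames=columns, restval='')
        writer.writeheader()
        for path in paths:
            with open(path, newline='') as f:
                reader = csv.DictReader(f)
                # Strip stray whitespace so ' Units' matches 'Units' from pass 1.
                reader.fieldnames = [n.strip() for n in reader.fieldnames or []]
                for row in reader:
                    writer.writerow(row)
    return columns


# Hypothetical usage with the question's variables:
# merge_csvs(csv_list, csv_out)
```

Only one file is open at a time and rows are never all held in memory, which matters with 20,000 inputs. A data row with more values than its header makes `writerow` raise a `ValueError`, which surfaces malformed files instead of silently dropping values.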

import os

prev_date = '20190624'  # may change from day to day

# Header variants found so far; lines starting with any of these are skipped.
csv_header = 'Index,Date,Ticker,Units'
csv_header_alt = 'Index,Date,Ticker,Units,CCY,PRICE,NOTIONAL'
csv_header_alt_op = 'Index,Date,Ticker,CURRENT_WEIGHT'
known_headers = (csv_header, csv_header_alt, csv_header_alt_op)

csv_dir = 'R:/Sam/simulator/'
csv_out = os.path.join(csv_dir, 'consolidated_positions' + prev_date + '.csv')

# Collect every IC*<prev_date>.csv file (excluding EXTRA files) under csv_dir.
csv_list = []
for dirpath, dirnames, filenames in os.walk(csv_dir):
    for file in filenames:
        if 'EXTRA' not in file:
            if file.endswith(prev_date + '.csv') and file.startswith('IC'):
                csv_list.append(os.path.join(dirpath, file))

# Concatenate the data rows of all files under a single header line.
with open(csv_out, 'w') as csv_merge:
    csv_merge.write(csv_header + '\n')
    for file in csv_list:
        with open(file) as csv_in:
            for line in csv_in:
                # str.startswith accepts a tuple of prefixes.
                if line.startswith(known_headers):
                    continue
                # Guard against a last line that has no trailing newline.
                csv_merge.write(line if line.endswith('\n') else line + '\n')

print('\n Verify consolidated CSV file : ' + csv_out)

Sorry, I'm new to Python, so please suggest ANY way you think I could make this code better or more efficient.

Best way to merge files, indexed by column header
