Question

Input 1:

create external table db.emp(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';

create external table db.emp1(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';

Input 2:

create  table db.emp(id int,name string)
location 'hadfs:.../';;
create table db.emp1(id int,name string)
location 'hadfs:.../';

Required output:

create external table db.emp(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
create table db.emp(id int,name string)
location 'hadfs:.../';

These two statements should be stored in file1.hql.

create external table db.emp1(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
create  table db.emp1(id int,name string)
location 'hadfs:.../';

These two statements should be stored in file2.hql, and so on.

filenames = ['in1.txt', 'in2.txt']
with open('result.txt', 'w') as outfile:
  for fname in filenames:
    with open(fname) as infile:
       content = infile.read().replace('\n', '')
        outfile.write(content)

With this code I am not getting the correct output; it comes out in the wrong format. Please give me a hint on how to achieve this.

I also tried this code:

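import re

# Read the whole input, then write every CREATE statement to hql.txt,
# each one preceded by the name of the .hql file it belongs to.
with open('text.txt', 'r') as f:
    text = f.read()

with open('hql.txt', 'w') as fout:
    # re.IGNORECASE so the lower-case "create" in the sample input matches
    for item in re.findall(r'CREATE[^;]*;', text, re.IGNORECASE):
        # Table name = the word after the first "." (db.emp -> emp)
        print(re.search(r'(?<=\.)\w+', item).group() + '.hql', file=fout)
        print(item, file=fout)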

The output of this code is:

emp.hql
create external table db.emp(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';

emp1.hql

create external table db.emp1(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';

It produces this kind of output for both the input1 and input2 files. Now I need to concatenate

create external table db.emp(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
create table db.emp(id int,name string)
location 'hadfs:.../';

and store the result in a file named emp.hql, and so on.
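One way to get the emp.hql / emp1.hql layout described here is to group the statements from all input files by table name. This is a minimal sketch, not code from the thread: it assumes every statement ends at its first `;`, that every table name is written as `db.table`, and it reuses the question's `(?<=\.)\w+` rule to pick the name. `group_by_table` and `write_groups` are hypothetical helper names, and the patterns are case-insensitive because the sample input uses lower-case `create`.

```python
import re
from collections import defaultdict
from pathlib import Path

# One statement = "create" up to the first ";" (DOTALL lets "." cross newlines).
# Assumption: no ";" appears inside a statement.
STATEMENT_RE = re.compile(r'create\b.*?;', re.IGNORECASE | re.DOTALL)
# Table name = the word right after the first "." (db.emp -> emp), the same
# rule as the question's re.search(r'(?<=\.)\w+', item).
TABLE_RE = re.compile(r'(?<=\.)\w+')


def group_by_table(texts):
    """Map each table name to its statements, in input-file order."""
    groups = defaultdict(list)
    for text in texts:
        for statement in STATEMENT_RE.findall(text):
            groups[TABLE_RE.search(statement).group()].append(statement)
    return dict(groups)


def write_groups(filenames, out_dir='.'):
    """Hypothetical helper: write emp.hql, emp1.hql, ... into out_dir."""
    texts = [Path(name).read_text() for name in filenames]
    for table, statements in group_by_table(texts).items():
        Path(out_dir, table + '.hql').write_text('\n'.join(statements) + '\n')
```

Calling `write_groups(['in1.txt', 'in2.txt'])` would write emp.hql containing the external-table statement followed by the plain one, and likewise for emp1.hql. Because the grouping is by name rather than by position, the two input files do not need to list the tables in the same order.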

Answer 1

With this indentation, everything works:

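filenames = ['in1.txt', 'in2.txt']
with open('result.txt', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            # Note: replace('\n', '') removes every line break, so the
            # contents of both files end up on a single line in result.txt
            content = infile.read().replace('\n', '')
            outfile.write(content)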

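Also pay attention to the order in which the files are written. Both a list and a tuple keep their elements in order, so with filenames = ('in1.txt', 'in2.txt') in1.txt is written first and then in2.txt.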
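A sketch of order-preserving concatenation that keeps each file's line breaks might look like this. It assumes the files are small enough to read whole, and that a newline should be added after any file that does not already end with one; `join_texts` and `concatenate` are hypothetical names.

```python
from pathlib import Path


def join_texts(texts):
    """Join texts in the given order, making sure each one ends with a newline."""
    parts = []
    for text in texts:
        if text and not text.endswith('\n'):
            text += '\n'  # keep the next file's first line on its own line
        parts.append(text)
    return ''.join(parts)


def concatenate(filenames, output):
    """Hypothetical helper: write the files to output in the order of filenames."""
    texts = [Path(name).read_text() for name in filenames]
    Path(output).write_text(join_texts(texts))
```

Usage: `concatenate(['in1.txt', 'in2.txt'], 'result.txt')`. Note that this only appends one file after the other; it does not interleave the statements into separate .hql files.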

Answer 2

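The aim here is to read 4 lines from input1 and 2 lines from input2 at a time, remove any empty lines, and write those lines, in order, to an output file.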

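To do this, you can use the grouper() recipe from the itertools documentation, which reads each file in chunks of a suitable size. zip() is then used to step through the two sets of chunks together.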
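To see what grouper() and zip() produce before touching any files, the same recipe can be run on in-memory lists. The sample lines below are made up for illustration, `paired_blocks` is a hypothetical helper, and `itertools.zip_longest` is the Python 3 name of `izip_longest`.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks (the itertools 'grouper' recipe)."""
    args = [iter(iterable)] * n  # n references to ONE shared iterator
    return zip_longest(*args, fillvalue=fillvalue)


def paired_blocks(lines1, lines2, size1=4, size2=2):
    """Pair size1-line chunks of lines1 with size2-line chunks of lines2."""
    blocks = []
    for chunk1, chunk2 in zip(grouper(lines1, size1, ''), grouper(lines2, size2, '')):
        # Strip line endings and drop blank or padding ('') lines
        blocks.append([line.strip() for line in chunk1 + chunk2 if line.strip()])
    return blocks


lines1 = ['a1\n', 'a2\n', 'a3\n', '\n', 'b1\n', 'b2\n', 'b3\n']  # 3 lines + blank
lines2 = ['x1\n', 'x2\n', 'y1\n', 'y2\n']                       # 2 lines each
for block in paired_blocks(lines1, lines2):
    print(block)
# ['a1', 'a2', 'a3', 'x1', 'x2']
# ['b1', 'b2', 'b3', 'y1', 'y2']
```

The last chunk of `lines1` has only 3 lines, so grouper() pads it with `''`, and the filter then drops that padding along with the blank separator line.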

import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    # Python 3 name; in Python 2 this was itertools.izip_longest
    return itertools.zip_longest(*args, fillvalue=fillvalue)

with open('in1.txt') as f_1, open('in2.txt') as f_2:
    # Each block in in1.txt is 4 lines (3 + a blank separator), each block in in2.txt is 2 lines
    for file_number, rows in enumerate(zip(grouper(f_1, 4, ''), grouper(f_2, 2, '')), start=1):
        # Remove line endings and empty lines from all rows
        rows = [[row.strip() for row in r if row.strip()] for r in rows]

        with open('file{}.hql'.format(file_number), 'w') as f_output:
            f_output.write('{}\n{}\n'.format('\n'.join(rows[0]), '\n'.join(rows[1])))

This gives you a file1.hql that looks like this:

create external table db.emp(id int,name string)
row formatted fields terminated by ','
location 'hadfs:.../';
create  table db.emp(id int,name string)
location 'hadfs:.../';;

I suggest adding print statements to get a better understanding of what each row contains.
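The grouper() version depends on every block in in1.txt being exactly 4 lines and every block in in2.txt exactly 2. A sketch that instead splits each file on the `;` that ends a statement, and writes the n-th statement of each file to filen.hql, could look like this. It assumes no `;` occurs inside a statement and that both files list the tables in the same order; `pair_statements` and `write_pairs` are hypothetical names.

```python
import re
from pathlib import Path

# "create" up to the first ";" (case-insensitive; DOTALL so "." spans lines).
# A stray extra ";" such as the ";;" in input 2 is simply skipped.
STATEMENT_RE = re.compile(r'create\b.*?;', re.IGNORECASE | re.DOTALL)


def pair_statements(text1, text2):
    """Pair the n-th statement of text1 with the n-th statement of text2.

    zip() stops at the shorter list, so unmatched statements are dropped.
    """
    return list(zip(STATEMENT_RE.findall(text1), STATEMENT_RE.findall(text2)))


def write_pairs(in1, in2, out_dir='.'):
    """Hypothetical helper: write each pair to file1.hql, file2.hql, ..."""
    pairs = pair_statements(Path(in1).read_text(), Path(in2).read_text())
    for number, (first, second) in enumerate(pairs, start=1):
        Path(out_dir, f'file{number}.hql').write_text(f'{first}\n{second}\n')
```

Usage: `write_pairs('in1.txt', 'in2.txt')`. Since blocks are found by their terminating `;` rather than by line count, statements that span a different number of lines are still paired correctly.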

How to combine two files with Python and save the result to another file
