I am trying to parse a test file. The file contains user names, addresses, and phone numbers in the following format:
Name: John Doe1
address : somewhere
phone: 123-123-1234
Name: John Doe2
address : somewhere
phone: 123-123-1233
Name: John Doe3
address : somewhere
phone: 123-123-1232
Only for almost 10,000 users :) What I would like to do is convert those rows to columns, for example:
Name: John Doe1 address : somewhere phone: 123-123-1234
Name: John Doe2 address : somewhere phone: 123-123-1233
Name: John Doe3 address : somewhere phone: 123-123-1232
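I would prefer to do this in bash, but if you know how to do it in Python, that would be great too. The file with this information is located at /root/docs/information. Any tips or help would be much appreciated.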
Answer 0 (score: 5)
One way, using GNU awk:
awk 'BEGIN { FS="\n"; RS=""; OFS="\t\t" } { print $1, $2, $3 }' file.txt
Result:
Name: John Doe1 address : somewhere phone: 123-123-1234
Name: John Doe2 address : somewhere phone: 123-123-1233
Name: John Doe3 address : somewhere phone: 123-123-1232
Note that I've set the output field separator (OFS) to two tab characters (\t\t). You can change it to whatever character or set of characters you like. HTH.
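For comparison, the same paragraph-mode idea (RS="" treats each blank-line-separated block as one record, FS="\n" treats each line as a field) can be sketched in Python. This is only an illustration and not part of the awk answer: the function name `join_records` and the sample text are made up here, and the sketch assumes records are separated by one or more blank lines.

```python
import re

def join_records(text, sep="\t\t"):
    """Join the lines of each blank-line-separated record with `sep`."""
    # Split on runs of blank lines, roughly what awk does with RS="".
    records = re.split(r"\n\s*\n", text.strip())
    # Each line of a record plays the role of an awk field ($1, $2, $3).
    return [sep.join(line.strip() for line in rec.split("\n")) for rec in records]

# Hypothetical sample input with a blank line between the two records.
sample = (
    "Name: John Doe1\naddress : somewhere\nphone: 123-123-1234\n"
    "\n"
    "Name: John Doe2\naddress : somewhere\nphone: 123-123-1233\n"
)

for row in join_records(sample):
    print(row)
```

Passing a different `sep` (for example a single space) changes the column separator, much like changing OFS in the awk version.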
Answer 1 (score: 3)
A short Perl one-liner:
$ perl -ne 'END{print "\n"}chomp; /^$/ ? print "\n" : print "$_\t\t"' file.txt
Output:
Name: John Doe1 address : somewhere phone: 123-123-1234
Name: John Doe2 address : somewhere phone: 123-123-1233
Name: John Doe3 address : somewhere phone: 123-123-1232
Answer 2 (score: 2)
Using paste, we can join the lines in the file:
$ paste -s -d"\t\t\t\n" file
Name: John Doe1 address : somewhere phone: 123-123-1234
Name: John Doe2 address : somewhere phone: 123-123-1233
Name: John Doe3 address : somewhere phone: 123-123-1232
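The delimiter list is what makes this command work: with -s, paste joins all lines of the file and cycles through the characters given to -d, so the three tabs glue together the three fields (plus the blank line) and the \n ends each record. The Python sketch below mimics that cycling. `paste_serial` is a hypothetical helper written for this illustration, and it assumes every record is exactly three lines followed by one blank line.

```python
from itertools import cycle

def paste_serial(lines, delims):
    """Join `lines` into one string, cycling through `delims` between them."""
    delim_iter = cycle(delims)
    pieces = []
    for i, line in enumerate(lines):
        pieces.append(line)
        if i < len(lines) - 1:
            # Use the next delimiter in the list, wrapping around at the end.
            pieces.append(next(delim_iter))
    return "".join(pieces) + "\n"

# Hypothetical input: two records separated by one blank line.
lines = [
    "Name: John Doe1", "address : somewhere", "phone: 123-123-1234",
    "",
    "Name: John Doe2", "address : somewhere", "phone: 123-123-1233",
]

print(paste_serial(lines, "\t\t\t\n"), end="")
```

If the records had a different number of lines, the delimiter list would need one tab per line in the record before the final \n.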
Answer 3 (score: 1)
This seems to do basically what you want:
information = 'information'  # file path
with open(information, 'rt') as infile:
    data = infile.read()
# records are separated by blank lines
data = data.strip().split('\n\n')
for group in data:
    print(group.replace('\n', ' '))
Output:
Name: John Doe1 address : somewhere phone: 123-123-1234
Name: John Doe2 address : somewhere phone: 123-123-1233
Name: John Doe3 address : somewhere phone: 123-123-1232
Answer 4 (score: 1)
I know you didn't mention awk, but it solves your problem nicely:
awk 'BEGIN {RS="";FS="\n"} {print $1,$2,$3}' data.txt
Answer 5 (score: 1)
Most of the solutions here just reformat the data in the file you are reading. Maybe that is all you want.
If you actually want to parse the data, put it into a data structure.
This example is in Python:
data = """\
Name: John Doe2
address : 123 Main St, Los Angeles, CA 95002
phone: 213-123-1234

Name: John Doe1
address : 145 Pearl St, La Jolla, CA 92013
phone: 858-123-1233

Name: Billy Bob Doe3
address : 454 Heartland St, Mobile, AL 00103
phone: 205-123-1232""".split('\n\n')  # just a fill-in for your file
# with a real file you would read it via `with open(file) as f:` and split the same way
addr = {}
w0, w1, w2 = 0, 0, 0  # these keep track of the max width of each field
for record in data:
    # keep only the text after the first colon on each line
    fields = [e.split(':', 1)[1].strip() for e in record.split('\n')]
    nam = fields[0].split()
    name = nam[-1] + ', ' + ' '.join(nam[0:-1])  # "Last, First" for sorting
    addr[(name, fields[2])] = fields
    w0, w1, w2 = [max(t) for t in zip(map(len, fields), (w0, w1, w2))]
Now you are free to sort, change the format, put the data in a database, and so on.
This prints that data in your format, sorted:
for add in sorted(addr.keys()):
    print('Name: {0:{w0}} Address: {1:{w1}} phone: {2:{w2}}'.format(*addr[add], w0=w0, w1=w1, w2=w2))
Prints:
Name: John Doe1      Address: 145 Pearl St, La Jolla, CA 92013   phone: 858-123-1233
Name: John Doe2      Address: 123 Main St, Los Angeles, CA 95002 phone: 213-123-1234
Name: Billy Bob Doe3 Address: 454 Heartland St, Mobile, AL 00103 phone: 205-123-1232
That is sorted by last name, then first name, as used in the dict key.
Now print it sorted by area code:
for add in sorted(addr.keys(), key=lambda x: addr[x][2]):
    print('Name: {0:{w0}} Address: {1:{w1}} phone: {2:{w2}}'.format(*addr[add], w0=w0, w1=w1, w2=w2))
Prints:
Name: Billy Bob Doe3 Address: 454 Heartland St, Mobile, AL 00103 phone: 205-123-1232
Name: John Doe2      Address: 123 Main St, Los Angeles, CA 95002 phone: 213-123-1234
Name: John Doe1      Address: 145 Pearl St, La Jolla, CA 92013   phone: 858-123-1233
But since you have the data in an indexed dictionary, you can instead print it as a table sorted by zip code:
# print table header
print('|{0:^{w0}}|{1:^{w1}}|{2:^{w2}}|'.format('Name', 'Address', 'Phone', w0=w0+2, w1=w1+2, w2=w2+2))
print('|{0:^{w0}}|{1:^{w1}}|{2:^{w2}}|'.format('----', '-------', '-----', w0=w0+2, w1=w1+2, w2=w2+2))
# print data sorted by last field of the address - probably a zip code
for add in sorted(addr.keys(), key=lambda x: addr[x][1].split()[-1]):
    print('|{0:>{w0}}|{1:>{w1}}|{2:>{w2}}|'.format(*addr[add], w0=w0+2, w1=w1+2, w2=w2+2))
Prints:
|      Name      |              Address               |    Phone     |
|      ----      |              -------               |    -----     |
|  Billy Bob Doe3|  454 Heartland St, Mobile, AL 00103|  205-123-1232|
|       John Doe1|    145 Pearl St, La Jolla, CA 92013|  858-123-1233|
|       John Doe2|  123 Main St, Los Angeles, CA 95002|  213-123-1234|
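As one illustration of the "put it in a database" option mentioned in this answer, here is a minimal sketch using Python's built-in sqlite3 module. The table name `people`, its column names, and the hard-coded `rows` list are assumptions made for this example; in practice the rows would come from the parsed fields.

```python
import sqlite3

# Hypothetical rows in (name, address, phone) order, as a parser might produce.
rows = [
    ("John Doe2", "123 Main St, Los Angeles, CA 95002", "213-123-1234"),
    ("John Doe1", "145 Pearl St, La Jolla, CA 92013", "858-123-1233"),
    ("Billy Bob Doe3", "454 Heartland St, Mobile, AL 00103", "205-123-1232"),
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute("CREATE TABLE people (name TEXT, address TEXT, phone TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)

# Let the database do the sorting, here by phone number (area code first).
by_phone = conn.execute("SELECT name, phone FROM people ORDER BY phone").fetchall()
for name, phone in by_phone:
    print(name, phone)
```

Once the data is in a table, other orderings or filters become a matter of changing the query rather than writing another sort key.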
Answer 6 (score: 0)
In Python:
results = []
cur_item = None
with open('/root/docs/information') as f:
    for line in f:
        if not line.strip():
            continue  # skip the blank lines between records
        key, value = line.split(':', 1)
        key = key.strip()
        value = value.strip()
        if key == "Name":
            # a "Name" line starts a new record
            cur_item = {}
            results.append(cur_item)
        cur_item[key] = value
for item in results:
    print(item)
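Answer 7 (score: 0)
You should be able to parse this using the split() method on strings:
line = "Name: John Doe1"
key, value = line.split(":")
print(key)    # Name
print(value)  # " John Doe1" (with a leading space; value.strip() removes it)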
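One caveat: unpacking the result of a plain split(":") fails with a ValueError if the value itself contains a colon. A small sketch of a more forgiving variant uses str.partition, which only splits at the first colon. The helper name `parse_field` and the second sample line are assumptions for illustration only.

```python
def parse_field(line):
    """Split 'key: value' at the first colon and trim whitespace on both sides."""
    key, sep, value = line.partition(":")
    if not sep:
        # partition returns an empty separator when no colon was found
        raise ValueError("no ':' found in line: %r" % line)
    return key.strip(), value.strip()

print(parse_field("Name: John Doe1"))
print(parse_field("address : Suite 5: somewhere"))
```

Raising on a missing colon is a design choice here; returning None or skipping the line would work just as well for blank lines.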
Answer 8 (score: 0)
You can iterate over the lines and print them in columns like this:
with open("/path/to/data") as f:
    for line in f:
        if line.strip():
            # remove \n from the line's end and suppress the newline that
            # print adds, so the next field continues on the same row
            print("%s\t\t" % line.strip(), end="")
        else:
            # re-use the blank lines to end the row
            print()
Answer 9 (score: 0)
#!/usr/bin/env python
def parse(inputfile, outputfile):
    dictInfo = {'Name': None, 'address': None, 'phone': None}
    for line in inputfile:
        if line.startswith('Name'):
            dictInfo['Name'] = line.split(':', 1)[1].strip()
        elif line.startswith('address'):
            dictInfo['address'] = line.split(':', 1)[1].strip()
        elif line.startswith('phone'):
            dictInfo['phone'] = line.split(':', 1)[1].strip()
            # phone is the last field of a record, so write the row now
            s = 'Name: ' + dictInfo['Name'] + '\t' + 'address: ' + dictInfo['address'] \
                + '\t' + 'phone: ' + dictInfo['phone'] + '\n'
            outputfile.write(s)

if __name__ == '__main__':
    with open('output.txt', 'w') as outputfile:
        with open('infomation.txt') as inputfile:
            parse(inputfile, outputfile)
Answer 10 (score: 0)
A solution using sed.
cat input.txt | sed '/^$/d' | sed 'N; s:\n:\t\t:; N; s:\n:\t\t:'
sed '/^$/d' deletes the blank lines. sed 'N; s:\n:\t\t:; N; s:\n:\t\t:' joins the lines.
Name: John Doe1 address : somewhere phone: 123-123-1234
Name: John Doe2 address : somewhere phone: 123-123-1233
Name: John Doe3 address : somewhere phone: 123-123-1232
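The two sed stages amount to "drop the blank lines, then join every three remaining lines". A rough Python counterpart is sketched below. `join_in_threes` is a hypothetical helper, and it assumes every record has exactly three non-blank lines; unlike sed, it silently drops an incomplete trailing group.

```python
def join_in_threes(lines, sep="\t\t"):
    """Drop blank lines, then join each consecutive group of three with `sep`."""
    kept = [line.strip() for line in lines if line.strip()]  # like sed '/^$/d'
    it = iter(kept)
    # Zipping the same iterator three times yields (1st, 2nd, 3rd) groups.
    return [sep.join(group) for group in zip(it, it, it)]

# Hypothetical input lines, including the blank separator lines.
lines = [
    "Name: John Doe1\n", "address : somewhere\n", "phone: 123-123-1234\n",
    "\n",
    "Name: John Doe2\n", "address : somewhere\n", "phone: 123-123-1233\n",
]

for row in join_in_threes(lines):
    print(row)
```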