I'm not comfortable with regular expressions or programming.
I have data like this in a text file:
RAMCHAR@HOTMAIL.COM ():
PATTY.FITZGERALD327@GMAIL.COM ():
OHSCOACHK13@AOL.COM (19OB3IRCFHHYO): [{"num":1,"name":"Bessey VAS23 Vario Angle Strap Clamp","link":"http:\/\/www.amazon.com\/dp\/B0000224B3\/ref=wl_it_dp_v_nS_ttl\/181-6441163-6563619?_encoding=UTF8&colid=37XI10RRD17X2&coliid=I1YMLERDXCK3UU&psc=1","old-price":"N\/A","new-price":"","date-added":"October 19, 2014","priority":"","rating":"N\/A","total-ratings":"","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/51VMDDHT20L._SL500_SL135_.jpg","page":1},{"num":2,"name":"Designers Edge L-5200 500-Watt Double Bulb Halogen 160 Degree Wide Angle Surround Portable Worklight, Red","link":"http:\/\/www.amazon.com\/dp\/B0006OG8MY\/ref=wl_it_dp_v_nS_ttl\/181-6441163-6563619?_encoding=UTF8&colid=37XI10RRD17X2&coliid=I1BZH206RPRW8B","old-price":"N\/A","new-price":"","date-added":"October 8, 2014","priority":"","rating":"N\/A","total-ratings":"","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/5119Z4RDFYL._SL500_SL135_.jpg","page":1},{"num":3,"name":"50 Pack - 12"x12" (5) Bullseye Splatterburst Target - Instantly See Your Shots Burst Bright Florescent Yellow Upon Impact!","link":"http:\/\/www.amazon.com\/dp\/B00C88T12K\/ref=wl_it_dp_v_nS_ttl\/181-6441163-6563619?_encoding=UTF8&colid=37XI10RRD17X2&coliid=I31RJXFVF14TBM","old-price":"N\/A","new-price":"","date-added":"October 8, 2014","priority":"","rating":"N\/A","total-ratings":"67","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/51QwsvI43IL._SL500_SL135_.jpg","page":1},{"num":4,"name":"DEWALT DW618PK 12-AMP 2-1\/4 HP Plunge and Fixed-Base Variable-Speed Router Kit","link":"http:\/\/www.amazon.com\/dp\/B00006JKXE\/ref=wl_it_dp_v_nS_ttl\/181-6441163-6563619?_encoding=UTF8&colid=37XI10RRD17X2&coliid=I39QDQSBY00R56&psc=1","old-price":"N\/A","new-price":"","date-added":"September 3, 2012","priority":"","rating":"N\/A","total-ratings":"","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/416a5nzkYTL._SL500_SL135_.jpg","page":1}]
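Can anyone suggest a simple way to split this data into two columns (the email ID in the first column and the JSON-formatted data in the second)? Some lines may contain only an email ID (like line 1) with no corresponding JSON data.
Please help. Thanks!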
Answer 0 (score: 0)
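Try the following solution (written for Python 2). It assumes that each entry is on a single line, which means there can be no newlines inside the JSON substring. I've chosen in.txt as the filename for your data file; change it to the actual filename/path: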
import csv
import re

regex = re.compile("""
    ([^:]*)  # Match and capture any characters except colons
    :[ ]*    # Match a colon, followed by optional spaces
    (.*)     # Match and capture the rest of the line""",
    re.VERBOSE)

with open("in.txt") as infile, open("out.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    for line in infile:
        writer.writerow(regex.match(line).groups())
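For readers on Python 3, here is a minimal sketch of the same idea. The helper names split_line and convert, and the UTF-8 encoding, are assumptions made for illustration and are not part of the answer above. Unlike the Python 2 version, it skips lines that contain no colon instead of failing on them.

```python
import csv
import re

# Same pattern as above: everything before the first colon, then the rest.
LINE_RE = re.compile(r"([^:]*):[ ]*(.*)")


def split_line(line):
    """Return (key, json_text) for one input line, or None if it has no colon."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.groups()


def convert(in_path, out_path):
    """Write one CSV row per input line (hypothetical helper, UTF-8 assumed)."""
    # newline="" is how the csv module expects output files to be opened in Python 3.
    with open(in_path, encoding="utf-8") as infile, \
         open(out_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        for line in infile:
            row = split_line(line)
            if row is not None:  # skip blank or malformed lines
                writer.writerow(row)
```

Calling convert("in.txt", "out.csv") would then produce the two-column file. As in the original answer, the first column still contains the parenthesised ID after the email address.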
Answer 1 (score: 0)
If you are in a Linux/Unix environment, you can use sed. Assuming a.txt is your input file:
<a.txt sed 's/\(^[^ (]*\)[^:]*: */\1 /'
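The regular expression ^[^ (]* matches the beginning of each line (^) followed by zero or more characters that are neither a space nor a left parenthesis ([^ (]*). By placing it between \( and \), you make sed "remember" the matched string as \1. The [^:]*: * expression then matches any characters up to and including the colon, plus zero or more spaces after it. Everything matched so far is replaced in each line by the remembered \1 string, which is effectively the email address, followed by a single space. The rest of the line is the JSON data, and it is left unchanged.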
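To make the capture-and-replace step concrete, here is a rough Python equivalent of the sed substitution. It is only a sketch: the function name strip_id and its sep parameter are made up for illustration, and Python writes the group as ( ... ) where sed's basic syntax uses \( ... \).

```python
import re

# Group 1: leading run of characters that are neither a space nor "(".
# After it: everything up to and including the first colon, plus trailing spaces.
PATTERN = re.compile(r"^([^ (]*)[^:]*: *")


def strip_id(line, sep=" "):
    """Replace 'email (id): ' with 'email' + sep, leaving the JSON untouched."""
    # A function is used as the replacement so that sep is inserted literally.
    return PATTERN.sub(lambda m: m.group(1) + sep, line, count=1)


print(strip_id("RAMCHAR@HOTMAIL.COM ():"))
print(strip_id('OHSCOACHK13@AOL.COM (19OB3IRCFHHYO): [{"num":1}]'))
```

Because the pattern is anchored at the start of the line and [^:]* cannot cross a colon, the colons inside the JSON (for example in "http:") are never touched.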
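If you want a CSV or tab-delimited file, you need to replace the space character that follows \1 in the replacement part with the delimiter you want (for example a comma or a tab).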
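If the output is meant to be opened by a spreadsheet or a CSV parser, quoting matters too, because the JSON itself contains commas and double quotes. The following is a small Python sketch, not part of the answer above, that produces delimited output with the csv module; the function name to_delimited and the sample address A@EXAMPLE.COM are invented for illustration.

```python
import csv
import io
import re

# Group 1 is the email, group 2 is whatever follows the first colon.
PATTERN = re.compile(r"^([^ (]*)[^:]*: *(.*)$")


def to_delimited(lines, delimiter="\t"):
    """Return the lines as delimiter-separated text: email first, then JSON."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for line in lines:
        m = PATTERN.match(line.rstrip("\n"))
        if m:  # lines without a colon are skipped
            writer.writerow(m.groups())
    return buf.getvalue()


sample = [
    "RAMCHAR@HOTMAIL.COM ():",
    'A@EXAMPLE.COM (ID1): [{"num":1,"page":1}]',
]
print(to_delimited(sample))                 # tab-delimited
print(to_delimited(sample, delimiter=","))  # comma-delimited
```

The csv writer quotes the JSON field and doubles its embedded double quotes, so a CSV reader can recover the original text exactly. A plain substitution with a comma would leave the JSON's own commas indistinguishable from the column separator.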