I'm trying to parse a CSV and, whenever a condition is met in either column, write the result to a new CSV.
For example, if my CSV looks like:
123 Some Street
Flat 1, 21 Other road
House, Someother street
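I need to analyse each row. If a number appears in the first column but not the second, I need to extract that number; if both columns contain a number, I need to extract both; and if there is no number at all, I need to extract the text from the first column. I then want to write a new CSV with the 2 original columns plus 3 new ones (number 1, number 2, text), i.e. flat number, house number, house name. So the new CSV would look like: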
123 Some Street, , 123,
Flat 1, 21 Other road, 1, 21,
House, Someother street, , , House.
Any guidance would be very useful.
Thanks
Edit
import csv
csvFile = 'myData.csv'
csvOut = 'myOut.csv'
reader = csv.reader(csvFile)
writer = csv.writer(csvOut)
for row in reader:
    num = \d | \d\d | \d\d\d
    if row [0] || row [1] == num
        if row [1] == num
            writer.row [3]
        else row [0] == num
            writer.row [2]
            writer.row [3]
    else writer.row [0] [2]
csvOut.close()
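Edit again
I hope this is a clearer explanation:
I'd like the output to be a new CSV with the original data in row[0] and row[1]. If a row contains only one number, i.e. a house number, it should be written to row[3]. If a row contains 2 numbers (one in row[0] and one in row[1]), they should be written to row[2] and row[3] respectively. If there are no numbers, the string from row[0] should be written to row[4]. Ultimately, I need to split flat number, house number and house name into 3 separate columns.
Further edit
I've been working on the code and now have the following. I feel I'm getting closer, but still some way off?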
import csv
import re
csvFile = open(myData.csv, 'rb')
csvOut = open(myOut.csv, 'wb')
reader = csv.reader(csvFile)
writer = csv.writer(csvOut)
for row in reader:
    a = row [0] re.compile('\d' | '\d\d' | '\d\d\d')
    a1 = row [0] re.compile('\d' | '\d\d' | '\d\d\d')
    b = row [1]
    b1 = row [1] re.compile('\d' | '\d\d' | '\d\d\d')
    if b = re.compile('\d' | '\d\d' | '\d\d\d')
        writer.writerow(a,b,a1,b1, )
    elif a = re.compile('\d' | '\d\d' | '\d\d\d')
        witer.writerow(a,b, , b1, )
    else
        writer.writerow(a,b, , ,a)
csvOut.close()
Thanks
Answer 0 (score: 0)
This might give you a clue, as I'm not entirely sure what you need.
$cat t1
123 Some Street
Flat 1, 21 Other road
House, 23 Someother street
Example
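import csv
import re

# One or more consecutive digits
p = re.compile(r'\d+')

with open('t1', newline='') as f:
    for row in csv.reader(f):
        print("ROW", row)
        # First number in the first column, if any
        match = p.search(row[0])
        if match:
            print("\t#1", match.group())
        # The second column may be missing, so check the row length first
        if len(row) > 1:
            match = p.search(row[1])
            if match:
                print("\t#2", match.group())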
Output
ROW ['123 Some Street']
	#1 123
ROW ['Flat 1', ' 21 Other road']
	#1 1
	#2 21
ROW ['House', ' 23 Someother street']
	#2 23
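Building on the same search, here is a minimal sketch of how the matches could be mapped onto the five output columns described in the question (the two original columns, then flat number, house number and house name). The names `split_row` and `convert` are hypothetical, and treating a number that appears only in the second column as a house number is my assumption, since the question does not cover that case.

```python
import csv
import io
import re

NUMBER = re.compile(r'\d+')

def split_row(row):
    """Return [col0, col1, flat_no, house_no, house_name] for one input row."""
    first = row[0].strip() if len(row) > 0 else ''
    second = row[1].strip() if len(row) > 1 else ''
    m1 = NUMBER.search(first)
    m2 = NUMBER.search(second)
    flat_no = house_no = house_name = ''
    if m1 and m2:
        # Numbers in both columns: flat number, then house number
        flat_no, house_no = m1.group(), m2.group()
    elif m1 or m2:
        # A single number is treated as the house number (assumption
        # when it is found only in the second column)
        house_no = (m1 or m2).group()
    else:
        # No number at all: the first column is the house name
        house_name = first
    return [first, second, flat_no, house_no, house_name]

def convert(infile, outfile):
    """Read rows from infile and write the five-column rows to outfile."""
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        if row:  # skip blank lines
            writer.writerow(split_row(row))

# In-memory demo using the sample data from the question
src = io.StringIO("123 Some Street\nFlat 1, 21 Other road\nHouse, Someother street\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())
```

For real files, `convert` could be called with two handles opened with `newline=''`, which is what the `csv` module documentation recommends.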
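Answer 1 (score: 0)
The following code should do most of what you need. For the output, just index the tuple and write out the components you want. Each result has 6 elements: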
# (flat str, flat #, street str, street #, street, street type)
a = """
123 Some Street
Flat 1, 21 Other road
House, Someother street
"""
import re
# flat gets a word, 0 or more spaces, 0 or more digits
flat = r"([a-z]+ *(\d+)*)"
# street gets 0 or more digits, 1 or more spaces, 1 or more words each followed by a space,
# consuming until it hits street, road or drive
street = r"((\d+)* +([a-z]+ )+?(street|road|drive))"
address = "%s*.*?%s" % (flat, street)
m = re.compile(address, re.I)
results = m.findall(a)
with open('output.csv', 'w') as fout:
    # whatever you wish to name your columns
    fout.write("Building,Address,Suite Number,Building Number\n")
    for r in results:
        fout.write("%s,%s,%s,%s\n" % (r[0], r[2], r[1], r[3]))
Results
[('', '', '123 Some Street', '123', 'Some ', 'Street'),
('Flat 1', '1', '21 Other road', '21', 'Other ', 'road'),
('House', '', ' Someother street', '', 'Someother ', 'street')]
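If any component might contain a comma or a quote character, writing the rows through `csv.writer` rather than string formatting takes care of the quoting. The sketch below assumes `results` has the 6-element shape shown above; the `write_results` name is only illustrative, and stripping the leftover whitespace is an extra step I added.

```python
import csv
import io

def write_results(results, fout):
    """Write (flat str, flat #, street str, street #, street, street type) tuples as CSV."""
    writer = csv.writer(fout)
    writer.writerow(["Building", "Address", "Suite Number", "Building Number"])
    for r in results:
        # Strip stray whitespace left over from the regex match
        writer.writerow([r[0].strip(), r[2].strip(), r[1], r[3]])

# Same tuples as in the results listed above
results = [
    ('', '', '123 Some Street', '123', 'Some ', 'Street'),
    ('Flat 1', '1', '21 Other road', '21', 'Other ', 'road'),
    ('House', '', ' Someother street', '', 'Someother ', 'street'),
]
buf = io.StringIO()
write_results(results, buf)
print(buf.getvalue())
```

To write to disk instead, pass a file opened with `open('output.csv', 'w', newline='')` in place of the `io.StringIO` buffer.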