Question

My test1111.csv looks similar to this:

Sales #, Date, Tel Number, Comment
393ED3, 5/12/2010, 5555551212, left message
585E54, 6/15/2014, 5555551213, voice mail
585868, 8/16/2010, , number is 5555551214

I have the following code:

import re
import csv
from collections import defaultdict

# Below code places csv entries into dictionary so that they can be parsed
# by column.  Then print statement prints Sales # column.
columns = defaultdict(list)
with open("c:\\test1111.csv", "r") as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k,v) in row.items():
            columns[k].append(v)

# To print all columns, use: print columns
# To print a specific column, use:  print(columns['ST'])
# Below line takes list output and separates into new lines
sales1 = "\n".join(columns['Sales #'])
print sales1

# Below code searches all columns for a 10 digit number and outputs the
# results to a new csv file.
with open("c:\\test1111.csv", "r") as old, \
     open("c:\\results1111.csv", 'wb') as new:
    for line in old:
        # Regex to match exactly 10 digits
        match = re.search(r'(?<!\d)\d{10}(?!\d)', line)
        if match:
            match1 = match.group()
            print match1
            new.writelines((match1) + '\n')
        else:
            nomatch = "No match"
            print nomatch
            new.writelines((nomatch) + '\n')

The first part of the code opens the original csv and prints every entry in the Sales # column to stdout, each entry on its own line.

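The second part of the code opens the original csv and searches each line for a 10-digit number. For every line it writes either the number it found or "No match" as one line of the new csv.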

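What I want to do now is also write the Sales # column data to the new csv. In the end, the Sales # data should appear as rows in the first column and the regex results as rows in the second column of the new csv. I can't get this to work because new.writelines won't take two arguments. Can someone help me with this?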

I want results1111.csv to look like this:

393ED3, 5555551212
585E54, 5555551213
585868, 5555551214

Answer 1

Starting from the second part of your code, all you need to do is concatenate the sales data inside the writelines call:

sales_list = sales1.split('\n')

# Below code searches all columns for a 10 digit number and outputs the
# results to a new csv file.

with open("c:\\test1111.csv", "r") as old, \
     open("c:\\results1111.csv", 'wb') as new:
    next(old)  # skip the header row; sales_list has no entry for it
    i = 0  # counter to pick the matching Sales # entry
    for line in old:
        # Regex to match exactly 10 digits
        match = re.search(r'(?<!\d)\d{10}(?!\d)', line)
        if match:
            match1 = match.group()
            print match1
            new.writelines(str(sales_list[i])+ ',' + (match1) + '\n')
        else:
            nomatch = "No match"
            print nomatch
            new.writelines(str(sales_list[i])+ ',' + (nomatch) + '\n')
        i += 1

With the counter i you keep track of which row you are on and use it to add the corresponding Sales # entry.
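
As an alternative to the counter, the same result can be produced in a single pass with the csv module, so the Sales # value and the regex result always come from the same row. The following is only a sketch in Python 3 under a few assumptions: Sales # is the first column, the function name sales_and_phone and the in-memory SAMPLE data are made up for illustration, and the whole row is searched for the number, as in the original code.

```python
import csv
import io
import re

# Exactly 10 digits that are not part of a longer run of digits.
TEN_DIGITS = re.compile(r'(?<!\d)\d{10}(?!\d)')

# Sample data copied from the question (illustration only).
SAMPLE = (
    "Sales #, Date, Tel Number, Comment\n"
    "393ED3, 5/12/2010, 5555551212, left message\n"
    "585E54, 6/15/2014, 5555551213, voice mail\n"
    "585868, 8/16/2010, , number is 5555551214\n"
)


def sales_and_phone(src, dst, no_match="No match"):
    """Write one 'Sales #,phone' row to dst for every data row in src."""
    # skipinitialspace drops the blank that follows each comma in the input.
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst, lineterminator="\n")
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:  # ignore completely empty lines
            continue
        # Search every field of the row, like the original line-based search.
        match = TEN_DIGITS.search(",".join(row))
        # Assumption: the Sales # value is the first field of the row.
        writer.writerow([row[0], match.group() if match else no_match])


if __name__ == "__main__":
    out = io.StringIO()
    sales_and_phone(io.StringIO(SAMPLE), out)
    print(out.getvalue(), end="")
```

With real files in Python 3, both files would be opened in text mode with newline='' and passed to the function in place of the StringIO objects.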

Answer 2

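Just to point out that a CSV should not contain spaces after the commas unless they are really part of the data. Your data should look like this: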

Sales #,Date,Tel Number,Comment
393ED3,5/12/2010,5555551212,left message
585E54,6/15/2014,5555551213,voice mail
585868,8/16/2010,,number is 5555551214
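
If the file cannot be cleaned up, the standard csv module can also ignore the blank after each comma through its skipinitialspace option. The short sketch below (Python 3; the SPACED sample and the read_rows helper are hypothetical names used only for illustration) shows the difference it makes to the parsed header and fields:

```python
import csv
import io

# One header row and one data row with a blank after every comma.
SPACED = (
    "Sales #, Date, Tel Number, Comment\n"
    "393ED3, 5/12/2010, 5555551212, left message\n"
)


def read_rows(text, **kwargs):
    """Parse CSV text into a list of rows, forwarding options to csv.reader."""
    return list(csv.reader(io.StringIO(text), **kwargs))


# Default parsing keeps the leading blank as part of each field.
default_rows = read_rows(SPACED)
# skipinitialspace=True discards whitespace that directly follows a comma.
trimmed_rows = read_rows(SPACED, skipinitialspace=True)

if __name__ == "__main__":
    print(default_rows[0])
    print(trimmed_rows[0])
```

This matters for column lookups by name: without the option, the header parsed by csv.DictReader would be ' Tel Number' (with a leading blank) rather than 'Tel Number'.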

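Also, to add another way of getting the same answer: you can use the Pandas data analysis library for tasks that involve tables of data. For what you want to achieve, it takes only 2 lines: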

>>> import pandas as pd
# Read data, using the first column (Sales #) as the index
>>> data = pd.read_csv('/tmp/in.cvs', index_col=0)
>>> data
             Date  Tel Number               Comment
Sales#                                             
393ED3  5/12/2010  5555551212          left message
585E54  6/15/2014  5555551213            voice mail
585868  8/16/2010         NaN  number is 5555551214

# Write data
>>> data.to_csv('/tmp/out.cvs', columns=['Tel Number'], na_rep='No match')

The last line writes out.cvs, putting No match in the Tel Number column wherever no phone number was found, which is what you want. Output file:

Sales#,Tel Number
393ED3,5555551212.0
585E54,5555551213.0
585868,No match

Inserting data into a two-column csv
