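I have just started learning Python, and I need help with a script that my internship requires me to write.
I have a csv file (sheet1.csv) from which I need to extract data from just two columns, whose headers ReferenceID and PartNumber correspond to each other. I need to update a separate csv file called sheet2.csv, which also contains the two columns ReferenceID and PartNumber, but many of its PartNumber cells are empty.
Basically, I need to fill in the "PartNumber" field with the values from sheet1. From the research I have done, I decided that using dictionaries is a solid way to write this script (I think). So far I have been able to read the files and create two dictionaries with the ReferenceIDs as keys and the PartNumbers as values. Below is my code, followed by an example of what the dictionaries look like.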
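import csv

# newline='' is the mode the csv module recommends; the old 'rU' mode was removed in Python 3.11
a = open('sheet1.csv', newline='')
b = open('sheet2.csv', newline='')
csvReadera = csv.DictReader(a)
csvReaderb = csv.DictReader(b)
a_dict = {}
b_dict = {}
# Map each ReferenceID to its PartNumber for sheet1
for line in csvReadera:
    a_dict[line["ReferenceID"]] = line["PartNumber"]
print(a_dict)
# Same mapping for sheet2, where many PartNumbers are empty strings
for line in csvReaderb:
    b_dict[line["ReferenceID"]] = line["PartNumber"]
print(b_dict)
a.close()
b.close()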
a_dict = {'R150': 'PN000123', 'R331': 'PN000873', 'C774': 'PN000064', 'L7896': 'PN000447', 'R0640': 'PN000878', 'R454': 'PN000333'}
b_dict = {'C774': '', 'R331': '', 'R454': '', 'L7896': 'PN000000', 'R0640': '', 'R150': 'PN000333'}
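How can I compare the two dictionaries, fill in/overwrite the missing values of b_dict, and then write the result to sheet2? Surely there must be more efficient approaches than the one I came up with, but I have never used Python before, so please forgive my poor attempt!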
Answer 0 (score: 0)
Take a look at the pandas library.
import pandas as pd
# this is how you read the files
dfa = pd.read_csv("sheet1.csv")
dfb = pd.read_csv("sheet2.csv")
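One thing to watch when reading the real files: by default pandas turns empty cells into NaN, and NaN is truthy, so a plain truthiness check on an empty PartNumber will not behave the way it does with the empty strings in the test dictionaries. The sketch below is one way to keep empty cells as empty strings. The in-memory text is a hypothetical stand-in for sheet2.csv so that the example runs without any files.

```python
import io

import pandas as pd

# Hypothetical stand-in for sheet2.csv (not the asker's real data).
sheet2_text = "ReferenceID,PartNumber\nR150,PN000333\nC774,\nL7896,PN000000\n"

# dtype=str keeps IDs such as 'R0640' as text;
# keep_default_na=False leaves empty cells as '' instead of NaN.
dfb = pd.read_csv(io.StringIO(sheet2_text), dtype=str, keep_default_na=False)

# Count the part numbers that still need to be filled in.
empty_count = int((dfb["PartNumber"] == "").sum())
print(dfb)
print(empty_count)
```

With a real file, the same two keyword arguments can be passed to pd.read_csv("sheet2.csv", ...).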
Let's use the dictionaries you defined as test data
a_dict = {'R150': 'PN000123', 'R331': 'PN000873', 'C774': 'PN000064', 'L7896': 'PN000447', 'R0640': 'PN000878', 'R454': 'PN000333'}
b_dict = {'C774': '', 'R331': '', 'R454': '', 'L7896': 'PN000000', 'R0640': '', 'R150': 'PN000333'}
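# list(...) turns the dict view into a plain list of (key, value) pairs, which works on every pandas version
dfar = pd.DataFrame(list(a_dict.items()), columns = ['ReferenceID', 'PartNumber'])
dfbr = pd.DataFrame(list(b_dict.items()), columns = ['ReferenceID', 'PartNumber'])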
dfa = dfar[['ReferenceID', 'PartNumber']]
dfa.columns = ['ReferenceIDA', 'PartNumberA']
dfb = dfbr[['ReferenceID', 'PartNumber']]
dfb.columns = ['ReferenceIDB', 'PartNumberB']
You get this
In [97]: dfa
Out[97]:
  ReferenceIDA PartNumberA
0         R331    PN000873
1         R454    PN000333
2        L7896    PN000447
3         R150    PN000123
4         C774    PN000064
5        R0640    PN000878
In [98]: dfb
Out[98]:
  ReferenceIDB PartNumberB
0         R331
1         R454
2        R0640
3         R150    PN000333
4         C774
5        L7896    PN000000
Now
In [67]: cd = pd.concat([dfa,dfb], axis=1)
In [68]: cd
Out[68]:
  ReferenceIDA PartNumberA ReferenceIDB PartNumberB
0         R331    PN000873         R331
1         R454    PN000333         R454
2        L7896    PN000447        R0640
3         R150    PN000123         R150    PN000333
4         C774    PN000064         C774
5        R0640    PN000878        L7896    PN000000
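pd.concat with axis=1 pairs rows by index position, which is why rows 2 and 5 in this output put L7896 next to R0640. A minimal sketch of pairing the rows by ReferenceID instead, using DataFrame.merge on the same test dictionaries. It assumes, as the rest of this answer does, that an existing non-empty PartNumber in sheet2 should be kept and only empty ones filled.

```python
import pandas as pd

a_dict = {'R150': 'PN000123', 'R331': 'PN000873', 'C774': 'PN000064',
          'L7896': 'PN000447', 'R0640': 'PN000878', 'R454': 'PN000333'}
b_dict = {'C774': '', 'R331': '', 'R454': '', 'L7896': 'PN000000',
          'R0640': '', 'R150': 'PN000333'}

dfa = pd.DataFrame(list(a_dict.items()), columns=['ReferenceID', 'PartNumber'])
dfb = pd.DataFrame(list(b_dict.items()), columns=['ReferenceID', 'PartNumber'])

# A left merge keeps every sheet2 row and matches sheet1 rows by key;
# the suffixes give PartNumberB (sheet2) and PartNumberA (sheet1).
merged = dfb.merge(dfa, on='ReferenceID', how='left', suffixes=('B', 'A'))

# Keep the sheet2 value where it is non-empty, otherwise take the sheet1 value.
merged['res'] = merged['PartNumberB'].where(merged['PartNumberB'] != '',
                                            merged['PartNumberA'])

result = dict(zip(merged['ReferenceID'], merged['res']))
print(result)
```

Because the match is made on the key, the order of the rows in either sheet no longer matters.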
cd["res"] = cd.apply(lambda x : x["PartNumberB"] if x["PartNumberB"] else x["PartNumberA"], axis=1)
cd
Out[106]:
  ReferenceIDA PartNumberA ReferenceIDB PartNumberB      res
0         R331    PN000873         R331             PN000873
1         R454    PN000333         R454             PN000333
2        L7896    PN000447        R0640             PN000447
3         R150    PN000123         R150    PN000333 PN000333
4         C774    PN000064         C774             PN000064
5        R0640    PN000878        L7896    PN000000 PN000000
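This is what you want, provided both frames list the ReferenceIDs in the same order. concat with axis=1 pairs rows by position, not by ReferenceID, so in the output above rows 2 and 5 combine values from two different IDs.
Just set
dfbr['PartNumber'] = cd['res']
and dump it to csv
# index=False keeps the row index out of the file, so sheet2.csv keeps its two columns
dfbr.to_csv('sheet2.csv', index=False)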
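For comparison, the dictionary approach from the question can be completed with the standard csv module alone. This is only a sketch: fill_part_numbers is a hypothetical helper name, and it assumes both files use the headers ReferenceID and PartNumber and that non-empty PartNumbers in sheet2 should be left untouched.

```python
import csv


def fill_part_numbers(source_path, target_path):
    """Fill empty PartNumber cells in target_path using source_path (hypothetical helper)."""
    # Build the ReferenceID -> PartNumber lookup from the source sheet.
    with open(source_path, newline="") as src:
        lookup = {row["ReferenceID"]: row["PartNumber"]
                  for row in csv.DictReader(src)}

    # Read every target row into memory before the file is overwritten.
    with open(target_path, newline="") as tgt:
        reader = csv.DictReader(tgt)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Fill only the empty cells; IDs missing from the source stay empty.
    for row in rows:
        if not row["PartNumber"]:
            row["PartNumber"] = lookup.get(row["ReferenceID"], "")

    # Write the rows back in their original order with the original header.
    with open(target_path, "w", newline="") as tgt:
        writer = csv.DictWriter(tgt, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling fill_part_numbers('sheet1.csv', 'sheet2.csv') would then update sheet2.csv in place, so it is worth testing on a copy of the file first.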