In my large dataset, I have a column of names like this:
Main file:
1, NAME1
2, NAME2
3, NAME2
...
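What I need is to create a third column containing surnames, based on some conditions. I have 2 text files with one word per line (SURNAME1.txt, SURNAME2.txt). I need a condition I can use to fill the 3rd column, for example:
if NAME1 in 'SURNAME1.txt':
    then write 'SURNAME1' into the 3rd column of the main file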
Right now I can check my names with this code:
if 'NAME1' in open('SURNAME1.txt').read():
    print("true")
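Note that `in` on the string returned by `read()` is a substring test, so 'NAME1' would also match a line containing 'NAME10'. A minimal sketch of an exact, whole-line check, assuming one name per line in each surname file; the helper name `name_in_file` is hypothetical:

```python
def name_in_file(name, path):
    """Return True if `name` appears as a whole line in the file at `path`."""
    with open(path) as f:
        # Strip newlines and surrounding whitespace so each entry is compared as a whole word
        return name in {line.strip() for line in f}
```

The check then reads `if name_in_file('NAME1', 'SURNAME1.txt'): print("true")`. For a large dataset, building each set once and reusing it is cheaper than rereading the file for every name.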
The output I need to get in my main file:
1, NAME1, SURNAME1
2, NAME2, SURNAME2
3, NAME2, SURNAME2
Thanks for any advice.
Answer (score: 0)
from collections import defaultdict

class Forenames(dict):
    # Return an empty string instead of raising KeyError for an unknown forename
    def __missing__(self, key):
        return ''

# Map each surname (the file name without '.txt') to the forenames listed in that file
surnames = defaultdict(list)
for fileName in ['surname1.txt', 'surname2.txt']:
    surname = fileName[:-4]
    with open(fileName) as names:
        for line in names:
            surnames[surname].append(line.strip())

# Invert the mapping: forename -> surname
forenames = Forenames()
for surname in surnames:
    for forename in surnames[surname]:
        if forenames[forename]:
            raise RuntimeError('forename previously found')
        else:
            forenames[forename] = surname

# Read the main file and look up the surname for each name
with open('names.txt') as names:
    for line in names:
        number, value = line.strip().split(', ')
        surname = forenames[value]
        print(number, value, surname)
Result:
1 Bill
2 Egon surname2
3 Cynthia surname1
4 Colin surname2
5 James surname2
...for a names.txt containing:
1, Bill
2, Egon
3, Cynthia
4, Colin
5, James
...a surname1.txt containing:
John
Mary
Cynthia
...and a surname2.txt containing:
Egon
Colin
James
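First, the code builds a dictionary that maps each surname to its list of forenames. It then inverts that into a second dictionary that maps each forename to its surname. The second dictionary is constructed so that a forename with no known surname yields an empty string. Finally, the code reads and parses names.txt and looks up each name in the second dictionary.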
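The code above only prints the rows, while the question asks for a third column in the file. A minimal sketch of the same invert-and-look-up idea that writes the result to a new file, assuming one name per line in each surname file and ', ' as the separator in the main file; the function names `load_surname_lookup` and `add_surname_column` and the `out_path` parameter are hypothetical. It uses a plain dict with `.get(name, '')` instead of the `Forenames` subclass, which gives the same empty-string default.

```python
import os

def load_surname_lookup(surname_paths):
    """Map each forename to the surname taken from the name of the file it appears in."""
    lookup = {}
    for path in surname_paths:
        # e.g. 'surname1.txt' -> 'surname1'
        surname = os.path.splitext(os.path.basename(path))[0]
        with open(path) as f:
            for line in f:
                forename = line.strip()
                if not forename:
                    continue  # skip blank lines
                if forename in lookup:
                    raise RuntimeError(f'forename {forename!r} listed more than once')
                lookup[forename] = surname
    return lookup

def add_surname_column(names_path, out_path, lookup):
    """Write 'number, name, surname' rows; unknown names get an empty surname."""
    with open(names_path) as src, open(out_path, 'w') as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            number, name = line.split(', ')
            dst.write(', '.join([number, name, lookup.get(name, '')]) + '\n')
```

Writing to a separate output file, rather than rewriting names.txt in place, avoids truncating the input while it is still being read; the output can replace the original once it has been checked.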