Replacing data in a column based on a condition taken from other files (Python)

Time: 2017-05-10 14:07:05

Tags: python text conditional-statements

In my large dataset I have a column of names like this:

Main file:

1, NAME1
2, NAME2
3, NAME2
   ...

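What I need is to create a third column containing surnames, based on some conditions. I have two text files of single words (SURNAME1.txt, SURNAME2.txt). I need a condition I can use to fill the third column, for example: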

if NAME1 in 'SURNAME1.txt':
then create field in 3rd main file where will be written 'SURNAME1'

Right now I can check whether a name is present using this code:

if 'NAME1' in open('SURNAME1.txt').read():
    print("true")

The output I need in my main file:

1, NAME1, SURNAME1
2, NAME2, SURNAME2
3, NAME2, SURNAME2

Thanks for any advice.

1 answer:

Answer 0 (score: 0)

from collections import defaultdict


class Forenames(dict):
    # Return an empty string for forenames that have no known surname.
    def __missing__(self, key):
        return ''


# Step 1: map each surname (the file name without '.txt') to the forenames in its file.
surnames = defaultdict(list)
for fileName in ['surname1.txt', 'surname2.txt']:
    surname = fileName[:-4]
    with open(fileName) as names:
        for line in names:
            surnames[surname].append(line.strip())

# Step 2: invert the mapping so that each forename points to its surname.
forenames = Forenames()
for surname in surnames:
    for forename in surnames[surname]:
        if forenames[forename]:
            raise RuntimeError('forename previously found')
        else:
            forenames[forename] = surname

# Step 3: read the main file and look up the surname for each forename.
with open('names.txt') as names:
    for line in names:
        number, value = line.strip().split(', ')
        surname = forenames[value]
        print(number, value, surname)

Result:

1 Bill 
2 Egon surname2
3 Cynthia surname1
4 Colin surname2
5 James surname2

...with this as the contents of names.txt:

1, Bill
2, Egon
3, Cynthia
4, Colin
5, James

...this for surname1.txt:

John
Mary
Cynthia

...and this for surname2.txt:

Egon
Colin
James

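First, the code builds a dictionary that maps each surname to the forenames listed in its file. It then inverts that dictionary into one that maps each forename to its surname. This second dictionary is constructed so that a forename with no known surname yields an empty string. Finally, the file names.txt is read and parsed, and each name is looked up in the second dictionary.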
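
The question asks for the third column to end up in the main file, whereas the code above only prints it. As a minimal sketch of the same lookup, assuming it is acceptable to write the result to a separate output file, the steps could be wrapped in two functions. The names `build_forename_index` and `add_surname_column` and the output path are hypothetical and not part of the original answer; a plain dict with `dict.get` stands in for the `Forenames` subclass.

```python
from pathlib import Path


def build_forename_index(surname_files):
    """Map each forename to the surname whose file lists it (hypothetical helper)."""
    index = {}
    for path in map(Path, surname_files):
        surname = path.stem  # 'surname1.txt' -> 'surname1'
        for line in path.read_text().splitlines():
            forename = line.strip()
            if not forename:
                continue  # ignore blank lines
            if forename in index:
                # Same policy as the answer: a forename may be listed only once.
                raise ValueError(f"{forename!r} is listed more than once")
            index[forename] = surname
    return index


def add_surname_column(main_file, out_file, index):
    """Copy main_file to out_file, appending the looked-up surname to each row."""
    with open(main_file) as src, open(out_file, "w") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            number, forename = [part.strip() for part in line.split(",", 1)]
            # Unknown forenames get an empty third column.
            surname = index.get(forename, "")
            dst.write(f"{number}, {forename}, {surname}\n")
```

With the sample files shown above, calling `add_surname_column('names.txt', 'names_out.txt', build_forename_index(['surname1.txt', 'surname2.txt']))` would write rows such as `2, Egon, surname2`. Matching whole lines against dictionary keys also avoids the substring matches that `'NAME1' in open(...).read()` can produce.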