Searching file2.txt for strings from file1.txt

Asked: 2016-06-12 15:11:15

Tags: python string search

file1.txt contains user names, i.e.

tony  
peter  
john  
...

file2.txt contains user details, with exactly one line per user, i.e.

alice 20160102 1101 abc  
john 20120212 1110 zjc9  
mary 20140405 0100 few3  
peter 20140405 0001 io90  
tango 19090114 0011 n4-8  
tony 20150405 1001 ewdf  
zoe 20000211 0111 jn09  
...

I want to extract from file2.txt a short list of the details for only the users named in file1.txt, i.e.

john 20120212 1110 zjc9  
peter 20140405 0001 io90  
tony 20150405 1001 ewdf  

How can I do this in Python?

3 answers:

Answer 0 (score: 0)

import pandas as pd

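# Each file is read as a single text column (the lines contain no commas).
df1 = pd.read_csv('file1.txt', header=None)
df2 = pd.read_csv('file2.txt', header=None)
df1[0] = df1[0].str.strip()  # remove the trailing whitespace after each user name
df2 = df2[0].str.strip().str.split(' ', expand=True)  # strip trailing whitespace, then split into columns
df = df1.merge(df2)  # inner join on the shared column 0 (the user name)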

Out[26]: 
       0         1     2     3
0   tony  20150405  1001  ewdf
1  peter  20140405  0001  io90
2   john  20120212  1110  zjc9
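The merge above joins on whatever columns the two frames happen to share. As a hedged sketch of the same idea with the join key spelled out, the version below reads the sample rows from in-memory strings rather than from disk. The function name `shortlist_by_merge` and the column names `name`, `date`, `flags` and `code` are made up for illustration and do not come from the question. `dtype=str` is passed so that values such as `0001` stay text.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for file1.txt and file2.txt
# (sample rows copied from the question).
FILE1 = "tony  \npeter  \njohn  \n"
FILE2 = (
    "alice 20160102 1101 abc\n"
    "john 20120212 1110 zjc9\n"
    "mary 20140405 0100 few3\n"
    "peter 20140405 0001 io90\n"
    "tango 19090114 0011 n4-8\n"
    "tony 20150405 1001 ewdf\n"
    "zoe 20000211 0111 jn09\n"
)


def shortlist_by_merge(names_src, details_src):
    """Inner-join the user names against the detail rows on the name column."""
    names = pd.read_csv(names_src, header=None, names=["name"], dtype=str)
    names["name"] = names["name"].str.strip()  # drop trailing spaces
    details = pd.read_csv(
        details_src,
        sep=r"\s+",  # split on runs of whitespace
        header=None,
        names=["name", "date", "flags", "code"],  # assumed column names
        dtype=str,  # keep every field as text
    )
    return names.merge(details, on="name", how="inner")


result = shortlist_by_merge(io.StringIO(FILE1), io.StringIO(FILE2))
print(result.to_string(index=False, header=False))
```

With real files, the two `io.StringIO(...)` arguments would be replaced by the paths `'file1.txt'` and `'file2.txt'`. The rows come back in the order of the names in file1.txt, as in the output above.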

Answer 1 (score: 0)

You can use pandas:

import pandas as pd

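# dtype=str keeps every field as text, so leading zeros such as 0001 are preserved
file1 = pd.read_csv('file1.txt', sep=' ', header=None, dtype=str)
file2 = pd.read_csv('file2.txt', sep=' ', header=None, dtype=str)

# keep only the rows of file2 whose first column appears in file1
shortlist = file2.loc[file2[0].isin(file1[0])]

This gives the following result:

       0         1     2     3
1   john  20120212  1110  zjc9
3  peter  20140405  0001  io90
5   tony  20150405  1001  ewdf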

The result above is a DataFrame; to convert it back to an array, just use shortlist.values.
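As a small, hedged illustration of that conversion, the sketch below builds a two-row stand-in for file2.txt in memory, filters it with `isin`, and then turns the result into an array and back into plain text lines. The variable names `details`, `rows` and `lines` are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for file2.txt: two sample rows from the question.
details = pd.read_csv(
    io.StringIO("john 20120212 1110 zjc9\npeter 20140405 0001 io90\n"),
    sep=" ",
    header=None,
    dtype=str,  # keep every field as text
)
shortlist = details.loc[details[0].isin(["peter"])]

rows = shortlist.values  # 2-D NumPy array, one row per matching user
lines = [" ".join(row) for row in rows]  # rebuild the one-line-per-user text
print(lines)
```

Joining each row with a single space only reproduces the original line if the fields were separated by single spaces, as they are in the sample data.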

Answer 2 (score: 0)

You can use .split(' '), assuming there is always a space between the name and the rest of the information in file2.txt.

Here is an example:

UserList = []

with open("file1.txt", "r") as fuser:
    for UserLine in fuser:
        UserList.append(UserLine.strip())    # Drop the newline and any trailing spaces from the user name.

InfoUserList = []
InfoList = []

with open("file2.txt", "r") as finfo:
    for InfoLine in finfo:
        InfoList.append(InfoLine)
        line1 = InfoLine.split(' ')
        InfoUserList.append(line1[0])   # Take just the user name to compare it later

for user in UserList:
    for i in range(len(InfoUserList)):
        if user == InfoUserList[i]:
            print(InfoList[i].rstrip("\n"))   # The stored line still carries its newline.
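The nested loop above compares every user with every detail line. A possible variation, sketched here under the assumption that the user name is always the first whitespace-separated field, puts the names into a set and makes a single pass over the details. The function name `shortlist` and the in-memory sample lists are illustrative and not part of the original answer.

```python
def shortlist(user_lines, info_lines):
    """Return the detail lines whose first field is one of the user names."""
    # Set of wanted names; blank lines are ignored.
    wanted = {line.strip() for line in user_lines if line.strip()}
    matches = []
    for line in info_lines:
        fields = line.split()  # split on any run of whitespace
        if fields and fields[0] in wanted:  # exact match on the name field
            matches.append(line.rstrip())
    return matches


# Sample data from the question, including the trailing spaces.
users = ["tony  \n", "peter  \n", "john  \n"]
info = [
    "alice 20160102 1101 abc  \n",
    "john 20120212 1110 zjc9  \n",
    "mary 20140405 0100 few3  \n",
    "peter 20140405 0001 io90  \n",
    "tango 19090114 0011 n4-8  \n",
    "tony 20150405 1001 ewdf  \n",
    "zoe 20000211 0111 jn09  \n",
]

for line in shortlist(users, info):
    print(line)
```

Because the function accepts any iterable of lines, open file objects can be passed directly, for example `with open("file1.txt") as f1, open("file2.txt") as f2: shortlist(f1, f2)`. The results come out in file2.txt order, and matching on the whole first field means `tango` is not mistaken for `tony`.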