Question

How can I merge two files while keeping the rows that have no match?

My first file looks like this:

apples          1.4       
grapes          1.3
pears           2.1
oranges         1.1
grapefruit      1.0

My second file looks like this:

apples         Alex
grapes         Margery
grapefruit     Francis

My output should be:

apples          1.4     Alex  
grapes          1.3     Margery
pears           2.1
oranges         1.1
grapefruit      1.0     Francis

Any help with this would be much appreciated, thanks.

Answer 1

Given that there are no entries in file 2 that are missing from file 1, here is an (untested) solution to the problem:

import re

names = {}

# Map each fruit in the second file to its name.
with open("second.txt") as second:
    for line in second:
        m = re.match(r"(\S+)\s+(\S+)", line.strip())
        if m:
            names[m.group(1)] = m.group(2)

# Copy the first file, appending the name when the fruit has one.
with open("first.txt") as first, open("output.txt", "w") as out:
    for line in first:
        # Strip the newline first so the name lands on the same line.
        writeline = line.rstrip()
        m = re.match(r"(\S+)", writeline)
        if m:
            name = names.get(m.group(1))
            if name:
                writeline += "     " + name
        out.write(writeline + "\n")

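What I'm doing: first I parse the second file and read every fruit and its corresponding name into a dictionary. Then I go through the first file, check the fruit on each line, and look up the corresponding entry in the dictionary; if one is found, the name is appended to the output line.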
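
The same dictionary lookup can be sketched without regular expressions, using `str.split()` to pick out the first whitespace-separated field. This is only a minimal sketch: the function name `merge_lines` and the in-memory lists standing in for the two files are assumptions made for illustration, not part of the answer above.

```python
def merge_lines(first_lines, second_lines):
    """Append the name from second_lines to each matching line of first_lines."""
    names = {}
    for line in second_lines:
        parts = line.split()
        if len(parts) >= 2:          # skip blank or malformed lines
            names[parts[0]] = parts[1]

    merged = []
    for line in first_lines:
        parts = line.split()
        if not parts:                # drop blank lines
            continue
        name = names.get(parts[0])
        # Unmatched rows are kept unchanged (minus trailing whitespace).
        merged.append(line.rstrip() + ("     " + name if name else ""))
    return merged


# Sample data copied from the question.
first = ["apples          1.4", "grapes          1.3", "pears           2.1",
         "oranges         1.1", "grapefruit      1.0"]
second = ["apples         Alex", "grapes         Margery", "grapefruit     Francis"]

for row in merge_lines(first, second):
    print(row)
```

To work on real files, the two lists could be replaced by open file objects, since iterating a file also yields one line at a time.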

Answer 2

You can do this with dataframes in pandas. Convert your inputs to dataframes, say a and b.

 import pandas as pd

Dataframe a

           x    y
 0      apples  1.4
 1      grapes  1.3
 2       pears  2.1
 3     oranges  1.1
 4  grapefruit  1.0

Dataframe b

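           k        l
 0      apples     Alex
 1      grapes  Margery
 2  grapefruit  Francis

Now rename the column that holds the fruit names so that it is the same in both dataframes, if the names differ.

 b.columns=['x','l']

Then merge on that column name.

new=pd.merge(a, b, on='x', how='outer')

Your new dataframe looks like this:

           x    y        l
 0      apples  1.4     Alex
 1      grapes  1.3  Margery
 2       pears  2.1      NaN
 3     oranges  1.1      NaN
 4  grapefruit  1.0  Francis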
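
The answer does not show how the inputs become dataframes, so here is one possible end-to-end sketch. It makes several assumptions: the files are whitespace-separated with no header row, `io.StringIO` objects stand in for the real files, and `how='left'` is used in place of the answer's `how='outer'`, because a left join is documented to keep every row of the left frame in its original key order.

```python
import io

import pandas as pd

# Stand-ins for the two input files (contents copied from the question).
first_txt = """apples 1.4
grapes 1.3
pears 2.1
oranges 1.1
grapefruit 1.0
"""
second_txt = """apples Alex
grapes Margery
grapefruit Francis
"""

# Whitespace-separated, no header row; the column names x, y, l follow the answer.
a = pd.read_csv(io.StringIO(first_txt), sep=r"\s+", header=None, names=["x", "y"])
b = pd.read_csv(io.StringIO(second_txt), sep=r"\s+", header=None, names=["x", "l"])

# A left join keeps every row of `a`, in order; fruits without a name get NaN.
new = pd.merge(a, b, on="x", how="left")
print(new)
```

With real files, the `io.StringIO(...)` arguments would be replaced by the file paths.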

Answer 3

With awk, you can do the following:

$ awk 'FNR==NR{seen[$1]=$2; next}         # read first file and construct array
       $1 in seen{seen[$1]=seen[$1] OFS $2} # add entry from second file
       END{ for (e in seen) print e, seen[e]}' file1 file2
apples 1.4 Alex
grapefruit 1.0 Francis
oranges 1.1
pears 2.1
grapes 1.3 Margery

The order will differ from the original file, but preserving it was not stated as a requirement.

If you want the same order as the original file, and output closer to your example, you can do this:

$ awk 'BEGIN{OFS="\t"}
       FNR==NR{ord[FNR]=$1
               seen[$1]=$2
               next}
       $1 in seen {seen[$1]=seen[$1] OFS $2}
       END{ for (i=1;i in ord;i++)
               printf "%-10s\t%s\n", ord[i], seen[ord[i]]}' f1 f2
apples      1.4 Alex
grapes      1.3 Margery
pears       2.1
oranges     1.1
grapefruit  1.0 Francis
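
For readers who do not use awk, the order-preserving script can be mirrored in Python roughly as follows. This is a hedged sketch rather than an exact port: the function name `join_in_order` and its `sep` parameter are hypothetical, and it relies on Python dicts keeping insertion order (Python 3.7+) where the awk version needs the separate `ord` array.

```python
def join_in_order(first_lines, second_lines, sep="\t"):
    """Index file 1 by its first field, then extend matching keys from file 2."""
    seen = {}                        # insertion order is preserved
    for line in first_lines:
        fields = line.split()
        if fields:
            seen[fields[0]] = fields[1:2]      # value column, kept as a list
    for line in second_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in seen:
            seen[fields[0]].append(fields[1])  # add entry from second file
    # Left-justify the key to 10 characters, like printf "%-10s".
    return [f"{key:<10}{sep}{sep.join(vals)}" for key, vals in seen.items()]


rows = join_in_order(
    ["apples 1.4", "grapes 1.3", "pears 2.1", "oranges 1.1", "grapefruit 1.0"],
    ["apples Alex", "grapes Margery", "grapefruit Francis"],
)
print("\n".join(rows))
```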
