Python equivalent of sed

Time: 2012-10-03 18:16:22

Tags: python

Is there a way, without a double loop, to accomplish what the following sed command does?

Input:

Time
Banana
spinach
turkey

sed -i "/Banana/ s/$/Toothpaste/" file

Output:

Time
BananaToothpaste
spinach
turkey

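What I have so far is a double loop over two lists, and it takes a long time to finish.

List A has a bunch of numbers. List B has the same numbers, but in a different order.

For each entry in A, I want to find the line in B with the same number and append value C to the end of it.

Hope this makes sense, even if my example doesn't.

I was doing the following in Bash and it works, but it is very slow...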

for line in $(cat DATSRCLN.txt.utf8); do
        srch=$(echo $line | awk -F'^' '{print $1}');
        rep=$(echo $line | awk -F'^' '{print $2}');
        sed -i "/$(echo $srch)/ s/$/^$(echo $rep)/" tmp.1;
done

Thanks!

8 answers:

Answer 0: (score: 13)

Use re.sub()

newstring = re.sub('(Banana)', r'\1Toothpaste', oldstring)

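This captures a group (between the first pair of parentheses) and replaces it with ITSELF (the \number part) followed by the desired suffix. A raw string (r'') is needed so that the backslash escape is interpreted correctly. Note that the suffix is inserted right after the match, not at the end of the line; the two coincide in the example only because Banana is the whole line.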
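
As a quick check against the question's sample input, the sketch below runs the call above and also shows a variant that is closer to the sed command, appending at the end of every line that contains the search text. The helper name append_to_matching_lines and the use of re.MULTILINE are illustrative assumptions, not part of the answer.

```python
import re

oldstring = "Time\nBanana\nspinach\nturkey\n"

# The call from the answer: re-insert the captured group, then the suffix.
newstring = re.sub('(Banana)', r'\1Toothpaste', oldstring)


def append_to_matching_lines(text, needle, suffix):
    """Append suffix to the end of each line containing needle (hypothetical helper)."""
    # With re.MULTILINE, ^ and $ match at line boundaries, so the suffix
    # lands at the end of the line even if needle is not the last word on it.
    pattern = r'^(.*' + re.escape(needle) + r'.*)$'
    return re.sub(pattern, lambda m: m.group(1) + suffix, text, flags=re.MULTILINE)


print(newstring)
print(append_to_matching_lines("Banana split\nturkey\n", "Banana", "Toothpaste"))
```

The lambda replacement avoids having to escape backslashes that might appear in the suffix.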

Answer 1: (score: 9)

Late to the game, but here is my implementation of sed in Python:

import os
import re
import shutil
from tempfile import mkstemp


def sed(pattern, replace, source, dest=None, count=0):
    """Reads a source file and writes the destination file.

    In each line, replaces pattern with replace.

    Args:
        pattern (str): pattern to match (can be re.pattern)
        replace (str): replacement str
        source  (str): input filename
        count (int): maximum number of lines to change, 0 means no limit
        dest (str):   destination filename, if not given, source will be over written.
    """

    num_replaced = 0

    if dest:
        fout = open(dest, 'w')
    else:
        # Write to a temporary file first, then move it over the source.
        fd, name = mkstemp()
        fout = os.fdopen(fd, 'w')

    with fout, open(source, 'r') as fin:
        for line in fin:
            if count and num_replaced >= count:
                # Limit reached: copy the remaining lines unchanged.
                fout.write(line)
                continue

            out = re.sub(pattern, replace, line)
            fout.write(out)

            if out != line:
                num_replaced += 1

    if not dest:
        shutil.move(name, source)

Examples:

sed('foo', 'bar', "foo.txt") 

replaces every 'foo' in foo.txt with 'bar'.

sed('foo', 'bar', "foo.txt", "foo.updated.txt")

replaces every 'foo' in 'foo.txt' with 'bar' and saves the result in 'foo.updated.txt'.

sed('foo', 'bar', "foo.txt", count=1)

replaces 'foo' with 'bar' only on the first line that contains it, and saves the result in the original file 'foo.txt'.
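
For comparison, the standard library's fileinput module can take care of the temporary-file bookkeeping for in-place edits. The sketch below is a different technique from the function above, not a rewrite of it; the helper name sed_inplace and the demo file contents are hypothetical.

```python
import fileinput
import os
import re
import tempfile


def sed_inplace(pattern, replace, filename):
    """Replace pattern with replace on every line of filename, in place."""
    # With inplace=True, fileinput redirects print() output into the file
    # being edited while the original content is read from a backup copy.
    with fileinput.input(files=[filename], inplace=True) as lines:
        for line in lines:
            print(re.sub(pattern, replace, line), end='')


# Demo on a throwaway file.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as f:
    f.write("foo one\nno match\nfoo foo\n")

sed_inplace('foo', 'bar', path)

with open(path) as f:
    result = f.read()
os.remove(path)
print(result)
```

This version has no count or dest arguments; it only covers the overwrite-in-place case.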

Answer 2: (score: 5)

If you are using Python 3, the following module will help you: https://github.com/mahmoudadel2/pysed

wget https://raw.githubusercontent.com/mahmoudadel2/pysed/master/pysed.py

Put the module file in your Python 3 module path, then:

import pysed
pysed.replace(<Old string>, <Replacement String>, <Text File>)
pysed.rmlinematch(<Unwanted string>, <Text File>)
pysed.rmlinenumber(<Unwanted Line Number>, <Text File>)

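Answer 3: (score: 3)

You can actually call sed from Python. There are many ways to do this, but I like to use the sh module. (yum -y install python-sh)

The output of my example program is shown below.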

[me@localhost sh]$ cat input 
Time
Banana
spinich
turkey
[me@localhost sh]$ python test_sh.py 
[me@localhost sh]$ cat input 
Time
Toothpaste
spinich
turkey
[me@localhost sh]$ 

Here is test_sh.py:

import sh

sh.sed('-i', 's/Banana/Toothpaste/', 'input')

This probably only works on Linux.
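
One of the other ways is the standard library's subprocess module. The sketch below is an alternative to the sh call, under a few assumptions: it pipes text through sed rather than editing a file with -i (whose syntax differs between GNU and BSD sed), it requires a sed executable on PATH, and the helper name run_sed is hypothetical.

```python
import shutil
import subprocess


def run_sed(script, text):
    """Pipe text through the external sed program and return its output."""
    completed = subprocess.run(
        ['sed', script],
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sed exits non-zero
    )
    return completed.stdout


if shutil.which('sed'):  # skip quietly where sed is not installed
    print(run_sed('/Banana/ s/$/Toothpaste/', "Time\nBanana\nspinach\nturkey\n"))
```

Passing the command as a list avoids shell quoting problems with the $ in the sed script.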

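Answer 4: (score: 0)

It is possible to do this with a temporary file, with low system requirements and only one iteration, without copying the whole file into memory: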

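#!/usr/bin/python
import tempfile
import shutil
import os

# mkstemp() creates a temporary file; mkdtemp() would create a directory,
# which cannot be opened for writing.
fd, newfile = tempfile.mkstemp()
oldfile = 'stack.txt'

f = open(oldfile)
n = os.fdopen(fd, 'w')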

for i in f:
        if i.find('Banana') == -1:
                n.write(i)
                continue

        # Last row
        if i.find('\n') == -1:
                i += 'ToothPaste'
        else:
                i = i.rstrip('\n')
                i += 'ToothPaste\n'

        n.write(i) 

f.close()
n.close()

os.remove(oldfile)
shutil.move(newfile,oldfile)
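
The same single-pass idea extends to the question's real task, where many search/suffix pairs have to be applied. A minimal sketch, assuming each line of the pairs file has the form key^value (as the awk -F'^' calls suggest) and that each target line starts with its key as the first ^-separated field; the function name and the sample data are hypothetical. A dictionary lookup per line takes the place of the inner loop. Unlike the sed command, this matches the key exactly rather than as a regex anywhere in the line.

```python
def append_values(pair_lines, target_lines, sep='^'):
    """Append sep + value to each target line whose first field is a known key."""
    # Build the lookup table once: key -> value to append.
    suffix_for = {}
    for line in pair_lines:
        key, _, value = line.rstrip('\n').partition(sep)
        suffix_for[key] = value

    result = []
    for line in target_lines:
        body = line.rstrip('\n')
        key = body.partition(sep)[0]
        if key in suffix_for:
            body += sep + suffix_for[key]
        # Every output line ends with a newline, even if the input line did not.
        result.append(body + '\n')
    return result


pairs = ["1001^apple\n", "1002^pear\n"]
targets = ["1002^x\n", "1003^y\n", "1001\n"]
print(''.join(append_values(pairs, targets)))
```

Both arguments can be open file objects, so neither file needs to be read more than once.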

Answer 5: (score: 0)

massedit

You can use it as a command-line tool:

# Will change all test*.py in subdirectories of tests.
massedit.py -e "re.sub('failIf', 'assertFalse', line)" -s tests test*.py

You can also use it as a library:

import massedit
filenames = ['massedit.py']
massedit.edit_files(filenames, ["re.sub('Jerome', 'J.', line)"])

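Answer 6: (score: 0)

I found the answer supplied by Oz123 to be great, but it did not seem to work 100%. I am new to Python, but I modified and wrapped it so that it can be run from a bash script. This works on OS X with Python 2.7.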

# Replace 1 occurrence in file /tmp/1
$ search_replace "Banana" "BananaToothpaste" /tmp/1

# Replace 5 occurrences and save in /tmp/2
$ search_replace "Banana" "BananaToothpaste" /tmp/1 /tmp/2 5

search_replace

#!/usr/bin/env python
import sys
import re
import shutil
from tempfile import mkstemp

total = len(sys.argv)-1
cmdargs = str(sys.argv)
if (total < 3):
    print ("Usage: SEARCH_FOR REPLACE_WITH IN_FILE {OUT_FILE} {COUNT}")
    print ("by default, the input file is replaced")
    print ("and the number of times to replace is 1")
    sys.exit(1)

# Parsing args one by one 
search_for = str(sys.argv[1])
replace_with = str(sys.argv[2])
file_name = str(sys.argv[3])
if (total < 4):
    file_name_dest=file_name
else:
    file_name_dest = str(sys.argv[4])
if (total < 5):
    count = 1
else:
    count = int(sys.argv[5])

def sed(pattern, replace, source, dest=None, count=0):
    """Reads a source file and writes the destination file.

    In each line, replaces pattern with replace.

    Args:
        pattern (str): pattern to match (can be re.pattern)
        replace (str): replacement str
        source  (str): input filename
        count (int): number of occurrences to replace
        dest (str):   destination filename, if not given, source will be over written.        
    """

    fin = open(source, 'r')
    num_replaced = 0

    # Always write to a temporary file, then move it into place.
    fd, name = mkstemp()
    fout = open(name, 'w')

    for line in fin:
        # count == 0 means no limit on the number of changed lines
        if not count or num_replaced < count:
            out = re.sub(pattern, replace, line)
            fout.write(out)
            if out != line:
                num_replaced += 1
        else:
            fout.write(line)

    fin.close()
    fout.close()

    # Use the function's own arguments rather than the module-level names;
    # with no dest, the source file is overwritten.
    shutil.move(name, dest or source)

sed(search_for, replace_with, file_name, file_name_dest, count)

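Answer 7: (score: 0)

With thanks to Oz123 above, here is a sed that is not line-by-line, so the pattern you replace can span newlines. Because the whole file is read into memory, larger files could be a problem.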

import re
import shutil
from tempfile import mkstemp

def sed(pattern, replace, source, dest=None):
    """Reads a source file and writes the destination file.

    Replaces pattern with replace globally through the file.
    This is not line-by-line so the pattern can span newlines.

    Args:
        pattern (str): pattern to match (can be re.pattern)
        replace (str): replacement str
        source  (str): input filename
        dest (str):   destination filename, if not given, source will be over written.
    """

    if dest:
        fout = open(dest, 'w')
    else:
        fd, name = mkstemp()
        fout = open(name, 'w')

    with open(source, 'r') as file:
        data = file.read()

        p = re.compile(pattern)
        new_data = p.sub(replace, data)
        fout.write(new_data)

    fout.close()

    if not dest:
        shutil.move(name, source)
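
To illustrate what reading the whole file buys, the sketch below runs the same kind of substitution on an in-memory string with a pattern that crosses a line break, and compares it with a line-by-line pass that never sees the match. The sample text and pattern are made up for illustration.

```python
import re

data = "Time\nBanana\nspinach\nturkey\n"
pattern = r'Banana\nspinach'

# The pattern includes a newline, so it can only match when the text is
# processed as a whole rather than one line at a time.
joined = re.sub(pattern, 'Banana and spinach', data)

# Line-by-line processing: no single line contains the full pattern.
per_line = ''.join(re.sub(pattern, 'Banana and spinach', line)
                   for line in data.splitlines(keepends=True))

print(joined)
print(per_line == data)
```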