Python: replacing part of a string that changes

Time: 2016-11-21 16:22:32

Tags: python

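I'm new to Python and just getting to grips with everything. I'm looking to replace part of a string that recurs in my data. I suspect a regular expression will be the answer, but being so new to Python, I'm struggling to get it working.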

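An example of my text is "PROD v2.0 - Test Window - App". What happens is that the developers introduce new windows, and PROD v2.0 changes to v3.0 and so on. What I want to do is remove the whole first part and be left with "Test Window - App".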
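To make the goal concrete, the transformation described above could be sketched with a regular expression. This is only an illustration: the helper name `strip_prod_prefix` and the pattern are assumptions, and the pattern assumes the prefix is always "PROD v", a dotted version number, and then a hyphen (or an en dash).

```python
import re

# Assumed prefix shape: optional spaces, "PROD v", a dotted version number,
# then a hyphen or an en dash (U+2013) with optional spaces around it.
PROD_PREFIX = re.compile(r'^\s*PROD v\d+(?:\.\d+)*\s*[-\u2013]\s*')

def strip_prod_prefix(text):
    """Return text without a leading 'PROD vX.Y - ' prefix (hypothetical helper)."""
    return PROD_PREFIX.sub('', text)

print(strip_prod_prefix(' PROD v2.0 - Test Window - App'))  # Test Window - App
print(strip_prod_prefix('PROD v3.0 - Test Window - App'))   # Test Window - App
```

Strings that do not start with the assumed prefix are returned unchanged.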

There is a lot else going on in the script I'm using, so ideally I'm looking for help with where to put this.

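Below is my script so far. I have removed certain parts of it, since this is a work project and there are parts I can't share. Any help would be greatly appreciated. I know my script is probably not written as well as it could be; the project I'm working on is about to launch, and I just want to get this functionality working at this stage.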

import pandas as pd
data_xls = pd.read_excel('REMOVED.xls', 'Sheet1', index_col=None)
data_xls.to_csv('//REMOVED.csv', encoding='utf-8')

import codecs
import pandas as pd
import os
#import dataset
from datetime import datetime as dt

targetDir = 'REMOVED'
outputFile = 'UPLOADSTEP1.txt'

substitutions = COLUMN SUBS REMOVED               

selectCols = [COLUMN ORDER REMOVED]
first = True

# Set working directory
os.chdir(targetDir)

# Iterate through all files in directory
for i in os.listdir(os.getcwd()):
    if i.endswith('.csv') and i.startswith('Temp'):
        print(i)
        # Files are UTF-8 encoded with BOM which Pandas cannot handle. Open with codecs first before passing to Pandas
        opened = codecs.open(i, 'rU', 'UTF-8')
        # Read file into dataframe
        df = pd.read_csv(opened, header=0)

        # Replace headers
        for row in substitutions:
            if row[0] in df.columns:
                df.rename(columns={row[0]:row[1]}, inplace=True)
                print(row[0], '->', row[1])

        # Save to csv
        if first:
            # print("First section")
            # First file save, retain headers and overwrite previous
            # destFile = open(outputFile, 'w')
            df.to_csv(outputFile, columns=selectCols, header=True, index=False, low_memory=False, sep='\t')
            first = False
        else:
            # print("Subsequent section")
            # Not first file save, remove headers and append to previous
            destFile = open(outputFile, 'a')
            df.to_csv(destFile, columns=selectCols, header=False, index=False, low_memory=False, sep='\t')
        continue

# Symbol Cleanse
f1 = open('UPLOADSTEP1.txt', 'r')
f2 = open('UPLOADSTEP2.txt', 'w')
for line in f1:
    f2.write(line.replace(' – ', ' '))
f1.close()
f2.close()

1 Answer:

Answer 0: (score: 0)

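This code is far from optimal, but it should do the trick.

I'm assuming that every string you are trying to replace starts with "PROD vXXXX - ", and that there are no occurrences of "PROD v" that you don't want replaced (or that don't match that pattern).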

text = ''
with open(inputfilename,'r') as f:
    text = f.read()

while 'PROD v' in text:
    start = text.find('PROD v')            # position of the next "PROD v"
    dash = text.find('-', start)           # nearest "-" after it
    if dash == -1:                         # no "-" follows: stop instead of looping forever
        break
    text = text[:start] + text[dash + 1:]  # drop everything from "PROD v" through that "-"

with open(outputfilename,'w') as f:
    f.write(text)
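
As an alternative to the loop above, the same removal could be done with `re.sub`, one line at a time, which would also fit a file-to-file step like the "Symbol Cleanse" loop in the question. This is a sketch under the same assumption about the prefix; the names `remove_prod_prefixes` and `clean_file` and the exact pattern are illustrative and not from the original post.

```python
import re

# Assumption: every unwanted prefix looks like "PROD v<version> - ",
# where the separator is a hyphen or an en dash (U+2013).
PROD_PATTERN = re.compile(r'PROD v[\d.]+\s*[-\u2013]\s*')

def remove_prod_prefixes(line):
    """Remove every 'PROD vX.Y - ' occurrence from one line (hypothetical helper)."""
    return PROD_PATTERN.sub('', line)

def clean_file(input_path, output_path):
    """Copy input_path to output_path, removing the prefixes line by line."""
    with open(input_path, 'r', encoding='utf-8') as src, \
         open(output_path, 'w', encoding='utf-8') as dst:
        for line in src:
            dst.write(remove_prod_prefixes(line))

# A tab-separated row with the prefix in the middle column
print(remove_prod_prefixes('1\tPROD v3.0 - Test Window - App\tok'))
```

A call such as `clean_file('UPLOADSTEP1.txt', 'UPLOADSTEP2.txt')` would apply it to the files named in the question. Because the pattern requires the separator to be present, text containing "PROD v" without a following hyphen is left untouched.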