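I'm new to Python and still getting to grips with everything. I'm looking to replace part of a string that recurs throughout my data. I suspect a regular expression will be the answer, but being so new to Python I'm struggling to get it working.
An example of my text is "PROD v2.0 - Test Window - App". What happens is that the developers introduce a new window and PROD v2.0 changes to v3.0, and so on. What I want to do is remove that whole first part and be left with "Test Window - App".
There is a lot else going on in the script I'm using, so ideally I'm looking for help on where to place this.
Below is my script so far. I have removed certain parts of it, as this is a work project and there are some parts I can't share. Any help would be greatly appreciated. I know my script is probably not written as well as it could be; the project I'm working on is about to go live, and I just want to get this feature working at this stage.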
import pandas as pd
data_xls = pd.read_excel('REMOVED.xls', 'Sheet1', index_col=None)
data_xls.to_csv('//REMOVED.csv', encoding='utf-8')
import codecs
import pandas as pd
import os
#import dataset
from datetime import datetime as dt
targetDir = 'REMOVED'
outputFile = 'UPLOADSTEP1.txt'
substitutions = COLUMN SUBS REMOVED
selectCols = [COLUMN ORDER REMOVED]
first = True
# Set working directory
os.chdir(targetDir)
# Iterate through all files in directory
for i in os.listdir(os.getcwd()):
    if i.endswith('.csv') and i.startswith('Temp'):
        print(i)
        # Files are UTF-8 encoded with a BOM; open with codecs ('utf-8-sig' strips the BOM) before passing to Pandas
        opened = codecs.open(i, 'r', 'utf-8-sig')
        # Read file into dataframe (low_memory is a read_csv option, not a to_csv one)
        df = pd.read_csv(opened, header=0, low_memory=False)
        opened.close()
        # Replace headers
        for row in substitutions:
            if row[0] in df.columns:
                df.rename(columns={row[0]: row[1]}, inplace=True)
                print(row[0], '->', row[1])
        # Save to csv
        if first:
            # First file save, retain headers and overwrite previous
            df.to_csv(outputFile, columns=selectCols, header=True, index=False, sep='\t')
            first = False
        else:
            # Not first file save, remove headers and append to previous
            with open(outputFile, 'a') as destFile:
                df.to_csv(destFile, columns=selectCols, header=False, index=False, sep='\t')
# Symbol Cleanse
f1 = open('UPLOADSTEP1.txt', 'r')
f2 = open('UPLOADSTEP2.txt', 'w')
for line in f1:
    f2.write(line.replace(' – ', ' '))
f1.close()
f2.close()
Answer 0 (score: 0)
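This code is far from optimal, but it should do the trick.
I'm assuming that every string you want to replace starts with "PROD vXXXX - ", and that there are no occurrences of "PROD v" that you don't want to replace (or that don't match that pattern).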
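# inputfilename and outputfilename are placeholders for your own file paths
with open(inputfilename, 'r') as f:
    text = f.read()
while 'PROD v' in text:
    start = text.find('PROD v')
    tail = text[start:]  # the text from "PROD v" onward
    dash = tail.find('-')
    if dash == -1:
        break  # no "-" after "PROD v": stop instead of looping forever
    tail = tail[dash + 1:].lstrip(' ')  # drop everything up to the nearest "-" and the space after it
    text = text[:start] + tail
with open(outputfilename, 'w') as f:
    f.write(text)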
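Since the question mentions regular expressions, here is a minimal sketch of the same idea using `re.sub`. It assumes the prefix is always the literal text "PROD v", then a dotted version number such as 2.0 or 3.0, then a hyphen with optional spaces around it. The names `PROD_PREFIX` and `strip_prod_prefix` are made up for illustration and are not part of the original script.

```python
import re

# Assumed prefix shape: "PROD v", a dotted version number, then a hyphen
# with optional spaces around it. Only literal spaces are consumed, so tab
# separators and newlines in the file are left alone.
PROD_PREFIX = re.compile(r"PROD v\d+(?:\.\d+)*[ ]*-[ ]*")


def strip_prod_prefix(text):
    """Return text with every 'PROD vX.Y - ' prefix removed."""
    return PROD_PREFIX.sub("", text)


if __name__ == "__main__":
    print(strip_prod_prefix("PROD v2.0 - Test Window - App"))
```

As for where to put it, one option is the existing "Symbol Cleanse" loop, writing `strip_prod_prefix(line.replace(' – ', ' '))` instead of the bare `replace` result. If you would rather do it while the data is still in a dataframe, pandas' `Series.str.replace` with `regex=True` should accept the same pattern, applied to whichever column holds these strings.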