Python multiple string replacement

Posted: 2015-08-12 19:35:26

Tags: python regex string

I want to do multiple string replacements in Python.

I have a dictionary:

my_dict = {'Can I have some roti and aloo gobhi ?': 
              {'roti': ['pulka', 'butter kp', 'wheat parota', 'chapati',
                        'gobi parota', 'onion parota', 'paneer parota',
                        'kerala parota', 'aloo parota', 'plain naan',
                        'butter naan', 'garlic naan', 'plain kulcha',
                        'butter kulcha', 'lacha parota', 'tandoori roti',
                        'tandoori butter roti', 'roti'],
               'aloo gobhi': ['paneer butter masala', 'palak paneer', 
                              'kadai paneer', 'hydrabadi paneer', 
                              'kadai gobi', 'aloo gobi', 'aloo mattar', 
                              'mix veg curry', 'baby corn masala', 
                              'dal fry', 'palak dal', 'dal tadka', 
                              'mushroom masala', 'gobi masala', 
                              'paneer tikka masala', 
                              'mushroom tikka masala', 'aloo gobhi']
              }
          }

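It basically has a sentence as the key, and the value is itself a dictionary. In that inner dictionary, each key is a word or phrase in the sentence to be replaced, and each value is a list of possible replacements. Now I want to use the key of the main dictionary to construct sentences, replacing 'roti' with any one item from its list and 'aloo gobhi' with any one item from its list.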

For example:

input_string = "Can I have some roti and aloo gobhi ?"
output_string = "Can I have some pulka and paneer butter masala ?"
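The example above can be done in a single pass with the standard `re` module. This avoids a pitfall of chaining `str.replace()` calls: each chained call rescans the whole string, including text that an earlier replacement inserted. The sketch below is one way to do it, assuming one inner dictionary of `my_dict` is passed in as the replacement table; `replace_all` and its `choose` parameter are hypothetical names introduced here.

```python
import random
import re


def replace_all(sentence, replacements, choose=random.choice):
    """Replace every key of `replacements` that occurs in `sentence`, in one pass.

    replacements maps a phrase (e.g. 'roti') to a list of alternatives;
    choose picks one alternative from that list (random.choice by default).
    Assumes `replacements` is non-empty.
    """
    # Longest keys first, so when two keys start at the same position
    # the longer one is matched.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # re.sub calls the function once per match and never rescans inserted text.
    return pattern.sub(lambda m: choose(replacements[m.group(0)]), sentence)


if __name__ == '__main__':
    # A shortened version of my_dict, for demonstration only.
    demo = {
        'Can I have some roti and aloo gobhi ?': {
            'roti': ['pulka', 'chapati', 'butter naan'],
            'aloo gobhi': ['paneer butter masala', 'dal fry'],
        }
    }
    for sentence, slots in demo.items():
        print(replace_all(sentence, slots))
```

Passing `choose=lambda options: options[0]` makes the output deterministic, which is handy for testing.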

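Update: I have an Excel file (say food_items.xlsx) containing a list of food items, divided into desserts, starters, main course, etc. I have another Excel file (say food_queries.xlsx) with user queries that order the food items listed in food_items.xlsx. I am trying to write a script that covers all the food items in food_items.xlsx using as few user queries as possible, so that the machine learning can be done with the minimum number of queries.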

import xlrd
import xlsxwriter
import string
import random
import re
import time
import itertools


list_of_items = []
dict_of_names = {}

def createList(filename):
    try:
        book = xlrd.open_workbook(filename)
        sheet = book.sheet_by_name(book.sheet_names()[2])
        for i in xrange(sheet.ncols):
            list_1 = []
            for j in xrange(sheet.nrows):
                cell_value = sheet.cell(j,i).value
                if str(cell_value) in (None,""):
                    j+=1
                    break
                else:
                    list_1.append(str(cell_value).lower())
            dict_of_names[str(list_1[0]).upper()] = list_1[1:]

    except Exception, e:
        print e

def getFile(readFile):
    try:
        list_of_sentences = []
        row = 0
        col = 0
        query_book = xlrd.open_workbook(readFile)
        first_sheet = query_book.sheet_by_index(0)
        for i in xrange(first_sheet.ncols):
            for j in xrange(first_sheet.nrows):
                cell_value = str(first_sheet.cell(j,i).value)
                if cell_value in (None,""," "):
                    j += 1
                    # dict_of_names[keys].remove(value)
                else:
                    list_of_sentences.append(cell_value)
        replaceStrings(list_of_sentences)
    except Exception as e:
        print e



def replaceStrings(list_of_sentences):
    # all_dict = {}
    # for sentence in list_of_sentences:
    #   dict_values = {}
    #   for keys,values in dict_of_names.items():
    #       for val in values:
    #           temp_dict = {}
    #           if val in sentence:
    #               temp_dict[val] = dict_of_names[keys]
    #               dict_values.update(temp_dict)
    #   all_dict[sentence] = dict_values
    # print all_dict

    # for keys,values in all_dict.items() :

    # for b,c in itertools.izip(dict_values,food_item_1[0],food_item_1[1]):
        # print sentence.replace(a,b).replace(a,c)

    for sentence in list_of_sentences:
        dict_values = {}
        for keys,values in dict_of_names.items():
            for val in values:
                temp_dict = {}
                if val in sentence:
                    temp_dict[val] = dict_of_names[keys]
                    dict_values.update(temp_dict)


        keys = dict_values.keys()
        n = len(keys)
        for i in range(n):
            thisKey = keys[i]
            nextKey = keys[(i + 1) % n]
            # print thisKey,nextKey
            for c,a,b in itertools.izip(list_of_sentences, dict_values[thisKey],dict_values[nextKey]):
                new_cell = c.replace(thisKey,a).replace(nextKey,b)
                # del dict_values[a]
                print new_cell


            # for k in existing_names:
                # if k in cell.value:
                #   lines = str(cell.value).replace(k,str(random.choice(new_names_one)))\
                #       .replace(k,str(random.choice(new_names_two)))
                #   worksheet.write(row,col,lines)
                #   row  = row + 1
                # else:
                #   break


if __name__ == "__main__":
    print "starting execution.."
    # workbook = xlsxwriter.Workbook('Query_set_1.xlsx')
    # worksheet = workbook.add_worksheet()
    createList("total food queries.xlsx")
    getFile("total food queries.xlsx")

    # workbook.close()

Update 2:

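The basic algorithm I want to implement is:

  1. I need to cover all the food items (each food item should appear only once).

  2. Once all the food items are covered, I stop (even if there are still a few user query samples left in the sheet).

  3. My main goal is to cover all the food items, not the user queries.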
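The three requirements above can be read as a greedy loop over the queries. This is a minimal sketch under stated assumptions, not the asker's code: `cover_all_items` is a hypothetical helper; the categories are assumed to have the shape `dict_of_names` has in the script (category name mapped to a list of lowercase items); each item is assumed to belong to exactly one category; and queries are assumed to spell items the same way as the item list. Each query is reused, with unused items substituted for the food words it mentions, until none of its categories has a new item to offer, and the loop stops as soon as every item has appeared, even if queries are left over. Being greedy, it is not guaranteed to produce the fewest possible sentences in every case.

```python
import re


def cover_all_items(queries, categories):
    """Generate sentences from `queries` until every item in `categories` is used.

    queries: list of query sentences that mention known food items.
    categories: {category_name: [item, ...]} (hypothetical shape, like dict_of_names).
    """
    remaining = {cat: list(items) for cat, items in categories.items()}
    item_to_cat = {item: cat for cat, items in categories.items() for item in items}
    # Longest items first so 'tandoori butter roti' is matched before 'roti'.
    alternation = '|'.join(
        re.escape(item) for item in sorted(item_to_cat, key=len, reverse=True))
    pattern = re.compile(r'\b(?:' + alternation + r')\b')

    def fill(match):
        cat = item_to_cat[match.group(0)]
        # Use the next unused item of this category. Once the category is
        # exhausted, keep the word already in the query (it may then repeat).
        return remaining[cat].pop(0) if remaining[cat] else match.group(0)

    sentences = []
    for query in queries:
        if not any(remaining.values()):
            break  # every item is covered; leftover queries are ignored
        slot_cats = {item_to_cat[word] for word in pattern.findall(query)}
        # Reuse this query while any of its slots can still introduce a new item;
        # every pass pops at least one item, so the loop terminates.
        while any(remaining[cat] for cat in slot_cats):
            sentences.append(pattern.sub(fill, query))
    return sentences


if __name__ == '__main__':
    demo_categories = {'BREAD': ['pulka', 'naan', 'roti'],
                       'CURRY': ['dal fry', 'aloo gobhi']}
    demo_queries = ['Can I have some roti and aloo gobhi ?', 'I want dal fry please']
    for line in cover_all_items(demo_queries, demo_categories):
        print(line)
```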

3 answers:

Answer 0 (score: 1)

I would keep the master sentence as its own string, then replace the words and save the result as a new string.


import random

sentence = 'Can I have some roti and aloo gobhi?'
new_sentence = sentence

replacements = {
'roti': ['pulka', 'butter kp', 'wheat parota', 'chapati', 'gobi parota', 'onion parota', 'paneer parota', 'kerala parota', 'aloo parota', 'plain naan', 'butter naan', 'garlic naan', 'plain kulcha', 'butter kulcha', 'lacha parota', 'tandoori roti', 'tandoori butter roti', 'roti'],
'aloo gobhi': ['paneer butter masala', 'palak paneer', 'kadai paneer', 'hydrabadi paneer', 'kadai gobi', 'aloo gobi', 'aloo mattar', 'mix veg curry', 'baby corn masala', 'dal fry', 'palak dal', 'dal tadka', 'mushroom masala', 'gobi masala', 'paneer tikka masala', 'mushroom tikka masala', 'aloo gobhi']
}

for key in replacements:
    new_sentence = new_sentence.replace(key, random.choice(replacements[key]))

Result:

>>> new_sentence
'Can I have some onion parota and aloo mattar?'

If you just want a random item for each dish, rather than replacing only those specific dishes, you should use string formatting:

import random

sentence = 'Can I have some {} and {}?'

replacements = [
['pulka', 'butter kp', 'wheat parota', 'chapati', 'gobi parota', 'onion parota', 'paneer parota', 'kerala parota', 'aloo parota', 'plain naan', 'butter naan', 'garlic naan', 'plain kulcha', 'butter kulcha', 'lacha parota', 'tandoori roti', 'tandoori butter roti', 'roti'],
['paneer butter masala', 'palak paneer', 'kadai paneer', 'hydrabadi paneer', 'kadai gobi', 'aloo gobi', 'aloo mattar', 'mix veg curry', 'baby corn masala', 'dal fry', 'palak dal', 'dal tadka', 'mushroom masala', 'gobi masala', 'paneer tikka masala', 'mushroom tikka masala', 'aloo gobhi']
]

Result:

>>> new_sentence = sentence.format(*(random.choice(l) for l in replacements))
>>> new_sentence
'Can I have some tandoori roti and mix veg curry?'
>>> new_sentence = sentence.format(*(random.choice(l) for l in replacements))
>>> new_sentence
'Can I have some pulka and paneer butter masala?'
>>> new_sentence = sentence.format(*(random.choice(l) for l in replacements))
>>> new_sentence
'Can I have some lacha parota and palak paneer?'

Based on your updated question and its comments, you aren't looking for random replacements at all; you're looking for the Cartesian product of those two lists. We'll use the product() function from the itertools module along with string formatting.

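import itertools

replacements = [
['pulka', 'butter kp', 'wheat parota', 'chapati', 'gobi parota', 'onion parota', 'paneer parota', 'kerala parota', 'aloo parota', 'plain naan', 'butter naan', 'garlic naan', 'plain kulcha', 'butter kulcha', 'lacha parota', 'tandoori roti', 'tandoori butter roti', 'roti'],
['paneer butter masala', 'palak paneer', 'kadai paneer', 'hydrabadi paneer', 'kadai gobi', 'aloo gobi', 'aloo mattar', 'mix veg curry', 'baby corn masala', 'dal fry', 'palak dal', 'dal tadka', 'mushroom masala', 'gobi masala', 'paneer tikka masala', 'mushroom tikka masala', 'aloo gobhi']
]

all_combos = itertools.product(*replacements)

all_sentences = ['Can I have some {} and {}?'.format(*combo) for combo in all_combos]

# Print only every 30th sentence.
for s in all_sentences[::30]:
    print(s)

Result (only every 30th sentence, not the whole list):

Can I have some pulka and paneer butter masala?
Can I have some butter kp and gobi masala?
Can I have some chapati and dal fry?
Can I have some onion parota and aloo gobi?
Can I have some kerala parota and palak paneer?
Can I have some aloo parota and paneer tikka masala?
Can I have some butter naan and palak dal?
Can I have some plain kulcha and aloo mattar?
Can I have some lacha parota and kadai paneer?
Can I have some tandoori roti and mushroom tikka masala?
Can I have some roti and dal tadka?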
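The same Cartesian product can be built straight from a nested dictionary shaped like `my_dict`, without writing `{}` placeholders by hand. This is a minimal sketch, assuming Python 3.7+ (so dictionary order, and therefore output order, follows insertion order); `expand_sentence` is a hypothetical helper, and it uses a single-pass `re.sub` so that each phrase is replaced exactly once per combination.

```python
import itertools
import re


def expand_sentence(sentence, slots):
    """Return one sentence per combination of replacements (Cartesian product).

    slots maps each phrase in `sentence` to its list of alternatives,
    like the inner dictionaries of my_dict.
    """
    keys = list(slots)
    # Longest phrases first, so overlapping phrases match the longer one.
    pattern = re.compile('|'.join(re.escape(k) for k in sorted(keys, key=len, reverse=True)))
    result = []
    # product() varies the last list fastest, like nested for-loops.
    for combo in itertools.product(*(slots[k] for k in keys)):
        chosen = dict(zip(keys, combo))
        # The lambda runs immediately inside sub(), so it sees this iteration's `chosen`.
        result.append(pattern.sub(lambda m: chosen[m.group(0)], sentence))
    return result


sentences = expand_sentence(
    'Can I have some roti and aloo gobhi ?',
    {'roti': ['pulka', 'naan'], 'aloo gobhi': ['dal fry', 'palak dal', 'aloo gobi']},
)
for s in sentences:
    print(s)
```

With the question's full lists for 'roti' (18 items) and 'aloo gobhi' (17 items), this produces 18 × 17 = 306 sentences, the same combinations the product() code above generates.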

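Answer 1 (score: 0)

You can try something like this:

import random

if input_string in my_dict:
    output_string = input_string
    for k in my_dict[input_string].keys():
        new_word = random.choice(my_dict[input_string][k])
        # Strings are immutable: replace() returns a new string, so reassign it.
        output_string = output_string.replace(k, new_word)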

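Answer 2 (score: 0)

I shortened the solution based on this answer:

import random
from functools import reduce  # built in on Python 2; must be imported on Python 3

replacements = [(key, random.choice(values)) for key, values in my_dict[input_string].items()]
output_string = reduce(lambda a, kv: a.replace(*kv), replacements, input_string)

Basically, you build a list of tuples, each containing a word and its replacement. Then you use Python's reduce function to apply each replacement in turn.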

  

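reduce(function, iterable[, initializer])
Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. [...]

Example output: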

Can I have some paneer parota and baby corn masala ?
Can I have some paneer parota and gobi masala ?
Can I have some butter naan and gobi masala ?
Can I have some tandoori butter roti and gobi masala ?
Can I have some onion parota and hydrabadi paneer ?
...
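Spelled out, the reduce() call in this answer feeds the string produced by each replacement into the next one. The following is a minimal sketch of that equivalence, with hypothetical helper names `apply_replacements` and `apply_replacements_loop`. It also shows that the order of the pairs matters, since a later pair can rewrite text an earlier pair inserted. With the question's data this does not happen, because no replacement for 'roti' contains 'aloo gobhi' and no replacement for 'aloo gobhi' contains 'roti', but it can happen with other word lists.

```python
from functools import reduce


def apply_replacements(text, pairs):
    """Apply (old, new) pairs to text in order, as the reduce() call in the answer does."""
    return reduce(lambda acc, pair: acc.replace(*pair), pairs, text)


def apply_replacements_loop(text, pairs):
    """The same computation written as an explicit loop."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


pairs = [('roti', 'pulka'), ('aloo gobhi', 'dal fry')]
print(apply_replacements('Can I have some roti and aloo gobhi ?', pairs))
# The second pair rewrites the 'roti' that the first pair inserted.
print(apply_replacements('some roti', [('roti', 'tandoori roti'), ('roti', 'naan')]))
```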