Question

I have a very large array (called "self.csvFileArray") with many rows and columns. It is made up of the rows I read from a CSV file, which I process with the following code...

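import csv  # needed for csv.reader

# newline='' is what the csv module expects; the 'rU' mode was removed in Python 3.11
with open(self.nounDef["Noun Source File Name"], newline='') as csvFile:
  for idx, row in enumerate(csv.reader(csvFile, delimiter=',')):
    if idx == 0:
      self.csvHeader = row  # keep the first row as the header (it is also appended below)
    self.csvFileArray.append(row)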

I have a very long dictionary of replacement mappings that I want to use for the substitution...

replacements = {"str1a": "str1b", "str2a": "str2b", "str3a": "str3b"}  # etc.

I want to do this in a class method, like this...

def m_globalSearchAndReplace(self, replacements):
  # apply replacements dictionary to self.csvFileArray...
  pass

My question: what is the most efficient way to replace strings in "self.csvFileArray" using the "replacements" dictionary?

Clarifications:

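I looked at this post, but couldn't get it to work for this case.
Also, I want to replace strings inside matching words, not just whole words. So, with a replacement mapping of "SomeCompanyName":"xyz", I might have a sentence like "My company SomeCompanyName has a patent for a product called abcSomeCompanyNamedef." Notice that the string has to be replaced twice in that sentence... once as a whole word and once as an embedded string.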
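For reference, Python's built-in str.replace already replaces every non-overlapping occurrence of the search string, whether it stands alone or is embedded inside a longer word, so the double replacement in the example needs no special handling. A quick check with the example sentence:

```python
# The example sentence from the question, with the mapping "SomeCompanyName" -> "xyz"
sentence = "My company SomeCompanyName has a patent for a product called abcSomeCompanyNamedef."
result = sentence.replace("SomeCompanyName", "xyz")
print(result)  # My company xyz has a patent for a product called abcxyzdef.
```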

Answer 1

The following addresses the question above and has been fully tested...

  def m_globalSearchAndReplace(self, dataMap):
    replacements = dataMap.m_getMappingDictionary()
    skip = {'NULL', '-', '.', ''}  # keys that are never applied
    # Filter the mapping once, instead of re-checking every key for every cell
    active = [(k, v) for k, v in replacements.items() if k not in skip]
    for row in self.csvFileArray: # Loop through each row/list
      for idx, w in enumerate(row): # Loop through each word (cell) in the row/list
        for key, value in active: # Apply every remaining mapping in turn
          w = w.replace(key, value)
        row[idx] = w

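In short: loop through each row in csvFileArray and take each word (cell).
Then, for each word in the row, loop through the keys of the dictionary (called "replacements") to access and apply each mapping.
Then (provided the key passes the condition) replace every occurrence of the key with its mapped value from the dictionary.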

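Note: Although this works, I don't believe all these nested loops are the most efficient way to solve the problem; I suspect there is a better way using regular expressions. So I'll leave the question open for a while and see whether anyone can improve on this answer.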
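Following up on the regular-expression idea in the note, here is a minimal sketch (not the answerer's code) that compiles all keys into one alternation pattern and scans each cell once, instead of calling str.replace once per key. It assumes csvFileArray is a list of lists of strings and reuses the skipped keys from the answer's condition; build_pattern and global_search_and_replace are hypothetical names. One behavioral difference: text inserted by a replacement is never re-scanned, so mappings cannot cascade the way they can with sequential str.replace calls, and at any position the longest matching key wins.

```python
import re

# Keys that the answer above never applies (taken from its if-condition).
SKIP_KEYS = {"NULL", "-", ".", ""}


def build_pattern(replacements):
    """Compile one alternation regex matching any applicable key, or return None."""
    keys = [k for k in replacements if k not in SKIP_KEYS]
    if not keys:
        return None
    # Longest keys first: Python's regex alternation takes the first alternative
    # that matches, so this makes the longest key win at any given position.
    keys.sort(key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in keys))


def global_search_and_replace(rows, replacements):
    """Replace every applicable key inside every cell of rows (a list of lists), in place."""
    pattern = build_pattern(replacements)
    if pattern is None:
        return rows
    for row in rows:
        for idx, cell in enumerate(row):
            # One scan per cell; the callback looks up the replacement for the matched key.
            row[idx] = pattern.sub(lambda m: replacements[m.group(0)], cell)
    return rows


rows = [["My company SomeCompanyName has abcSomeCompanyNamedef", "NULL"], ["str1a-str2a", "."]]
mapping = {"SomeCompanyName": "xyz", "str1a": "str1b", "str2a": "str2b", "NULL": "none", "-": "_"}
global_search_and_replace(rows, mapping)
print(rows)  # [['My company xyz has abcxyzdef', 'NULL'], ['str1b-str2b', '.']]
```

Inside the class this would be called as global_search_and_replace(self.csvFileArray, replacements). Whether it is actually faster than the nested loops depends on the number of keys and the size of the cells, so it is worth timing on the real data.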

Answer 2

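In one big loop? You could load the CSV file as a single string, so you only have to scan it once rather than once per item. That still isn't very efficient, though: Python strings are immutable, so every replacement builds a new string and you're facing the same problem.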

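Based on this answer, Optimizing find and replace over large files in Python (regarding efficiency), working line by line might be better, so that you don't hold one huge string in memory if that does turn out to be a problem.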

Edit: something like this...

# open original and new file.
with open(old_file, 'r') as old_f, open(new_file, 'w') as new_f:
    # loop through each line of the original file (old file)
    for old_line in old_f:
        new_line = old_line
        # loop through your dictionary of replacements and make them.
        for r in replacements:
            new_line = new_line.replace(r, replacements[r])
        # write each line to the new file.
        new_f.write(new_line)

Either way, I would forget that the file is a CSV file and simply treat it as one big collection of lines or characters.
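A minimal sketch combining this line-by-line approach with a single precompiled regular expression (an assumption, not part of the answer), so each line is scanned once for all keys instead of once per key. stream_replace is a hypothetical name; it takes already-open text file objects such as old_f and new_f in the snippet above. Because the file is treated as plain text, a key containing a newline will never match, and a key could also match across a CSV delimiter or quote character.

```python
import io
import re


def stream_replace(src, dst, replacements):
    """Copy src to dst line by line, applying every mapping in one regex pass per line."""
    # Drop empty keys (they would match everywhere); longest first so the longest key wins.
    keys = sorted((k for k in replacements if k), key=len, reverse=True)
    if not keys:
        dst.writelines(src)  # nothing to replace: copy unchanged
        return
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    for line in src:
        dst.write(pattern.sub(lambda m: replacements[m.group(0)], line))


# Demo with in-memory files; with real files, pass open(old_file) and open(new_file, "w").
demo_src = io.StringIO("a,SomeCompanyName\nabcSomeCompanyNamedef,b\n")
demo_dst = io.StringIO()
stream_replace(demo_src, demo_dst, {"SomeCompanyName": "xyz"})
print(demo_dst.getvalue(), end="")
```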

How to efficiently replace strings in a large CSV-based array using a dictionary?
