Question

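I have a file named Strings.h that I use to localize my app. I want to search all of my class files to find out whether and where I use each string, and output the class and line number for each string.

My thought is to use Python, but maybe that's the wrong tool for the job. Also, I have a basic algorithm, but I worry that it will take too long to run. Can you write this script to do what I want, or even just suggest a better algorithm?

Strings.h looks like this: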

#import "NonLocalizedStrings.h"

#pragma mark Coordinate Behavior Strings
#define LATITUDE_WORD NSLocalizedString(@"Latitude", @"used in coordinate behaviors")
#define LONGITUDE_WORD NSLocalizedString(@"Longitude", @"used in coordinate behaviors")
#define DEGREES_WORD NSLocalizedString(@"Degrees", @"used in coordinate behaviors")
#define MINUTES_WORD NSLocalizedString(@"Minutes", @"Used in coordiante behaviors")
#define SECONDS_WORD NSLocalizedString(@"Seconds", @"Used in DMSBehavior.m")

...

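The script should take each line that starts with #define and then make a list of the word that appears right after #define (for example, LATITUDE_WORD).

The pseudocode might be: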

file = strings.h
for line in file:
  extract word after #define
  search_words.push(word) 

print search_words
[LATITUDE_WORD, LONGITUDE_WORD, DEGREES_WORD, MINUTES_WORD, SECONDS_WORD]

Once I have the list of words, my pseudocode is something like:

found_words = {}
for word in search_words:
   found_words[word] = []

for file in files:
  for line in file:
    for word in search_words:
      if line contains word:
        found_words[word].push((filename, linenumber))   

print found_words

So the found words would look something like:

 {
   LATITUDE_WORD: [
                    (foo.m, 42),
                    (bar.m, 132) 
                  ],
   LONGITUDE_WORD: [
                    (baz.m, 22),
                    (bim.m, 112) 
                  ],

 }

Answer 1

How about this [in bash]?

$ pattern="\\<($(grep '^#define ' Strings.h | cut -d' ' -f2 | tr '\n' '|' | sed 's/|$//'))\\>"
$ find project_dir -iname '*.m' -exec egrep -Hno "${pattern}" {} + > matches

Output:

project_dir/bar.m:132:LATITUDE_WORD
project_dir/baz.m:22:LONGITUDE_WORD
project_dir/bim.m:112:LONGITUDE_WORD
project_dir/foo.m:42:LATITUDE_WORD

Edit: I've changed the code above so that it redirects its output to the file matches, so we can use it to show the words that were never found:

for word in $(grep '^#define ' Strings.h | cut -d' ' -f2)
do
    if ! cut -d':' -f3 matches | grep -qxF "${word}"
    then
        echo "${word}"
    fi
done
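
For readers who would rather stay in Python, the same idea can be sketched as follows: join all the names into one alternation pattern with word boundaries, so each line is scanned once rather than once per name. This is a minimal sketch under assumptions, not a tested replacement for the commands above; the function names read_defines, find_usages, and unused are made up for illustration, and the .m suffix default is an assumption about the project layout.

```python
import re
from pathlib import Path


def read_defines(header_path):
    """Return the macro names that follow '#define' in the header, in file order."""
    names = []
    for line in Path(header_path).read_text().splitlines():
        m = re.match(r"#define\s+(\w+)", line)
        if m:
            names.append(m.group(1))
    return names


def find_usages(names, root, suffix=".m"):
    """Map each name to a list of (file path, 1-based line number) hits under root."""
    usages = {name: [] for name in names}
    if not names:
        return usages
    # One alternation with word boundaries, so each line is scanned once
    # instead of once per name (hypothetical helper, not from the answer).
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    for path in sorted(Path(root).rglob("*" + suffix)):
        with open(path, errors="replace") as source:
            for line_number, line in enumerate(source, start=1):
                # set() records a name once per line even if it repeats there.
                for name in set(pattern.findall(line)):
                    usages[name].append((str(path), line_number))
    return usages


def unused(usages):
    """Return the names that never appeared in any scanned file, sorted."""
    return sorted(name for name, hits in usages.items() if not hits)
```

Calling find_usages(read_defines("Strings.h"), "project_dir") should give a dictionary shaped like the found_words structure in the question, and passing that result to unused lists the names with no hits.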

Answer 2

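So, it looks like you have the right idea. Here are the pros and cons of what you have.

Pros:

If you use Python, your pseudocode translates almost line for line into your script.
You can learn a little more about Python (a great skill to have for things like this).

Cons:

Python will run a little slower than some of the other bash-based solutions that have been posted (which becomes a problem if you have a lot of files to search).
Your Python script will be slightly longer than the other solutions, but your output can also be more flexible.

Answer: Since I'm familiar with Python, and that's what you originally asked for, here's some code you can use: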

#!/usr/bin/env python3

# List the files you want to search here
search_files = []

# Allows for sorted output later.
words = []

# Contains all found instances.
inst_dict = {}

# Collect the name that follows each #define in the strings file.
with open('<FILE_PATH_HERE>', 'r') as word_file:
    for line in word_file:
        if line.startswith("#define"):
            w = line[7:].split()[0]
            words.append(w)
            inst_dict[w] = []

for file_name in search_files:
    with open(file_name, 'r') as file_obj:
        # Count lines from 1 so the numbers match what an editor shows.
        for line_num, line in enumerate(file_obj, start=1):
            for w in words:
                if w in line:
                    inst_dict[w].append((file_name, line_num))

# Do whatever you want with 'words' and 'inst_dict'
words.sort()
for w in words:
    string = w + ":\n"
    for inst in inst_dict[w]:
        string += "\tFile: " + inst[0] + "\n"
        string += "\tLine: " + str(inst[1]) + "\n"
    print(string)

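I haven't tested the search portion of the code, so use it "as is" at your own risk. Good luck, and feel free to ask questions or extend the code as needed. Your request is pretty simple and has plenty of solutions, so I'd rather you understand how this works.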
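
The script above leaves search_files empty for you to fill in by hand. One possible way to build that list automatically is to walk the project directory and keep only the source files. The sketch below assumes a hypothetical helper named collect_source_files and assumes that only .m files should be searched (as the other answers do); adjust the extensions to suit your project.

```python
import os


def collect_source_files(root, extensions=(".m",)):
    """Walk root and return the paths whose extension is in extensions, sorted."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Compare extensions case-insensitively, so Foo.M is also picked up.
            if os.path.splitext(filename)[1].lower() in extensions:
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

With this in place, the first assignment in the script could become search_files = collect_source_files("project_dir"), where project_dir stands for wherever your class files live. Leaving .h out of the default keeps Strings.h itself from being reported as a usage of every string.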

Answer 3

This solution uses awk and globstar (the latter requires Bash 4). I think it could be improved further, but consider it one option.

shopt -s globstar

awk 'NR==FNR { if ($0 ~ /^#define/) found[$2]=""; next; } 
     {
       for (word in found){
         if ($0 ~ word) 
           found[word]=found[word] "\t" FILENAME ":" FNR "\n";
       } 
     }
     END { for (word in found) print word ":\n" found[word]}
    ' Strings.h **/*.m

Using the Strings.h snippet you posted, this is the kind of output I get (I made up some test files):

LATITUDE_WORD:
    lala1.m, 2
    lala3.m, 1

DEGREES_WORD:
    lala2.m, 5

SECONDS_WORD:

MINUTES_WORD:
    lala3.m, 3

LONGITUDE_WORD:
    lala3.m, 2

P.S. I didn't test this with globstar, since the bash I'm using right now is v3 (pfff!).

Answer 4

You should try:

grep -oP '^#define\s+\K\S+' strings.h

If your grep lacks the -P option:

perl -lne 'print $& if /^#define\s+\K\S+/' strings.h

Answer 5

Here is a Python program. It could be shortened and simplified, but it works.

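import re

# Read the header text that the loop below scans.
with open("Strings.h") as header:
    filecontent = header.read()

for item in filecontent.split('\n'):
    if item.startswith("#define"):
        print(re.findall("#define .+? ", item)[0].split(' ')[1])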

Answer 6

#!/bin/bash
# Assuming $files contains a list of your files
word_list=( $(grep '^#define' "${files[@]}" | awk '{ print $2 }') )

This extracts the words from the files; then list the files and the line numbers that contain those words.
