Question

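A while ago we took over responsibility for a legacy code base.

One quirk of this badly structured and badly written code is that it contains a number of really huge structs, each with hundreds of members. One of the many steps we are taking is to clean out as much unused code as possible, hence the need to find unused structs and struct members.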

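For the structs, I came up with a combination of Python, GNU Global and ctags to list the struct members that are unused.

Basically, I use ctags to generate a tags file. The Python script below parses that file to locate all struct members, then uses GNU Global to look each member up in a previously generated Global database to see whether the member is used in the code.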

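This approach has a number of quite serious flaws, but it more or less solved the problem we faced and gave us a good start on the cleanup.

There must be a better way to do this!

The question is: how do you find unused structs and struct members in a code base?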

#!/usr/bin/env python3

import operator
import subprocess


def printheader(word):
    """Print a heading underlined with dashes."""
    print("\n%s\n%s" % (word, "-" * len(word)))


class StructFreqAnalysis:
    """A struct found in the tags file, with a use count per member."""

    def __init__(self):
        self.path2hfile = ''
        self.name = ''
        self.id = ''
        self.members = []  # list of [member_name, use_count]

    def show(self):
        print('path2hfile:', self.path2hfile)
        print('name:', self.name)
        print('members:', self.members)
        print()

    def sort(self):
        return sorted(self.members, key=operator.itemgetter(1))

    def prettyprint(self):
        """Display the members sorted by use count."""
        print('struct:', self.name)
        print('path:', self.path2hfile)
        for member in self.sort():
            print('    ', member[0], ':', member[1])
        print()


x = {}  # struct_name -> StructFreqAnalysis
y = {}  # internal tags id -> StructFreqAnalysis

with open('tags', 'r', errors='replace') as tags:
    # Pass 1: collect every tag whose line carries a typeref:struct: field.
    for i in tags:
        i = i.strip()
        if 'typeref:struct:' in i:
            line = i.split()
            struct = StructFreqAnalysis()
            struct.name = line[0]
            struct.path2hfile = line[1]
            for j in line:
                if 'typeref' in j:
                    struct.id = j.split(':')[-1]
                    y[struct.id] = struct
            x[line[0]] = struct

    # Pass 2: attach each member to its struct via the id in the last field.
    tags.seek(0)
    for i in tags:
        i = i.strip()
        if 'struct:' in i:
            items = i.split()
            name = items[0]
            struct_id = items[-1].split(':')[-1]
            if struct_id and struct_id in y:
                y[struct_id].members.append([name, 0])

# do frequency count
for v in x.values():
    for member in v.members:
        # -a absolute path. use global to give src-file for member
        result = subprocess.run(['global', '-a', '-s', member[0]],
                                stdout=subprocess.PIPE,
                                universal_newlines=True)
        for gout in result.stdout.splitlines():
            if '.c' in gout:
                with open(gout.strip(), 'r', errors='replace') as src:
                    for line in src:
                        # plain substring test for "->member" or ".member"
                        if '->' + member[0] in line or '.' + member[0] in line:
                            member[1] += 1

printheader('All structures')
for v in x.values():
    v.prettyprint()

# show which structs could be removed
printheader('These structs could perhaps be removed')
for v in x.values():
    if len(v.members) == 0:
        v.show()

printheader('Total number of probably unused members')
cnt = 0
for v in x.values():
    for member in v.members:
        if member[1] == 0:
            cnt += 1
print(cnt)
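
One of the flaws is easy to illustrate: the script's test is a plain substring match, so a member named len is also counted on a line that only touches ->length. A minimal sketch of a stricter test follows, assuming whole-identifier matching is what is wanted; the function name count_member_uses and the sample lines are made up for illustration. It still counts hits inside comments and string literals.

```python
import re


def count_member_uses(member, lines):
    """Count lines that access `member` via '.' or '->' as a whole identifier."""
    # \s* tolerates "p -> len"; (?!\w) stops "len" from matching "length".
    pattern = re.compile(r'(?:->|\.)\s*' + re.escape(member) + r'(?!\w)')
    return sum(1 for line in lines if pattern.search(line))


# Hypothetical source lines, for illustration only.
sample = [
    "p->len = 0;",
    "p->length = 4;",   # a different member; must not count for "len"
    "total += s.len;",
    "x = 1.0;",
]
print(count_member_uses("len", sample))
```

In the script above, this would replace the `'->' + name in line or '.' + name in line` check inside the frequency loop.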

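Edit

As @Jens-Gustedt suggested, using the compiler is a good way to do it. What I am after is an approach that can do some kind of "high-level" filtering before resorting to the compiler approach.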

Answer 1

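If these are only a few structs, and if the code does not use nasty hacks such as accessing a struct through another type, then you could just comment out all the fields of your first struct and let the compiler tell you.

Uncomment one used field after another until the compiler is satisfied. Then, once it compiles, test thoroughly to confirm that the precondition (no hacks) really holds.

Iterate over all structs.

Definitely not pretty, but in the end you will have at least one person who knows the code a bit.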
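
The uncomment-and-recompile loop can be shortened by collecting every member the compiler complains about in one pass. The sketch below is only an illustration: the message wording it matches ("no member named ...") is an assumption about typical GCC and Clang diagnostics and may differ between versions, and the file names, struct name and sample log are hypothetical.

```python
import re

# Assumed wording: GCC "... has no member named 'x'",
# Clang "no member named 'x' in ...". Both ASCII and curly quotes are accepted.
NO_MEMBER = re.compile(r"no member named ['‘](\w+)['’]")


def members_still_used(compiler_output):
    """Return the sorted member names the compiler reported as missing."""
    return sorted(set(NO_MEMBER.findall(compiler_output)))


# Hypothetical build log, for illustration only.
sample_log = """\
drv.c:41:10: error: 'struct big' has no member named 'count'
drv.c:57:14: error: 'struct big' has no member named 'flags'
io.c:12:7: error: no member named 'count' in 'struct big'
"""
print(members_still_used(sample_log))
```

Every name in the result is a field to uncomment before the next build. A compiler may stop after a limited number of errors, so a few rounds can still be needed.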

Answer 2

Use Coverity. It is an excellent tool for detecting code defects, but it is rather expensive.

Answer 3

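This is a very old post, but I recently did the same thing using Python and gdb. I compiled the following snippet, which uses the structure at the top of the hierarchy, then ran ptype (print type) on that structure in gdb and recursed into its members.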

#include <usedheader.h>
UsedStructureInTop *to_print = 0;
int main(){return 0;}

(gdb) p to_print
(gdb) $1 = (UsedStructureInTop *) 0x0
(gdb) pt UsedStructureInTop
type = struct StructureTag {
    members displayed here line by line
}
(gdb)

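My purpose was slightly different, though: I wanted to generate a header that contains only the structure UsedStructureInTop and the types it depends on. There are compiler options that do this, but they do not remove unused/unlinked structures found in the included header files.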
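
The Python side of the ptype walk is not shown in this answer. As a rough sketch of one possible shape, the function below pulls member names out of the text that ptype prints. The function name, the regular expression and the sample listing are assumptions made for illustration; function-pointer members are not handled, and nested anonymous structs are flattened.

```python
import re

# Last identifier before an optional array suffix or bit-field width and ';'.
MEMBER = re.compile(r"(\w+)\s*(?:\[[^\]]*\]\s*)*(?::\s*\d+\s*)?;\s*$")


def ptype_members(ptype_text):
    """Return member names from a gdb `ptype` struct listing."""
    names = []
    for line in ptype_text.splitlines():
        line = line.strip()
        # Skip the "type = struct ... {" header, closing braces and blanks.
        if line.startswith("type =") or line in ("}", ""):
            continue
        match = MEMBER.search(line)
        if match:
            names.append(match.group(1))
    return names


# Hypothetical ptype output, for illustration only.
sample = """\
type = struct StructureTag {
    int id;
    char name[16];
    struct Inner *next;
    unsigned int flags : 3;
}
"""
print(ptype_members(sample))
```

For each member whose type is itself a struct, the same step would be repeated on that type's ptype output.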

Answer 4

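Under the rules of C, a struct member can be accessed through another struct that has a similar layout. That means you can access struct Foo {int a; float b; char c; }; via struct Bar { int x; float y; }; (except for Foo::c, of course).

Therefore, your algorithm may be flawed. It is very hard to find what you are looking for, which is also why C is hard to optimize.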
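
To make the concern concrete, here is a small sketch using Python's ctypes, which lays out Structure fields the way a C compiler would. It is an analogy under that assumption, not C code: the memory of a Foo is read and written through a Bar view, so no line of the program ever mentions a or b by name.

```python
import ctypes


class Foo(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int), ("b", ctypes.c_float), ("c", ctypes.c_char)]


class Bar(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_float)]


foo = Foo(7, 2.5, b"z")

# Bar shares foo's memory; x and y overlay a and b.
bar = Bar.from_buffer(foo)
print(bar.x, bar.y)

# Writing through the Bar view changes Foo.a without naming it.
bar.x = 99
print(foo.a)
```

A text search for ->a or .a would report Foo.a as unused here, even though its storage is both read and written.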
