Question

有一个.dbf文件，我想读取它的数据，但字符串是不可读的。我不知道字符串的编码！我可以找到它吗？是否可以在.dbf文件中获取字符串的编码？是否有可能从.dbf文件中获取不可读的字符串？是否有可能得到一个用ANSI编码的不可读字符串？因为字符串是不可读的，这是否意味着它以某种方式编码？

编辑：

下面的代码是我如何连接到.dbf文件并阅读它

using (OleDbConnection con = new OleDbConnection(constr))
    {

        var sql =
            "select name, family, account, is_no, code, bdate, is_pl, father from CP where account like '%23854%' ";

        OleDbCommand cmd = new OleDbCommand(sql, con);
        con.Open();
        DataSet ds = new DataSet();
        OleDbDataAdapter da = new OleDbDataAdapter(cmd);
        da.Fill(ds);

        var dt = ds.Tables[0];
        foreach (DataRow row in dt.Rows)
        {
            var account = row["account"];
        }
    }

并以帐号返回23854æ∞。

EDIT2：

我使用了一些第三方来查找有关我的.dbf文件的信息，下图显示了

enter image description here

EDIT3：

此处是DBF Commander Pro

的数据屏幕截图

不可读的字符位于Arabic/Persian

enter image description here

Answer 1

编辑后：

所以现在唯一的问题是转换。

所需的编码可能是（我在维基百科中搜索过）：

＆＃34; ISO-639-1＆＃34;
＆＃34; ISO-639-2＆＃34;
＆＃34; ISO-639-3＆＃34;

或者：

private static String getAsciz(byte[] bytes, int offset, int offset2) {
    for (int i = offset; i < offset2; ++i) {
        if (bytes[i] == 0) {
            offset2 = i;
        }
    }
    final String encoding = "ISO-639-1";
    try {
        return new String(bytes, offset, offset2 - offset, encoding).trim();
    } catch (UnsupportedEncodingException e) {
        throw new IllegalStateException("Charset not installed: " + encoding);
    }
}

或者当使用第三方库时，可能通过撤消编码来进行黑客攻击（请注意，这可能是变量编码：当前平台编码）：

String s = thirdParty.getColumn("NAME");

// Reconstruct the bytes (Windows Latin-1, Western Europe)
byte[] bytes = s.getBytes("Cp1252");

s = new String(bytes, "ISO-639-1");

旧答案：

.dbf是具有固定长度的记录的二进制格式。在每个记录中，字段值是普通字符数组（最有可能是ANSI）。

我的猜测是，您尝试将文件作为文本阅读。

或.dbf文件已加密。使用十六进制编辑器查看文件。

您可以将其读作二进制块。首先是带有列定义的标题部分。然后带有删除标记的实际记录。

由于这是一种旧格式，因此有许多库。你没有提到要使用哪种编程语言，但是在互联网上使用十六进制转储和一些格式信息，你可以轻松地创建一个dbf阅读器。

对制表符分隔文字的简单转换：

未经测试且在java中，但显示它是微不足道的。然后，您可以使用Excel左右进行转换和OLE DB。注意：作为输入in我在这里使用ISO-8859-1，并作为输出out UTF-8。我还为UTF-8识别编写BOM（文件标记开头）。

private static final boolean TEST = true;

private static class FieldDef {
    String name;
    char type;
    int length;
    int decimals;
}

public static void main(String[] args) {
    File dbfFile = new File("C:/aaa/bbb.dbf");
    String csvName = dbfFile.getName().replaceFirst("(?i)\\.dbf$", "") + ".csv";
    File csvFile = new File(dbfFile.getParentFile(), csvName);
    try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(dbfFile));
            PrintWriter out = new PrintWriter(csvFile, "UTF-8")) {
        byte[] header = new byte[0x20];
        in.read(header);

        // Version:
        switch (header[0x00]) {
            case 0x03:
                System.out.println("dBaseIII without Memo");
                break;
            case -128 + 0x03:
                System.out.println("dBaseIII with Memo");
                break;
            default:
                throw new UnsupportedOperationException("dBase Version not 3");
        }

        int recordCount = getInt(header, 0x04);
        int headerSize = getShort(header, 0x08);
        int recordSize = getShort(header, 0x0a);

        List<FieldDef> fieldDefs = new ArrayList<>();
        byte[] fieldDefBytes = new byte[0x20];
        int offset = header.length;
        out.print("\uFFFE"); // UTF-8 BOM to distinghuish it from Windows ANSI.
        out.print("DEL"); // Deletion marker.
        while (offset + 1 < headerSize) {
            in.read(fieldDefBytes);
            FieldDef fieldDef = new FieldDef();
            fieldDef.name = getAsciz(fieldDefBytes, 0, 11);
            fieldDef.type = (char)fieldDefBytes[11];
            // #4 int - field data address.
            fieldDef.length = 0xFF & fieldDefBytes[16];
            fieldDef.decimals = 0xFF & fieldDefBytes[17];
            out.print('\t');
            out.print(fieldDef.name);
            fieldDefs.add(fieldDef);
            System.out.printf("%-11s %c (%d, %d)%b", fieldDef.name,
                    fieldDef.type, fieldDef.length, fieldDef.decimals);
        }
        out.println();
        int b = in.read();
        assert b == 0x0d;

        byte[] record = new byte[recordSize];
        for (int recno = 0; recno < recordCount; ++recno) {
            if (TEST && recno > 100) {
                break;
            }
            in.read(record);
            //boolean deleted = (0xFF & record[0]) != 0x20; // == 0x2A '*'
            String deletionMark = getAsciz(record, 0, 1);
            out.print(deletionMark);
            offset = 1;
            for (FieldDef fieldDef : fieldDefs) {
                out.print('\t');
                String fieldValue = getAsciz(record, offset, offset + fieldDef.length);
                out.print(fieldValue);
                offset += fieldDef.length;
            }
            out.println();
        }
        // assert in.read() == 0x1A; // End-of-file byte.
    } catch (IOException ex) {
        Logger.getLogger(Dbf3ToTsv.class.getName()).log(Level.SEVERE, null, ex);
    }
}

private static int getInt(byte[] bytes, int offset) {
    int n = 0;
    for (int i = 0; i < 4; ++i) {
        n = (n << 8) | (0xFF & bytes[offset + 4 - 1 - i]);
    }
    return n;
}

private static int getShort(byte[] bytes, int offset) {
    int n = 0;
    for (int i = 0; i < 2; ++i) {
        n = (n << 8) | (0xFF & bytes[offset + 2 - 1 - i]);
    }
    return n;
}

private static String getAsciz(byte[] bytes, int offset, int offset2) {
    for (int i = offset; i < offset2; ++i) {
        if (bytes[i] == 0) {
            offset2 = i;
        }
    }
    return new String(bytes, offset, offset2 - offset, StandardCharsets.ISO_8859_1).trim();
}

Answer 2

.dbf文件是混合二进制和编码文本文件格式。到encoded我不是指加密，我的意思是根据.dbf文件使用的语言（例如cp1252（Windows英语）或cp1251（西里尔语））编码到代码页中。

如果您想要程序访问和控制，那么您需要编写自己的库，或者使用已经存在的库中的一个。

如果您正确使用库但仍然无意义，可能会加密，或者文件可能已损坏。

Answer 3

尝试使用DBF Commander Pro打开文件。它支持dBase 3.如果文件将被打开，请分享该表的屏幕截图。也许你的文件只有一个错误的编码，你只需要设置正确的字符集标志。如果是，请单击工具 - ＆gt;设置代码页，然后从列表中选择适当的编码。

Answer 4

您的数据类似于Iran System编码数据。这是一种非常特殊的编码，在伊朗早期的DOS时代（FoxPro日！）中使用。你可以在这里找到一个C＃转换器：https://github.com/mohsen-d/IranSystemConvertor

More info（波斯语）

Answer 5

萨拉姆，尝试在控制面板中更改 Windows 设置

地区和语言
行政
更改系统本地...
阿拉伯语（阿尔及利亚）或波斯语

而且 dbf 数据将是可读的

从.dbf文件中读取编码字符串

5 个答案: