Question

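I am working with some script files from CS:GO, and I need to extract some useful information from one of these files and import the data into my Python application.

Here is an example of the txt data format: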

https://steamcdn-a.akamaihd.net/apps/730/scripts/items/items_game.83a9ad4690388868ab33c627af730c43d4b0f0d9.txt

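The values come in assorted formats (Color \ Pos \ String), but I only need each value as a single string containing the whole value. I need to load this information into a dictionary, for example: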

print(global_dict['items_game']['game_info']['first_valid_class'])
<<2

I am currently working on a parser, but I have run into a lot of problems. Is there a ready-made solution for this file format?

Answer 1

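As CoryKramer pointed out, the file is almost JSON.

So I wrote the custom parser below, which reads the source config line by line and writes corrected JSON to an output file.

I also tested the output with JSONLint, and the file validated successfully.

Note: This code was written to parse any file located in: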

%STEAMINSTALL%/SteamApps/common/Counter-Strike Global Offensive/csgo/scripts

To use the script below, run:

 $ ConfigParser.py -h

 usage: ConfigParser.py [-h] [-s SRC] dest

 positional arguments:
   dest               file where the parsed JSON will be written to

 optional arguments:
   -h, --help         show this help message and exit
   -s SRC, --src SRC  source config file

#!/usr/bin/env python3

"""ConfigParser.py: Parses a Valve configuration file.

The configuration file for the CS:GO game items is read line-by-line
and written to an output file. The missing colons and commas are added
to their appropriate places. The output file passed JSONLint validation.
"""

from argparse import ArgumentParser
from shlex import split

__author__ = "Mr. Polywhirl"
__copyright__ = "Copyright 2016, Stack Overflow"
__credits__ = []
__license__ = "GPLv3"
__version__ = "1.1.0"
__maintainer__ = "Mr. Polywhirl"
__email__ = "https://stackoverflow.com/users/1762224"
__status__ = "Production"

# This is the default file that is parsed.
DEFAULT_CONFIG_FILE = 'C:/Program Files (x86)/Steam/steamapps/common/\
Counter-Strike Global Offensive/csgo/scripts/items/items_game.txt'

def parseConfig(src_filename, dest_filename):
    out_file = open(dest_filename, 'w')
    indent_ch = '\t'
    curr_level = 1
    out_file.write('{\n')

    with open(src_filename, 'r') as f:
        for line in f.readlines():
            if line.strip().startswith('//'):
                continue # Skip comments.

            level = line.find('"') + 1

            if level < 1:
                continue # Skip lines without tokens.

            values = ['"' + v + '"' for v in split(line)]
            indent = indent_ch * level

            if level != curr_level:
                delta = curr_level - level
                curr_level = level

                if delta > 0:
                    for i in range(delta, 0, -1):
                        out_file.write('\n' + (indent_ch * (level + i - 1)) + '}')
                        if i == 1:
                            out_file.write(',')
                    out_file.write('\n')

            elif level == curr_level and level > 1: 
                out_file.write(',\n')

            if len(values) == 1:
                out_file.write(indent + values[0] + ' : {\n')
            else:
                out_file.write(indent + ' : '.join(values))

        # Close every block that is still open; the outermost brace has no indent.
        for i in range(curr_level, 0, -1):
            out_file.write('\n' + (indent_ch * (i - 1)) + '}')

    out_file.close()

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('-s', '--src', default=DEFAULT_CONFIG_FILE, help="source config file")
    parser.add_argument('dest', help="file where the parsed JSON will be written to")
    args = parser.parse_args()

    parseConfig(args.src, args.dest)
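
Once the script has written its JSON output, loading it into a dictionary needs only the standard `json` module. The following is a minimal sketch, not part of the original script: the file name `items_game.json` is hypothetical, and the inline sample merely stands in for the converter's output, using the `first_valid_class` key from the question.

```python
import json

# Stand-in for the converter's output. In practice you would read the file
# written by ConfigParser.py (the name items_game.json is hypothetical):
#     with open('items_game.json') as f:
#         global_dict = json.load(f)
sample_json = '''
{
    "items_game" : {
        "game_info" : {
            "first_valid_class" : "2",
            "last_valid_class" : "3"
        }
    }
}
'''

global_dict = json.loads(sample_json)

# Every value is a string, because the converter quotes every token.
first_valid_class = global_dict['items_game']['game_info']['first_valid_class']
print(first_valid_class)
```

Note that `json` keeps only the last value when an object repeats a key, so any keys duplicated within one block of the source file would be collapsed.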

Additional notes

There also appears to be a CS:GO config parser written in Java that uses an ANTLR grammar to parse the file.

GitHub project link: https://github.com/valx76/CSGO-Config-Parser

Answer 2

Here is a pyparsing-based parser that will parse this format:

from pyparsing import Suppress, QuotedString, Forward, Group, Dict, ZeroOrMore

LBRACE,RBRACE = map(Suppress, "{}")
qs = QuotedString('"')

# forward-declare value, since this expression will be recursive
# (it will contain expressions that themselves contain values)
value = Forward()

key_value = Group(qs + value)
struct = LBRACE + Dict(ZeroOrMore(key_value)) + RBRACE

# define content of value using <<= operator
value <<= qs | struct

# define top-level parser
parser = Dict(key_value)

Load the config into a string, then call parser.parseString():

with open('cs_go_sample.txt') as f:
    sample = f.read()
config = parser.parseString(sample)

print(list(config.keys()))
for k in config.items_game.keys():
    print('-', k)

config.items_game.pprint()

Prints:

['items_game']
- sticker_kits
- client_loot_lists
- prefabs
- quest_definitions
- alternate_icons2
- music_definitions
- rarities
- colors
- campaign_definitions
- player_loadout_slots
- quest_schedule
- item_levels
- revolving_loot_lists
- game_info
- pro_players
- recipes
- items_game_live
- paint_kits_rarity
- paint_kits
- qualities
- items
- attributes
- item_sets
- quest_reward_loot_lists
- kill_eater_score_types

[['game_info',
  ['first_valid_class', '2'],
  ['last_valid_class', '3'],
  ['first_valid_item_slot', '0'],
  ['last_valid_item_slot', '54'],
  ['num_item_presets', '4']],
 ['rarities',
  ['default',
   ['value', '0'],
... etc. ...
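
The grammar above (a quoted key followed by either a quoted string or a brace-delimited block of key/value pairs) can also be sketched with only the standard library, which returns plain nested dictionaries of the kind the question asks for. This is a hand-written recursive-descent parser offered for comparison, not the pyparsing approach; the names `tokenize`, `parse_block`, and `parse_vdf` are hypothetical, and the sketch assumes quoted strings never span lines or contain escaped quotes.

```python
import re

# A token is a // comment, a double-quoted string, or a brace.
# Listing the comment first lets a comment swallow any quotes it contains,
# while a "//" inside a quoted string is consumed as part of that string.
TOKEN_RE = re.compile(r'//[^\n]*|"([^"\n]*)"|([{}])')


def tokenize(text):
    """Yield ('str', value) and ('brace', '{' or '}') tokens, skipping comments."""
    for match in TOKEN_RE.finditer(text):
        if match.group(0).startswith('//'):
            continue
        if match.group(2):
            yield ('brace', match.group(2))
        else:
            yield ('str', match.group(1))


def parse_block(tokens):
    """Parse key/value pairs until a closing brace or the end of input."""
    result = {}
    for kind, key in tokens:
        if kind == 'brace':
            if key == '}':
                return result
            raise ValueError('unexpected "{" where a key was expected')
        kind, value = next(tokens, ('eof', None))
        if kind == 'brace' and value == '{':
            value = parse_block(tokens)  # nested block: recurse
        elif kind != 'str':
            raise ValueError('expected a value after key %r' % key)
        result[key] = value  # a repeated key keeps its last value
    return result


def parse_vdf(text):
    """Parse the whole text into nested dicts with string leaves."""
    return parse_block(tokenize(text))


sample = '''
"items_game"
{
    // comment lines are ignored
    "game_info"
    {
        "first_valid_class"    "2"
        "last_valid_class"     "3"
    }
}
'''

global_dict = parse_vdf(sample)
print(global_dict['items_game']['game_info']['first_valid_class'])
```

All tokens share one generator, so each recursive call resumes exactly where its caller stopped reading.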

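Edit

If you want integer values converted to ints at parse time, you can define a parse action to do that. But you want to attach it (I think) only to the quoted strings that are values, not to those that are keys.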

# use this code to convert integer values to ints at parse time
key_qs = qs.copy()
value_qs = qs.copy()

def convert_integers(tokens):
    if tokens[0].isdigit():
        tokens[0] = int(tokens[0])

value_qs.setParseAction(convert_integers)

value = Forward()
key_value = Group(key_qs + value)
struct = LBRACE + Dict(ZeroOrMore(key_value)) + RBRACE
value <<= value_qs | struct
parser = Dict(key_value)

The output values now look like this:

[['game_info',
  ['first_valid_class', 2],
  ['last_valid_class', 3],
  ['first_valid_item_slot', 0],
  ['last_valid_item_slot', 54],
  ['num_item_presets', 4]],
 ['rarities',
  ['default',
   ['value', 0],
   ['loc_key', 'Rarity_Default'],
   ['loc_key_weapon', 'Rarity_Default_Weapon'],

Note that the integer values are no longer displayed as strings, but as actual Python integers.
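
If the data is already held in plain nested dictionaries, a similar conversion can be applied after parsing rather than through a parse action. The following is a minimal sketch under that assumption; `convert_integer_leaves` is a hypothetical helper name, and the sample data is a small excerpt modeled on the output above. Like the digit check in the parse action, it leaves negative numbers and floats as strings.

```python
def convert_integer_leaves(node):
    """Return a copy of a nested dict with all-digit string leaves as ints."""
    if isinstance(node, dict):
        # Keys are left untouched; only values are converted.
        return {key: convert_integer_leaves(value) for key, value in node.items()}
    # isdecimal() accepts only characters that int() is able to parse.
    if isinstance(node, str) and node.isdecimal():
        return int(node)
    return node


config = {
    'game_info': {'first_valid_class': '2', 'last_valid_item_slot': '54'},
    'rarities': {'default': {'value': '0', 'loc_key': 'Rarity_Default'}},
}

converted = convert_integer_leaves(config)
print(converted['game_info']['first_valid_class'] + 1)
```

Because the function builds new dictionaries, the original `config` keeps its string values.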
