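I need to parse a list of values in Python and one-hot encode it for feature engineering. Below is the value of one sample from the "amenities" column of my feature set.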
x = {"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"}
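The problem is that the value is wrapped in curly braces '{}', and some values that should be double-quoted are not (see Kitchen and Heating in the example above). If I could convert the above to a string, I would know how to strip the braces and split it into a list.
I need to convert the above into a list of items without double quotes.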
Answer 0 (score: 1)
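The input data looks messy. Still, the simplest approach is to remove the double quotes and then split on commas (I have left the curly braces out here, since they are just as easy to remove):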
s = '"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"'
print(s.replace('"','').split(","))
Result:
['Wireless Internet', 'Air conditioning', 'Kitchen', 'Heating', 'Family/kid friendly', 'Essentials', 'Hair dryer', 'Iron', 'translation missing: en.hosting_amenity_50']
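Of course, if the data itself contains commas, you are toast: because some quotes are missing, a comma inside a field cannot be told apart from a separator comma. (If every value were quoted, parsing would be a breeze with ast.literal_eval.)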
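If the values that do contain commas happen to be the quoted ones, the standard csv module may be a safer option than removing every quote, because it keeps a quoted field intact. The following is only a sketch under that assumption: the helper name parse_amenities and the "Washer, dryer" value are made up for illustration, and an unquoted value containing a comma would still be split incorrectly.

```python
import csv

def parse_amenities(raw):
    """Parse a string such as '{"Wireless Internet",Kitchen}' into a list of names."""
    # Drop surrounding whitespace and the curly braces at both ends.
    inner = raw.strip().strip("{}")
    if not inner:
        return []
    # csv.reader treats "..." as a quoted field, so commas inside quotes survive.
    return next(csv.reader([inner]))

# Hypothetical input: "Washer, dryer" is quoted and contains a comma.
print(parse_amenities('{"Wireless Internet",Kitchen,"Washer, dryer",Heating}'))
```

Unquoted values such as Kitchen pass through unchanged, so the same call also handles the sample from the question.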
Stripping the curly braces as well takes a bit more dirty work, but it is doable:
s = 'x = {"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"}'
print(s.replace('"','').split("{")[1].rstrip('}').split(","))
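Once every sample has been turned into a list like this, the one-hot encoding the question asks about can be sketched in plain Python. This is a minimal illustration, not the only way to do it: the function name one_hot and the two sample rows are assumptions, and with pandas something like Series.str.get_dummies could serve the same purpose.

```python
def one_hot(rows):
    """Turn a list of amenity lists into (column names, 0/1 matrix)."""
    # Every distinct amenity becomes one column, in sorted order.
    columns = sorted({item for row in rows for item in row})
    # Each row gets 1 where the amenity is present and 0 where it is not.
    matrix = [[1 if col in row else 0 for col in columns] for row in rows]
    return columns, matrix

# Hypothetical parsed samples, one list per listing.
rows = [
    ["Kitchen", "Heating", "Iron"],
    ["Kitchen", "Wireless Internet"],
]
columns, matrix = one_hot(rows)
print(columns)  # ['Heating', 'Iron', 'Kitchen', 'Wireless Internet']
print(matrix)   # [[1, 1, 1, 0], [0, 0, 1, 1]]
```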