Unable to parse a list of values into a list of strings

Asked: 2018-02-18 17:38:53

Tags: python pandas machine-learning feature-extraction

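I need to parse a list of values in Python and one-hot encode it for feature engineering. Below is the value of one sample from the "amenities" column of my feature set.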

x = {"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"}

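The problem is that the value is wrapped in curly braces '{}', and some items that should be double-quoted are not (see Kitchen and Heating in the example above). If I could convert the above into a string, I know how to remove the braces and split it into a list.

I need to convert the above into a list of items with no double quotes around the values.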

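1 answer:

Answer 0: (score: 1)

The input data looks awful. Still, the simplest approach is to remove the double quotes and then split on commas (I've left out the curly-brace part here, since it can easily be removed as well):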

s = '"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"'

print(s.replace('"','').split(","))

Result:

['Wireless Internet', 'Air conditioning', 'Kitchen', 'Heating', 'Family/kid friendly', 'Essentials', 'Hair dryer', 'Iron', 'translation missing: en.hosting_amenity_50']

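Of course, if the data itself contains commas, you're toast with this approach: once the quotes are stripped (or were never there), a comma inside a field can't be told apart from a separator comma... (if every value were quoted, it would be a breeze to parse with ast.literal_eval)

Stripping the curly braces and the leading assignment as well takes a bit more dirty work, but it is doable: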
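A comma inside a quoted field is still recoverable if the quotes are kept rather than stripped, since they mark the field's boundaries. Below is a minimal sketch using the standard-library csv module, assuming the data follows ordinary CSV quoting rules. The sample string, including the "Cable TV, premium" value, is made up for illustration and is not from the question's data.

```python
import csv

# Hypothetical sample: one quoted value contains a comma.
s = '"Wireless Internet","Cable TV, premium",Kitchen,Heating'

# csv.reader accepts any iterable of lines, so wrap the single string in a list.
# Quoted fields keep their embedded commas; unquoted fields are split as usual.
fields = next(csv.reader([s]))
print(fields)
```

This does not help when an unquoted value contains a comma; that case remains ambiguous.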

s = 'x = {"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"}'

print(s.replace('"','').split("{")[1].rstrip('}').split(","))
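Since the question's end goal is one-hot encoding, here is a rough sketch of how the parsed lists might be turned into 0/1 columns using only the standard library. The helper name parse_amenities and the three sample rows are hypothetical, and the parsing assumes each cell looks like '{...}' with CSV-style quoting.

```python
import csv

def parse_amenities(raw):
    """Turn a string like '{"a b",c}' into ['a b', 'c'] (hypothetical helper)."""
    inner = raw.strip().strip('{}')  # drop the surrounding curly braces
    if not inner:
        return []
    # csv.reader handles both quoted and unquoted items.
    return next(csv.reader([inner]))

# Hypothetical column values, one string per listing.
rows = [
    '{"Wireless Internet",Kitchen,Heating}',
    '{Kitchen,"Hair dryer"}',
    '{}',
]

parsed = [parse_amenities(r) for r in rows]

# Vocabulary: every distinct amenity, sorted, becomes one column.
columns = sorted({a for items in parsed for a in items})

# One 0/1 row per listing: 1 if the listing has that amenity, else 0.
encoded = [[int(c in items) for c in columns] for items in parsed]

print(columns)
for row in encoded:
    print(row)
```

With the data in a pandas Series, something like Series.str.get_dummies(sep=',') applied after stripping the braces and quotes would probably give a similar result, subject to the same embedded-comma caveat.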