I want to parse the string in the list below (its type is str, which is why I call it a string) and get some information from its dict elements:
"[{""isin"": ""US51817R1068"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""CL0000000423"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": null, ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""BRLATMBDR001"", ""name"": ""LATAM Airlines Group SA""}]"
I used the ast package and literal_eval to convert it into a list and parse it, but it fails with the following error:
ValueError: malformed string
Here line[18] is the string above.
Alternatively, if the string contains any null value, how can I ignore such a list-like string and leave it as it is?
PS: line[18] is the column of the CSV that I want to read.
Answer 0 (score: 1)
OK, let me start by saying: wow, this was harder than I thought!
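There are two problems with the string: the keys and values end up without quotes around them, so the quotes have to be added back, and it contains a null value, which has to be changed to None. So here is the code: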
import re
import ast

data_in = "[{""isin"": ""US51817R1068"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""CL0000000423"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": null, ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""BRLATMBDR001"", ""name"": ""LATAM Airlines Group SA""}]"

# Make a copy for modification.
formatted_data = data_in

# Captures the positional information of adding and removing characters.
offset = 0

# Finds all keys and values (raw string so the backslashes reach the regex engine).
p = re.compile(r"[\{\:,]([\w\s\d]{2,})")
for m in p.finditer(data_in):
    # Counts the number of characters removed via strip().
    strip_val = len(m.group(1)) - len(m.group(1).strip())

    # Adds in quotes for a single match.
    formatted_data = formatted_data[:m.start(1)+offset] + "\"" + m.group(1).strip() + "\"" + formatted_data[m.end(1)+offset:]

    # Offset will always add 2 ("+name+"), minus whitespace removed.
    offset += 2 - strip_val

company_list = ast.literal_eval(formatted_data)

# Finds 'null' values and replaces them with None.
for item in company_list:
    # items() and print() behave the same on Python 2 and Python 3 here.
    for k, v in item.items():
        if v == 'null':
            item[k] = None

print(company_list)
It was written in Python 3 and I changed the bits I remembered to Python 2, so there may be small errors.
The result is a list of dict objects:
[{'isin': 'US51817R1068', 'name': 'LATAM Airlines Group SA'}, {'isin': 'CL0000000423', 'name': 'LATAM Airlines Group SA'}, {'isin': None, 'name': 'LATAM Airlines Group SA'}, {'isin': 'BRLATMBDR001', 'name': 'LATAM Airlines Group SA'}]
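To see which pieces the pattern in the code above captures, here is a minimal sketch on a shortened string. The sample value is made up for illustration; it mimics what Python holds after joining the adjacent string literals, so the keys and values carry no quotes. The pattern is the same one written without the redundant escapes (\d is already covered by \w).

```python
import re

# Hypothetical shortened input: same shape as data_in, keys and values unquoted.
sample = "[{isin: US51817R1068, name: LATAM Airlines Group SA}, {isin: null, name: LATAM Airlines Group SA}]"

# One of '{', ':' or ',' followed by at least two word/whitespace characters.
pattern = re.compile(r"[{:,]([\w\s]{2,})")

# Each captured group is one key or value, possibly with a leading space.
tokens = [m.group(1).strip() for m in pattern.finditer(sample)]
print(tokens)
```

The comma between two dicts is not matched, because only a single space follows it before the next brace, which is why the pattern asks for at least two characters.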
For details on the regular expression, see here.
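As an alternative sketch, not part of the answer above: the doubled quotes look like CSV escaping, so if the file is read with the standard csv module the field should come out as plain JSON, and json.loads maps null to None by itself. The one-row sample and the helper name parse_cell are assumptions for illustration, and the snippet targets Python 3. When the text cannot be parsed the helper returns it unchanged, which is one way to handle the "leave it as it is" case from the question.

```python
import csv
import io
import json

# Hypothetical one-row CSV: a plain cell followed by the JSON-like cell.
raw = 'LATAM,"[{""isin"": ""US51817R1068"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": null, ""name"": ""LATAM Airlines Group SA""}]"\n'


def parse_cell(text):
    """Return the parsed JSON value, or the original text if it is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return text


# csv.reader undoes the doubled-quote escaping before the cell is parsed.
rows = list(csv.reader(io.StringIO(raw)))
companies = parse_cell(rows[0][1])
print(companies)
print(parse_cell("not json"))
```

In the question's setting the cell would be line[18] rather than rows[0][1].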