I have a CSV file containing user information. A sample of the file is shown below.
"userType": "NORMAL", "accountID": "J123456789"
"userType": "NORMAL", "accountID": "J987654321"
"userType": "NORMAL", "accountID": "C123456789"
"userType": "NORMAL", "accountID": "R987654321"
I want to extract the ID numbers using a regular expression in Python 3.
The regular expression I used is ("accountID": ")\w+, which produces the following result.
"accountID": "J123456789
"accountID": "J987654321
"accountID": "C123456789
"accountID": "R987654321
The desired output should look like this:
J123456789
J987654321
C123456789
R987654321
Answer 0 (score: 0)
If the file format is fixed, consider auto-detecting the dialect:
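import csv

with open('test.csv') as csvfile:
    # Sniff the dialect from the first 1024 characters, then rewind the file.
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # The account ID is the third field of each row.
    accounts = [row[2] for row in reader]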
This code produces the following list:
accounts
['J000025574', 'J000025620', 'C000025623', 'R000025624']
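Which dialect the sniffer settles on depends on the data, so the column index is worth checking. As a rough sketch, assuming a dialect that splits on ':' and skips the space after each delimiter (an assumption, not necessarily what the sniffer detects for your file), the ID should end up as the third field. The in-memory `sample` below stands in for the real file:

```python
import csv
import io

# In-memory stand-in for test.csv, using two lines from the question.
sample = io.StringIO(
    '"userType": "NORMAL", "accountID": "J123456789"\n'
    '"userType": "NORMAL", "accountID": "J987654321"\n'
)

# Assumed dialect: ':' is the delimiter and spaces right after it are skipped.
reader = csv.reader(sample, delimiter=':', skipinitialspace=True)
rows = list(reader)

# With this dialect each row has three fields; the last one is the ID.
accounts = [row[2] for row in rows]
print(accounts)
```

Printing `rows` as well shows how each line was split, which makes it easy to confirm the index before relying on it.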
Answer 1 (score: 0)
You can use the regular expression (?:"accountID": ")(\S+)" shown below,
which captures only the ID and ignores the rest:
import re

s = """"userType": "NORMAL", "accountID": "J123456789"
"userType": "NORMAL", "accountID": "J987654321"
"userType": "NORMAL", "accountID": "C123456789"
"userType": "NORMAL", "accountID": "R987654321" """

# A raw string keeps \S from being treated as an invalid string escape.
# findall returns only the contents of the capturing group.
print(re.findall(r'(?:"accountID": ")(\S+)"', s))
Result:
['J123456789', 'J987654321', 'C123456789', 'R987654321']
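The pattern from the question keeps the prefix because the parentheses wrap the prefix rather than the ID. As a small sketch of two ways around that, either put the capturing group around the ID or move the prefix into a lookbehind so it is required but not part of the match. The variable names here are only illustrative:

```python
import re

sample = '''"userType": "NORMAL", "accountID": "J123456789"
"userType": "NORMAL", "accountID": "J987654321"
"userType": "NORMAL", "accountID": "C123456789"
"userType": "NORMAL", "accountID": "R987654321"
'''

# Option 1: capture only the ID; findall returns the group, not the whole match.
ids_group = re.findall(r'"accountID": "(\w+)"', sample)

# Option 2: fixed-width lookbehind; the prefix must precede the ID
# but is excluded from the matched text.
ids_lookbehind = re.findall(r'(?<="accountID": ")\w+', sample)

print(ids_group)
print(ids_lookbehind)
```

Both lists contain the four bare IDs from the sample.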
Answer 2 (score: 0)
IMHO, this doesn't need any imports at all:
with open('test.csv') as f:
    for line in f:
        # Take the 10 characters just before the closing quote.
        print(line.strip()[-11:-1])
Or, if the length of the account ID does vary, use:
        print(line.split('"')[-2])
inside the loop instead.
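For reuse, the split-based variant can be wrapped in a small helper that collects the IDs and skips blank lines. This is only a sketch; `extract_ids` is a hypothetical name, and it assumes the ID is always the last double-quoted value on the line:

```python
def extract_ids(lines):
    """Return the last double-quoted value of each non-blank line."""
    ids = []
    for line in lines:
        parts = line.strip().split('"')
        # A line ending in "...ID" splits into [..., 'ID', ''],
        # so the ID sits at index -2. Lines without quotes are skipped.
        if len(parts) >= 3:
            ids.append(parts[-2])
    return ids


lines = [
    '"userType": "NORMAL", "accountID": "J123456789"\n',
    '\n',
    '"userType": "NORMAL", "accountID": "R987654321"\n',
]
print(extract_ids(lines))
```

Since an open file iterates line by line, `extract_ids(f)` inside the `with open('test.csv') as f:` block works the same way.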
Answer 3 (score: 0)
You could write yourself a parser (though that is probably a bit over the top):
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

text = """
"userType": "NORMAL", "accountID": "J123456789"
"userType": "NORMAL", "accountID": "J987654321"
"userType": "NORMAL", "accountID": "C123456789"
"userType": "NORMAL", "accountID": "R987654321"
"""

grammar = Grammar(
    r"""
    file        = entry+
    entry       = garbage? (pair)+ newline
    pair        = ws? key equal value comma?

    key         = quotes word quotes
    value       = quotes word quotes
    quotes      = '"'
    word        = ~"\w+"
    equal       = ws? ":" ws?
    comma       = ws? "," ws?

    ws          = ~"[\t ]+"
    newline     = ~"[\r\n]"
    garbage     = (ws / newline)+
    """
)

tree = grammar.parse(text)


class Visitor(NodeVisitor):
    def __init__(self, needle):
        # The key whose values should be collected, e.g. "accountID".
        self.needle = needle

    def generic_visit(self, node, visited_children):
        return visited_children or node

    def visit_key(self, node, children):
        _, key, _ = children
        return key

    def visit_value(self, node, children):
        _, value, _ = children
        return value

    def visit_pair(self, node, children):
        _, key, _, value, _ = children
        return (key, value)

    def visit_entry(self, node, children):
        _, entry, _ = children
        return entry

    def visit_file(self, node, children):
        # Each entry is a list of (key, value) pairs; keep matching values only.
        out = [value.text
               for child in children if isinstance(child, list)
               for key, value in child
               if key.text == self.needle]
        return out


v = Visitor("accountID")
out = v.visit(tree)
print(out)
which yields
['J123456789', 'J987654321', 'C123456789', 'R987654321']