Question

When I read a file and store its contents in a list, a string enclosed in single quotes is not stored as a single list element.

Sample file:

12 3 'dsf dsf'

The list should contain:

listname = [12, 3, 'dsf dsf']

What I am able to get instead is:

listname = [12, 3, 'dsf', 'dsf']

Please help.

Answer 1

Use the csv module.

Demo:

>>> import csv
>>> with open('input.txt') as inp:
...     print(list(csv.reader(inp, delimiter=' ', quotechar="'"))[0])
... 
['12', '3', 'dsf dsf']

input.txt is the file containing the sample data from the question.
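
The demo returns every field as a string, while the question asks for 12 and 3 as integers. A minimal sketch of one way to convert them follows. The helper name parse_line is hypothetical, and io.StringIO stands in for the opened file so the example is self-contained.

```python
import csv
import io

def parse_line(text):
    """Split one space-delimited line, honouring single quotes, and
    convert every field that looks like an integer."""
    row = next(csv.reader(io.StringIO(text), delimiter=' ', quotechar="'"))
    result = []
    for field in row:
        try:
            result.append(int(field))
        except ValueError:
            # not an integer: keep the field as a string
            result.append(field)
    return result

print(parse_line("12 3 'dsf dsf'"))
```

Because csv drops the quotes before the fields are returned, a quoted number such as '12' would be converted as well. If that distinction matters, a parser that keeps track of quoting is needed.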

Answer 2

You can use the shlex module to split the data in a simple way.

import shlex
# the with block closes the file once it has been read
with open("sample file") as data:
    print(shlex.split(data.read()))

Give it a try :)
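
Splitting the result of read() puts the tokens of every line into one flat list. If the file has several lines and one list per line is wanted, a sketch along these lines may help. The helper name read_rows and the second sample line are assumptions made for illustration, and io.StringIO stands in for the opened file.

```python
import io
import shlex

def read_rows(handle):
    """Return one token list per non-empty line of an open text file."""
    return [shlex.split(line) for line in handle if line.strip()]

# io.StringIO replaces open("sample file") so the example runs on its own.
sample = io.StringIO("12 3 'dsf dsf'\n7 8 'abc def'\n")
rows = read_rows(sample)
print(rows)
```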

Answer 3

You can use a regular expression:

import re
my_regex = re.compile(r"(?<=')[\w\s]+(?=')|\w+")
with open("filename.txt") as my_file:
    my_list = my_regex.findall(my_file.read())
    print(my_list)

Output for the file content 12 3 'dsf dsf':

['12', '3', 'dsf dsf']

Regex explanation:

(?<=')     # matches only if there is a single quote *before* the matched pattern
[\w\s]+    # matches one or more word characters (letters, digits, underscore) or whitespace
(?=')      # matches only if there is a single quote *after* the matched pattern
|          # match either the pattern above or the one below
\w+        # matches one or more word characters
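
One caveat: the lookarounds do not consume the quotes, so with two quoted strings on a line, such as 'a b' 'c d', the space between them is also surrounded by quotes and gets returned as a token. Quoted text containing punctuation is split up as well. A sketch of an alternative pattern that consumes the quotes is shown below. The pattern and the name tokenize are not from the answer, only one possible variation.

```python
import re

# Alternative pattern (an assumption, not the answer's own): the quotes are
# consumed by the match and the text between them is captured in group 1;
# anything else is taken as a run of non-whitespace in group 2.
TOKEN = re.compile(r"'([^']*)'|(\S+)")

def tokenize(text):
    tokens = []
    for m in TOKEN.finditer(text):
        quoted, bare = m.groups()
        # group 1 is None when the bare alternative matched
        tokens.append(quoted if quoted is not None else bare)
    return tokens

print(tokenize("12 3 'dsf dsf'"))
print(tokenize("'a b' 'c d'"))
```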

Answer 4

You can use:

>>> l = ['12', '3', 'dsf', 'dsf']
>>> l[2:] = [' '.join(l[2:])]
>>> l
['12', '3', 'dsf dsf']
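
The slice above hard-codes index 2, so it only fits lines shaped exactly like the sample. A more general sketch is given below, assuming the tokens come from str.split() and therefore still carry their quote characters (unlike the list shown above). The helper name merge_quoted is hypothetical.

```python
def merge_quoted(tokens):
    """Re-join tokens that str.split() cut out of one single-quoted string."""
    merged = []
    buffer = None  # pieces of the quoted string currently being collected
    for tok in tokens:
        if buffer is None:
            if tok.startswith("'") and len(tok) > 1 and tok.endswith("'"):
                # a quoted string without spaces: just drop the quotes
                merged.append(tok[1:-1])
            elif tok.startswith("'"):
                # opening quote: start collecting pieces
                buffer = [tok[1:]]
            else:
                merged.append(tok)
        elif tok.endswith("'"):
            # closing quote: join the collected pieces with single spaces
            buffer.append(tok[:-1])
            merged.append(' '.join(buffer))
            buffer = None
        else:
            buffer.append(tok)
    if buffer is not None:
        raise ValueError("unterminated quote")
    return merged

print(merge_quoted("12 3 'dsf dsf'".split()))
```

Since str.split() collapses runs of whitespace, the original spacing inside the quotes is not preserved by this approach.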

Answer 5

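Basically, you need to parse the data. That means:

splitting it into tokens
interpreting the resulting sequence
- in your case, each token can be interpreted on its own

The first task:

Each token is:
- a run of non-space characters, or
- a quote, then anything up to the next quote.
The separator is a single space (you did not specify whether runs of spaces or other whitespace characters are valid).

Interpretation:

Quoted: take the enclosed text and discard the quotes
Unquoted: convert to an integer if possible (you did not specify whether it always is, or should be, an integer)
(You also did not say whether the input is always two integers plus a quoted string, i.e. whether that combination should be enforced.)

Since the syntax is very simple, both tasks can be done at the same time: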

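import re

line = "12 3 'dsf dsf'"    # example input: one line read from the file
i = 0
maxi = len(line)
tokens = []
re_sep = r"\s"
re_term = r"[^'\s]\S*"     # a bare term must not start with a quote
re_quoted = r"'(?P<enclosed>[^']*)'"
re_chunk = re.compile("(?:(?P<term>%(re_term)s)"
                      "|(?P<quoted>%(re_quoted)s))"
                      "(?:%(re_sep)s|$)" % locals())
del re_sep, re_term, re_quoted
while i < maxi:
    # match the next chunk starting at offset i (pattern.match takes a start position)
    m = re_chunk.match(line, i)
    if not m:
        raise ValueError("invalid syntax at char %d" % i)
    gg = m.groupdict()
    token = gg['term']
    if token:
        # unquoted: convert to an integer when possible
        try:
            token = int(token)
        except ValueError:
            pass
    elif gg['quoted']:
        # quoted: keep the enclosed text, drop the quotes
        token = gg['enclosed']
    else:
        assert False, "invalid match. locals=%r" % locals()
    tokens.append(token)
    i = m.end()            # m.end() is already an absolute offset into line
    del m, gg, token
print(tokens)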

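This is an example of how to do it by hand. However, you can reuse any existing parsing algorithm that handles the same syntax; csv and shlex, mentioned in the other answers, are examples. Note that they may also accept other syntax, which you may or may not want. For example:

shlex also accepts double quotes and constructs such as "asd"fgh and 'asd'\''fgh'
csv allows multiple consecutive delimiters (producing an empty element), as well as 'asd'fgh (the quotes are stripped) and asd'def' (the quotes are kept intact)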
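
These differences are easy to check directly. The short sketch below feeds the inputs mentioned above to both modules, using the csv settings from the csv answer. The helper name csv_split is only for illustration.

```python
import csv
import shlex

def csv_split(text):
    """Split one line with the settings used in the csv answer."""
    return next(csv.reader([text], delimiter=' ', quotechar="'"))

# The second shlex case is the text 'asd'\''fgh' written as a Python literal.
shlex_cases = ['"asd"fgh', "'asd'\\''fgh'"]
csv_cases = ["a  b", "'asd'fgh", "asd'def'"]

for text in shlex_cases:
    print("shlex", repr(text), "->", shlex.split(text))
for text in csv_cases:
    print("csv  ", repr(text), "->", csv_split(text))
```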

Reading a file with single-quoted data and storing it in a list in Python
