Question

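I want to build two lists from a document whose formatting may vary, but which should be roughly two columns with some separator. Each row is:

"word1"\t"word2"

for example. My lists should be "list_of_word1" and "list_of_word2", and I want to build them in one go. I know I could use pandas, but for some reason (the script should work without specific imports, using only generic libraries) I also have to use a regular file open.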

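My attempt was:

list_of_word1=[]
list_of_word2=[]
((list_of_word1.extend(line.split()[0]),list_of_word2.extend(line.split()[1])) for line in open(doc))

The generator serves no purpose here, since extend returns None, so using a form whose result is never reused, or that may not be needed in the first place, could be considered bad style. I would also like to know how to avoid calling split repeatedly: twice per line is acceptable, but applying the same principle to more columns would become very inefficient.

My attempt to avoid reusing split was this:

((list_of_word1.extend(linesplit0),list_of_word2.extend(linesplit1)) for line in open(doc) for (linesplit0,linesplit1) in line.split("\t"))

But that indeed does not work, because it finds no tuple to unpack. I also tried starred unpacking, but that does not work either.

All of this feels unsatisfactory and too contrived. What do you think?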
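
One way to split each line only once inside a comprehension is to iterate over a single-element list, so that the split result can be unpacked into names. The following is a minimal sketch, assuming tab-separated two-column lines; the sample `lines` list is hypothetical and stands in for the opened document.

```python
# Hypothetical sample standing in for the lines of the opened document.
lines = ['word1\tword2\n', 'word3\tword4\n']

# Wrapping the split result in a one-element list lets the inner `for`
# unpack it, so each line is split exactly once.
pairs = [(w1, w2)
         for line in lines
         for w1, w2 in [line.rstrip('\n').split('\t')]]

list_of_word1 = [w1 for w1, _ in pairs]
list_of_word2 = [w2 for _, w2 in pairs]
print(list_of_word1, list_of_word2)
```

The comprehension here collects values instead of calling extend for its side effect, so its result is actually used.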

Answer 1

Maybe this?

lists = [[] for i in range(<number_of_lists>)]
[[z[0] + [z[1]] for z in zip(lists, line.split())] for line in open(doc)]

(It may need some tweaking.)
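
As one possible tweak: `z[0] + [z[1]]` builds a new list each time and leaves `lists` unchanged, so appending in place may be closer to the intent. This is a sketch under assumptions, not the answerer's final code: the column count of 3 and the in-memory document are illustrative only.

```python
import io

# Hypothetical in-memory document; a real script would use open(doc).
doc = io.StringIO('a b c\nd e f\n')

number_of_lists = 3  # assumed column count
lists = [[] for _ in range(number_of_lists)]

for line in doc:
    # Split once, then append each field to the list for its column.
    for column, field in zip(lists, line.split()):
        column.append(field)

print(lists)
```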

Answer 2

This answer works whatever the separator is (provided it is some number of spaces!)

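with open('temp.txt','r') as f:
    # One string per line of the file
    data = f.read().strip('\n').split('\n')

# Drop the empty strings produced by runs of consecutive spaces
dataNoSpace = [filter(lambda a: a!= '', i.split(' ')) for i in data]
list1, list2 = [list(i) for i in zip(*dataNoSpace)]

For example, if 'temp.txt' is:

word10 word20
word11    word21
word12       word22
word13  word23
word14    word24

we get list1 containing word10 through word14 and list2 containing word20 through word24.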
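
For whitespace-separated columns, calling `str.split()` with no argument already splits on any run of spaces or tabs, so the filter step may be unnecessary. The following is a sketch under that assumption; the `sample` string is hypothetical and stands in for the contents of 'temp.txt'.

```python
# Hypothetical sample standing in for the contents of 'temp.txt'.
sample = 'word10 word20\nword11    word21\nword12       word22\n'

# split() with no argument splits on any run of whitespace and
# never yields empty strings, so no filtering is needed.
rows = [line.split() for line in sample.splitlines()]
list1, list2 = [list(column) for column in zip(*rows)]
print(list1)
print(list2)
```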

Answer 3

You can use zip together with argument unpacking to achieve this.

Example input file data.txt:

1 2 3
apple orange banana
one two three
a b c

Code:

>>> with open('data.txt') as f:
...    list(zip(*(line.split() for line in f)))
... 
[('1', 'apple', 'one', 'a'), ('2', 'orange', 'two', 'b'), ('3', 'banana', 'three', 'c')]
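
One caveat worth illustrating: zip stops at the shortest row, so a line with a missing field silently shortens the result. The sketch below uses hypothetical rows and an empty-string fill value chosen purely for illustration; `itertools.zip_longest` is one standard-library way to pad instead.

```python
from itertools import zip_longest

# Hypothetical rows; the second one is missing its third field.
rows = [['1', '2', '3'], ['apple', 'orange'], ['one', 'two', 'three']]

# zip stops at the shortest row, so the third column is dropped here.
truncated = list(zip(*rows))
print(truncated)

# zip_longest pads short rows instead (the fill value is our choice).
columns = list(zip_longest(*rows, fillvalue=''))
print(columns)
```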

Answer 4

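Actually, at first I wanted to use zip, hence the generator. But I mixed things up and ended up adding list_of_word1 = [] list_of_word2 = []

which is useless. What should have been done is: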

list_of_word1,list_of_word2=zip(*((line.split()) for line in open(doc)))

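This works like a charm. The fundamental problem remains, though: I can now do what I wanted, but I still would not know how to unpack the result of a split inside a comprehension if I had to. If you have any ideas...?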
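
One possible approach to that remaining question is to feed the comprehension from an inner generator that does the splitting, so the outer loop can unpack each row into names. This is a minimal sketch with hypothetical sample lines and an arbitrary per-field treatment (upper-casing and length) chosen only to show that the fields are available separately.

```python
# Hypothetical sample standing in for the lines of the opened document.
lines = ['word1 word2\n', 'word3 word4\n']

# The inner generator splits each line once; the outer comprehension
# unpacks the result and can process each field separately.
processed = [(w1.upper(), len(w2))
             for w1, w2 in (line.split() for line in lines)]
print(processed)
```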

Splitting lists inside a comprehension to perform a treatment

4 answers: