Question

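I have a Python script that iterates over a PDF file (looping over each page), and within each page it does some text processing. So there are basically two nested loops: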

files = {}

# npages is the number of pages in the given PDF file.
for n in range(npages):
    path = pdf_name + str(n + 1) + '_1.txt'
    files[int(n)] = path

    for i, col in enumerate(COLUMNS):
        path = pdf_name + str(n + 1) + '_' + str(i + 2) + '.txt'
        files[int(n)][int(i)] = path

Basically, I go through each PDF page, and then do further text processing on each page.

I'm trying to produce output like this:

- file_page_1.pdf
  - file_page_1_col_1.pdf
  - file_page_1_col_2.pdf
- file_page_2.pdf
  - file_page_2_col_1.pdf
  - file_page_2_col_2.pdf

However, the code above gives me the following error:

files[int(n)][int(i)] = path
TypeError: 'str' object does not support item assignment
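The failure can be reproduced without any PDF handling. After `files[int(n)] = path`, that entry holds a string, and Python strings are immutable, so assigning to an index of it raises `TypeError`. The file names below are hypothetical placeholders:

```python
# Reproduce the question's error with hypothetical file names.
files = {}
files[0] = "report_1_1.txt"  # files[0] is now a str, not a dict

error_message = None
try:
    files[0][0] = "report_1_2.txt"  # item assignment on a str
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'str' object does not support item assignment
```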

Answer 1

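I think the structure you're looking for is a dictionary whose keys are the page file names (strings) and whose values are lists of the column file names.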

files = {}

for n in range(npages):
    path = pdf_name + str(n + 1) + '_1.txt'
    files[path] = []  # each page file maps to a list of its column files
    for i, col in enumerate(COLUMNS):
        subpath = pdf_name + str(n + 1) + '_' + str(i + 2) + '.txt'
        files[path].append(subpath)

# For accessing items
for path, subpaths in files.items():
    # path is a string, the key in the files dict
    print(path)
    # subpaths is a list of strings, the value in the files dict
    for subpath in subpaths:
        print(subpath)
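A self-contained version of the approach above, for trying it out. The values of `pdf_name`, `npages` and `COLUMNS` are hypothetical stand-ins for what the real script reads from the PDF. It builds the same dict of lists and prints the column files indented under their page file, close to the layout the question asks for:

```python
# Hypothetical inputs; in the real script these come from the PDF being processed.
pdf_name = "report_"
npages = 2
COLUMNS = ["left", "right"]

files = {}
for n in range(npages):
    # Key: the page-level file name
    path = pdf_name + str(n + 1) + "_1.txt"
    # Value: the list of column-level file names for that page
    files[path] = [
        pdf_name + str(n + 1) + "_" + str(i + 2) + ".txt"
        for i, _col in enumerate(COLUMNS)
    ]

for path, subpaths in files.items():
    print(path)
    for subpath in subpaths:
        print("  -", subpath)
```

The inner loop is written as a list comprehension here; it builds the same list as the `append` loop in the answer.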

If you want the path/subpath pairs returned in insertion order, you can use an OrderedDict instead of a dict.

from collections import OrderedDict
files = OrderedDict()
# code as above
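Since Python 3.7, a plain dict also preserves insertion order as a language guarantee, so OrderedDict is mainly useful on older versions or for its extra methods such as `move_to_end`. A quick check, with hypothetical file names inserted in a deliberately unsorted order to show that the order is insertion order rather than sorted order:

```python
from collections import OrderedDict

# Hypothetical page file names, inserted in a deliberately unsorted order.
names = ["report_3_1.txt", "report_1_1.txt", "report_2_1.txt"]

plain = {}
ordered = OrderedDict()
for name in names:
    plain[name] = []
    ordered[name] = []

# On Python 3.7+, both iterate in insertion order.
print(list(plain))
print(list(ordered))
```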

Answer 2

This happens because files[int(n)] returns a str, not a dictionary.

You can see this from the line:

files[int(n)] = path

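You are trying to get dictionary behavior out of a str object. To do what you want, you can store a dictionary for each page instead, something like this: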

from collections import defaultdict

files = {}
for n in range(npages):
    path = pdf_name + str(n + 1) + '_1.txt'
    # Store a mapping (not a str) per page so it can hold more keys.
    # Note: defaultdict() with no default_factory behaves like a plain dict.
    files[n] = defaultdict()
    files[n]['path_root'] = path

    for i, col in enumerate(COLUMNS):
        path = pdf_name + str(n + 1) + '_' + str(i + 2) + '.txt'
        files[n][i] = path
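A runnable version of this answer's structure, again with hypothetical values for `pdf_name`, `npages` and `COLUMNS`. Each page index maps to an inner mapping holding the page file under `'path_root'` and the column files under the integer keys 0, 1, …:

```python
from collections import defaultdict

# Hypothetical inputs standing in for the real PDF data.
pdf_name = "report_"
npages = 2
COLUMNS = ["left", "right"]

files = {}
for n in range(npages):
    # One inner mapping per page: the page file plus its column files.
    files[n] = defaultdict()
    files[n]["path_root"] = pdf_name + str(n + 1) + "_1.txt"
    for i, _col in enumerate(COLUMNS):
        files[n][i] = pdf_name + str(n + 1) + "_" + str(i + 2) + ".txt"

for n, page in files.items():
    print(n, dict(page))
```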

This should give you a result like this:

|-- nth file 
|    |
|    |- path_root
|    |- child1 (0)
|    |- child2 (1)
..

A quick note on defaultdict:

from collections import defaultdict

somedict = {}
print(somedict[3])  # raises KeyError

someddict = defaultdict(int)  # or defaultdict(str)
print(someddict[3])  # calls int(), so prints 0 (with str it would return '')
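The same mechanism makes defaultdict handy for appending to a dict of lists: with `defaultdict(list)`, a missing key starts out as an empty list, so Answer 1's explicit `files[path] = []` line is not needed. A sketch with hypothetical (page file, column file) pairs:

```python
from collections import defaultdict

# Hypothetical (page file, column file) pairs, e.g. produced while scanning pages.
pairs = [
    ("report_1_1.txt", "report_1_2.txt"),
    ("report_1_1.txt", "report_1_3.txt"),
    ("report_2_1.txt", "report_2_2.txt"),
]

files = defaultdict(list)
for path, subpath in pairs:
    # First access to a new key creates an empty list; then we append to it.
    files[path].append(subpath)

print(dict(files))

# Caveat: merely reading a missing key also inserts it (with an empty list).
_ = files["report_9_1.txt"]
print("report_9_1.txt" in files)  # True
```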

