Question

I am working on an exercise:

Write a script that reads a file and returns an array containing the indices of all fields that could contain a date between 1900 and 2020. For example,
Although solar eclipses (Alpha et al. 1980) might be granular (Bethe & Gamow 2000), it is thought...
should yield the array [6, 13].

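My idea: there is a function, np.argwhere, that takes an array and returns the indices where its values are true, but here the integers are embedded in strings, so I don't know how to apply it.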

I am using the following code, but it doesn't work because the years are attached to the parentheses.

import numpy as np
a = np.loadtxt("exercise.txt", str)
test = np.arange(1900,2021)
test = np.asarray(test, str)
print(test)
print(a)
mask = np.isin(a, test)
print(np.argwhere(mask == True))

Answer 1

In [25]: a = 'Although solar eclipses (Alpha et al. 1980) might be granular (Bethe & Gamow 2000)'
In [26]: b = [i for i, aa in enumerate(a.split()) if aa.strip(')').isnumeric()]
In [27]: b = [i for i in b if 1900 <= int(a.split()[i].strip(')')) <= 2020]
In [28]: b
Out[28]: [6, 13]
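
If you want to stay in numpy, the same strip-then-compare idea can be sketched with np.char.strip followed by np.isin and np.argwhere. This is only a sketch under assumptions: the function name year_indices, the lo/hi parameters, and the set of punctuation characters being stripped are my own choices, not something given in the question.

```python
import numpy as np

def year_indices(text, lo=1900, hi=2020):
    """Return indices of whitespace-separated fields that are years in [lo, hi]."""
    # dtype=str keeps the array a string array even when there are no fields
    fields = np.array(text.split(), dtype=str)
    # Remove leading/trailing punctuation so '1980)' becomes '1980'
    # (the character set here is an assumption; extend it as needed)
    stripped = np.char.strip(fields, "()[],.;:")
    # All candidate years as strings, so they can be compared with the fields
    years = np.arange(lo, hi + 1).astype(str)
    mask = np.isin(stripped, years)
    # argwhere returns an (n, 1) array for 1-D input; flatten it to a plain list
    return np.argwhere(mask).ravel().tolist()

sentence = ('Although solar eclipses (Alpha et al. 1980) '
            'might be granular (Bethe & Gamow 2000)')
print(year_indices(sentence))
```

For the sentence in the question this prints [6, 13].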

Answer 2

This isn't really a job for numpy.

import re

def get_indices(s):
    fields = s.split(' ')
    matches = (re.match(r'[^\d]*(\d{4})(?!\d)', x) for x in fields)
    years = ((i, int(m.group(1))) for i, m in enumerate(matches) if m is not None)
    return [i for i, x in years if 1900 <= x <= 2020]

with open('exercise.txt') as f:
    for line in f:
        print(get_indices(line))

For example:

>>> get_indices('Although solar eclipses (Alpha et al. 1980) '
                'might be granular (Bethe & Gamow 2000)')
[6, 13]
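
One thing to be aware of: re.match anchors at the start of each field, so the pattern above only finds a year that is the first run of digits in the field (something like 'x1-1999' is skipped). If that matters for your file, a possible variation is to search anywhere in the field with digit lookarounds. This is a sketch, not a drop-in replacement: the name get_indices_anywhere and the lo/hi parameters are hypothetical, and it splits on any whitespace rather than on single spaces.

```python
import re

# Exactly four digits, not preceded or followed by another digit
YEAR = re.compile(r'(?<!\d)(\d{4})(?!\d)')

def get_indices_anywhere(s, lo=1900, hi=2020):
    """Return indices of fields containing any 4-digit number in [lo, hi]."""
    result = []
    for i, field in enumerate(s.split()):
        # findall returns every 4-digit group in the field as a string
        if any(lo <= int(y) <= hi for y in YEAR.findall(field)):
            result.append(i)
    return result

print(get_indices_anywhere('Although solar eclipses (Alpha et al. 1980) '
                           'might be granular (Bethe & Gamow 2000)'))
```

On the example sentence it gives the same [6, 13]; the difference only shows up on fields that mix a year with other digits.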

Finding the indices of years in a text array
