Question

I have a text file (file.txt):

(A->[a:5,a:5,a:5,b:50,c:10,c:10])
(B->[e:120,g:50])
(C->[a:5,f:20])

I would like to extract and sum the values paired with 'a' (or 'b' or 'c' or ...), so that I end up with one total per letter.

Note: the text file is obviously not a list, even though it looks like one.

My current code roughly reads the file and checks each line for "a" (for example).

Thanks.

Answer 1

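First, use a regular expression to parse the key:value pairs and extract them all.

Then use the nice itertools.groupby to gather the values, using the a, b, c... letter as the key (the first item of each regex tuple).

Finally, create tuples holding the variable and the sum of its values as an integer.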

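import itertools
import re

# Matches pairs such as "a:5": a word key, a colon, an optionally negative integer.
r = re.compile(r"(\w+):(-?\d+)")

with open("file.txt", "r") as myFile:
    for l in myFile:
        tuples = r.findall(l)
        sums = []
        # groupby only merges adjacent items; within each line equal keys are already adjacent.
        for variable, values in itertools.groupby(tuples, lambda t: t[0]):
            sums.append((variable, sum(int(x[1]) for x in values)))
        # strip() drops the trailing newline so the line and its sums print together.
        print(l.strip(), sums)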

Output:

(A->[a:5,a:5,a:5,b:50,c:10,c:10]) [('a', 15), ('b', 50), ('c', 20)]
(B->[e:120,g:50]) [('e', 120), ('g', 50)]
(C->[a:5,f:20]) [('a', 5), ('f', 20)]

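If you want the sum over all lines, only a small change is needed. First accumulate all tuples in one list (which line they came from no longer matters), then apply groupby to the sorted list (otherwise the grouping won't work properly, because groupby only merges adjacent items with the same key).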

import itertools
import re

r = re.compile(r"(\w+):(-?\d+)")

# Collect the pairs from every line into a single list.
tuples = []
with open("file.txt", "r") as myFile:
    for l in myFile:
        tuples += r.findall(l)

sums = []
# Sorting puts equal keys next to each other, which groupby requires.
for variable, values in itertools.groupby(sorted(tuples), lambda t: t[0]):
    sums.append((variable, sum(int(x[1]) for x in values)))
print(sums)

Result:

[('a', 20), ('b', 50), ('c', 20), ('e', 120), ('f', 20), ('g', 50)]
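
To see why the sort matters, here is a minimal sketch on made-up pairs (the `pairs` list and the `grouped_sums` helper are illustrative names, not part of the answer above). Without sorting, a key that reappears later starts a new group:

```python
import itertools

# Hypothetical sample pairs, deliberately not sorted by key.
pairs = [("a", "5"), ("b", "50"), ("a", "5")]

def grouped_sums(items):
    """Sum the integer values of each run of adjacent equal keys."""
    return [(key, sum(int(v) for _, v in group))
            for key, group in itertools.groupby(items, key=lambda t: t[0])]

unsorted_result = grouped_sums(pairs)        # 'a' shows up in two separate groups
sorted_result = grouped_sums(sorted(pairs))  # one group per key

print(unsorted_result)
print(sorted_result)
```

The first call reports 'a' twice, the second merges both 'a' entries into one total.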

Answer 2

def find(s, ch):
    # Return the index of every occurrence of ch in s.
    return [i for i, ltr in enumerate(s) if ltr == ch]

with open("file.txt", "r") as myFile:
    content = myFile.read()

totalValue = 0

for i in find(content, ':'):
    if i > 0 and content[i-1] == 'a':  # THIS IS WHERE YOU SPECIFY 'a' or 'b' or 'c', etc
        # Collect the run of digits that follows the colon.
        value = ''
        index = i + 1
        while index < len(content) and content[index].isdigit():
            value = value + content[index]
            index = index + 1
        if value:
            totalValue = totalValue + int(value)

print(totalValue)

Result:

Answer 3

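Use a regular expression to parse the file:

\w matches a word character
\d matches a digit
+ specifies that you want to match one or more of the preceding element
? specifies that you want to match zero or one of the preceding element (to account for the minus sign)
Parentheses specify that whatever is matched inside them should be extracted as a group, so we have two groups (one for the letters, one for the number)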
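
As a quick check of how those pieces combine, here is a small sketch that applies the pattern to one sample line (the negative value is made up to exercise the optional minus sign):

```python
import re

# Same pattern as described above, written as a raw string.
value_pattern = re.compile(r"(\w+):(-?\d+)")

line = "(C->[a:5,f:-20])"  # sample line; the negative value is an assumption
pairs = value_pattern.findall(line)

# With two groups in the pattern, findall returns a list of 2-tuples of strings.
print(pairs)
```

Note that the leading "C" is not picked up, because it is not directly followed by a colon.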

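Then use a defaultdict to hold the name -> sum mapping. A defaultdict is like a normal dict, but when a key is missing it creates a default value, obtained by calling the callable that was supplied when the defaultdict was created. In this case that callable is int, which returns 0 when called.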
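
A minimal illustration of that default-value behaviour, separate from the file handling (the key and values are made up):

```python
from collections import defaultdict

totals = defaultdict(int)

# "a" is not in the dict yet, so int() is called and supplies 0 before the addition.
totals["a"] += 5
totals["a"] += 5

print(totals["a"])
print(dict(totals))
```

With a plain dict the first `+=` would raise a KeyError instead.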

import re
from collections import defaultdict

value_pattern = re.compile(r"(\w+):(-?\d+)")
totals = defaultdict(int)

with open("file.txt", "r") as myFile:
    for line in myFile:
        values = value_pattern.findall(line)
        for name, value in values:
            totals[name] += int(value)

        # Print the totals for this line, then reset them for the next one.
        print(totals.items())
        totals.clear()

This gives

dict_items([('c', 20), ('a', 15), ('b', 50)])
dict_items([('g', 50), ('e', 120)])
dict_items([('f', 20), ('a', 5)])

when run on your file.

Answer 4

If I may suggest a more compact solution, this sums up each "key" in the text file and outputs a dictionary:

import re
from collections import defaultdict

with open('a.txt') as f:
    lines = f.read()

tups = re.findall(r'(\w+):(\d+)', lines)
print(tups)
# tups is a list of tuples in the form (key, value), i.e. [('a', '5'), ...]

sums = defaultdict(int)
for tup in tups:
    sums[tup[0]] += int(tup[1])

print(sums)

This will output:

[('a', '5'), ('a', '5'), ('a', '5'), ('b', '50'), ('c', '10'), ('c', '10'), ('e', '120'), ('g', '50'), ('a', '5'), ('f', '20')]
defaultdict(<class 'int'>, {'f': 20, 'b': 50, 'e': 120, 'a': 20, 'c': 20, 'g': 50})

More specifically:

print(sums['a'])
>> 20
print(sums['b'])
>> 50
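
The same idea can be wrapped in a small function that takes the text directly, which makes it easy to try without a file. This is only a sketch: `sum_pairs` and `sample` are hypothetical names, and the pattern also accepts an optional minus sign, as in the other answers.

```python
import re
from collections import defaultdict

def sum_pairs(text):
    """Return a plain dict mapping each key to the sum of its values."""
    sums = defaultdict(int)
    for key, value in re.findall(r"(\w+):(-?\d+)", text):
        sums[key] += int(value)
    return dict(sums)

# The file contents from the question, inlined as a string.
sample = "(A->[a:5,a:5,a:5,b:50,c:10,c:10])\n(B->[e:120,g:50])\n(C->[a:5,f:20])\n"
totals = sum_pairs(sample)

print(totals.get("a", 0))  # total for 'a'
print(totals.get("z", 0))  # a key that never occurs falls back to 0
```

Returning a plain dict and reading it with `.get` avoids silently creating entries for keys that were never in the file.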

Answer 5

Not meaning to step on Jean-Francois's toes :-) - I'd suggest using Counter for the counting.

import collections
import re

r = re.compile(r"(\w+):(-?\d+)")
res = collections.Counter()

with open("file.txt", "r") as myFile:
    for l in myFile:
        for key, cnt in r.findall(l):
            # update() with a mapping adds the value to the existing count.
            res.update({key: int(cnt)})

Result: res is now:

Counter({'e': 120, 'b': 50, 'g': 50, 'c': 20, 'f': 20, 'a': 20})

You can access it like a dictionary, for example res['a'].
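
A short sketch of what dictionary-style access on a Counter looks like, using made-up pairs rather than the file:

```python
from collections import Counter

res = Counter()
for key, cnt in [("a", "5"), ("e", "120"), ("a", "15")]:  # made-up pairs
    res[key] += int(cnt)

a_total = res["a"]        # regular dictionary-style lookup
missing = res["zzz"]      # a missing key yields 0 and is not added
top = res.most_common(1)  # the key with the largest total

print(a_total, missing, top)
```

Unlike a defaultdict, looking up a missing key in a Counter does not insert it.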

Python: Using specific parts of a string (which looks like a list)

5 answers