Question

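This is a repost of my earlier request (Summing up the total based on the random number of inputs of a column). This time, however, I am asking for a solution that does not use the pandas library.

The problem is the same as before. I need to sum the "value" column for each value of col1 in File1 and export the result to an output file. I am new to Python and need to do this for thousands of records.

File1: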

col1 col2              value
559 1   91987224    2400000000
559 0   91987224    100000000
558 0   91987224    100000000
557 2   87978332    500000000
557 1   59966218    2400000000
557 0   64064811    100000000

Expected output:

col1      Sum 
559     2500000000
558     100000000
557     3000000000

Thanks in advance.

P.S.: Due to permission issues I cannot use the pandas library. I tried the code below; this is my attempt so far:

import csv 
fin = open("File1.txt","r")
list_txid = {}
amount_tx = {}

for line in fin:
    line = line.rstrip()
    f = line.split("\t")
    txid = f[0]
    amount = int(f[3])

fin.close()
for txid in list_txid:
    amount_tx[txid] += amount
    print("{0}\t{1:d}\n".format(txid, amount_tx[txid]))

Answer 1

Can you use numpy? If not, the problem seems to be that you are not updating the values while iterating over the file.

Now, to read the file:

with open('File1.txt') as fin:
    reader = csv.reader(fin, delimiter='\t')

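is how I would suggest opening it. As a note, you don't need to specify 'r' as the mode (the second argument to open), since that is the default. Unlike 'fin = open', the 'with open' statement closes the file automatically when the indented block ends. You save a couple of lines of code and, more importantly, if you forget to type fin.close() (which, after all, does not raise an error in your code), the file still gets closed.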
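As a small illustration of the closing behaviour described above, here is a sketch that writes a throwaway file and inspects the file object's closed attribute. The temporary path is only for the demo and is not part of the question.

```python
import os
import tempfile

# Create a throwaway file so the demo does not depend on File1.txt existing.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(path, "w") as w:
    w.write("559\t1\t91987224\t2400000000\n")
    inside = w.closed      # the file is still open inside the block

after = w.closed           # closed automatically once the block ends

with open(path) as fin:    # no mode given, so it defaults to 'r'
    mode = fin.mode
    first_line = fin.readline()

os.remove(path)
print(inside, after, mode)
```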

reader = csv.reader(fin, delimiter='\t') takes care of removing the line ending from each row and splitting the row on tab characters.
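To see what the reader yields, here is a minimal sketch that feeds it two rows from the question through io.StringIO instead of a real file. It assumes the fields are separated by tabs, as in the asker's own split("\t").

```python
import csv
import io

# Two sample rows from the question, with tabs assumed as the delimiter.
sample = "559\t1\t91987224\t2400000000\n559\t0\t91987224\t100000000\n"

rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
print(rows[0])
```

Each row comes back as a list of strings, which is why the amount still has to go through int() before it can be summed.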

Here is how I would change the overall code:

import csv

amount_tx = {}  # maps each col1 value to the running sum of its amounts

with open('File1.txt') as fin:
    reader = csv.reader(fin, delimiter='\t')
    next(reader, None)  # skip the header row shown in the question
    for f in reader:
        txid, amount = f[0], int(f[3])
        try:
            amount_tx[txid] += amount
        except KeyError:
            # first time this txid is seen
            amount_tx[txid] = amount

with open('OutputFileName.txt', 'w') as w:
    for txid, amount in amount_tx.items():
        w.write('%s\t%d\n' % (txid, amount))

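If you are using Python 2.x rather than 3.x, amount_tx.items() still works but builds a full list first; amount_tx.iteritems() avoids that.

'OutputFileName.txt' should be replaced with the name of the file you want to save the results to. open(FNAME, 'w') specifies that you are writing to the file rather than reading from it. This starts by truncating (or creating) the file; if you want to keep the existing contents and append to them, use 'a' instead.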
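The difference between the two modes can be checked with a short sketch like the one below. It uses a temporary file rather than the real output file, and the rows written are just sample values.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(path, "w") as w:
    w.write("559\t2500000000\n")
with open(path, "w") as w:      # 'w' truncates, so the 559 line is discarded
    w.write("558\t100000000\n")
with open(path, "a") as w:      # 'a' keeps what is there and adds to the end
    w.write("557\t3000000000\n")

with open(path) as fin:
    lines = fin.read().splitlines()

os.remove(path)
print(lines)
```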

Answer 2

import csv 
fin = open("File1.txt","r")
list_txid = {}
for line in fin:
    line = line.rstrip()
    f = line.split()
    if 'value' not in f:  # skip the header row
        try:
            list_txid[f[0]] += int(f[3])
        except KeyError:
            list_txid[f[0]] = int(f[3])
fin.close()
print("{0}\t{1}\n".format('col1', 'Sum'))
for k,v in list_txid.items():
    print("{0}\t{1:d}".format(k, v))

Output:

col1    Sum

559 2500000000
558 100000000
557 3000000000

Answer 3

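Same as the other answers, but using defaultdict, which gives missing keys a default integer value (0), so you can sum without first checking whether the key is already in the dictionary.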

from collections import defaultdict
import csv

with open('File1.txt') as fin:
    reader = csv.reader(fin, delimiter='\t')

    amount_tx = defaultdict(int)
    # Skip headers
    next(reader)
    for line in reader:
        key = line[0]
        amount_tx[key] += int(line[3])

with open('OutputFile.txt','w') as w:
    # Write new headers
    w.write("Col1\tSum\n")
    for tx_id, tx_amount in amount_tx.items():
        w.write("{0}\t{1:d}\n".format(tx_id,tx_amount))
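
To check the defaultdict approach without touching the disk, the same logic can be wrapped in a function and run on the sample rows from the question. This is only a sketch: the function name sum_by_first_column, the value_index parameter, and the 'col3' header name are placeholders introduced here, and tab-separated input is assumed.

```python
import csv
import io
from collections import defaultdict


def sum_by_first_column(lines, value_index=3):
    """Sum the integer at value_index for each distinct first-column key."""
    totals = defaultdict(int)        # missing keys start at 0
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)               # skip the header row
    for row in reader:
        if not row:                  # ignore blank lines
            continue
        totals[row[0]] += int(row[value_index])
    return dict(totals)


sample = io.StringIO(
    "col1\tcol2\tcol3\tvalue\n"      # 'col3' is a placeholder header name
    "559\t1\t91987224\t2400000000\n"
    "559\t0\t91987224\t100000000\n"
    "558\t0\t91987224\t100000000\n"
    "557\t2\t87978332\t500000000\n"
    "557\t1\t59966218\t2400000000\n"
    "557\t0\t64064811\t100000000\n"
)

totals = sum_by_first_column(sample)
print(totals)
```

Passing an open file object instead of the StringIO works the same way, since csv.reader accepts any iterable of lines.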

Answer 4

Probably not the best approach, but given that you can't use pandas, this works.

public void unzip(String _zipFile, String _targetLocation) {

    //create target location folder if not exist 
    dirChecker(_targetLocation); 

    try { 
        FileInputStream fin = new FileInputStream(_zipFile);
        ZipInputStream zin = new ZipInputStream(fin);
        ZipEntry ze = null;
        while ((ze = zin.getNextEntry()) != null) {

            //create dir if required while unzipping 
            if (ze.isDirectory()) {
                dirChecker(ze.getName());
            } else { 
                FileOutputStream fout = new FileOutputStream(_targetLocation + ze.getName());
                for (int c = zin.read(); c != -1; c = zin.read()) {
                    fout.write(c);
                } 

                zin.closeEntry();
                fout.close();
            } 

        } 
        zin.close();
    } catch (Exception e) {
        System.out.println(e);
    } 
}

Answer 5

You can use the pandas library in Python.

It has functionality for grouping rows and summing the desired column.

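import pandas as pd

# File1.txt is delimited text, not a spreadsheet, so read it with read_csv
# (a tab separator is assumed here) rather than read_excel.
df = pd.read_csv("File1.txt", sep="\t")

print(df.groupby(['col1'])[["value"]].sum())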
