Question

I have a text file that looks like this:

Input
3 A 4 4.2
4 B 5 3.2
5 C 4 4.0
5 D 4 8.0
........

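The only columns of interest are 0, 1 and 3. The script below does the following: if the value in column 1 matches a particular letter, the value in column 3 is multiplied by a constant. I now want to go through column 0 and, wherever an integer is repeated, add the corresponding column 3 values together. For example, there are two 5s in column 0, so I would add 1.2 and 2.4 (the values after multiplying by the constant: 4.0 x 0.3 = 1.2 and 8.0 x 0.3 = 2.4).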

Output
3 A 4 3.4
4 B 5 3.2
5 C 4 3.6 (the entries in column 1 and 2 don't really matter after addition)
........

I think this would be easy in pandas, but I have already written something with dictionaries, which complicates the process:

import numpy as np

ring_dict = dict()
answer = []
ring = open('data.txt', "r")

for line in ring:
    f2 = line.split(" ")
    key2 = int(f2[0])
    value2 = float(f2[3])
    name = f2[1]
    ring_dict[key2] = [name, value2]
    # Scale column 3 by a constant that depends on the letter in column 1
    if name == 'A':
        answer = value2 * 0.81
    elif name == 'B':
        answer = value2 * 1
    else:
        answer = value2 * 0.3

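I'm not sure how to proceed. I can't iterate over key2 (i.e. for x in key2) to look for duplicates, so I'm not sure how to check for them. Also, if I put key2 into an array, the dictionary won't work.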

Answer 1

You can check whether the following entry already exists, and then either initialize it or increment it:

ring_dict[key2[i]]
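The check-then-initialize-or-increment idea can be sketched as follows. This is only an illustration, not the answerer's code: the function name `sum_by_first_column`, the `FACTORS` table, and the in-memory `sample` that stands in for `data.txt` are all assumptions made for the example.

```python
import io

# Multipliers from the question; any other letter falls back to 0.3.
FACTORS = {"A": 0.81, "B": 1.0}
DEFAULT_FACTOR = 0.3


def sum_by_first_column(lines):
    """Return {column-0 integer: sum of the scaled column-3 values}."""
    totals = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        key = int(fields[0])
        scaled = float(fields[3]) * FACTORS.get(fields[1], DEFAULT_FACTOR)
        if key in totals:
            totals[key] += scaled  # duplicate key: add to the running sum
        else:
            totals[key] = scaled   # first occurrence: initialize
    return totals


# Hypothetical stand-in for open('data.txt'), using the question's sample rows.
sample = io.StringIO("3 A 4 4.2\n4 B 5 3.2\n5 C 4 4.0\n5 D 4 8.0\n")
totals = sum_by_first_column(sample)
for key, total in totals.items():
    print(key, round(total, 1))
```

Because the dictionary is keyed on the column 0 integer, a repeated key is detected with a plain `in` test, so no separate pass over the keys is needed.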

Answer 2

I'm still not sure exactly what you want (especially for columns 1 and 2), but as you mentioned, with pandas the problem becomes much more trivial:

import pandas as pd

# read the csv into a pd.DataFrame
df = pd.read_csv('data.txt', sep=' ', header=None)

# Multiply the column[3] by the given constant (default to 0.3 if not 'A' or 'B')
df[3] = df.apply(lambda x: round(x[3] * {'A': 0.81, 'B': 1}.get(x[1], 0.3),1), axis=1)

# Group the DataFrame by column[0] and return a new DataFrame with the sum; drop column[2].
newdf = df.groupby(0).agg(['sum']).drop(columns=2).reset_index(col_level=0)

# Drop the extra column level (multi-index) returned by the agg() method
newdf.columns = newdf.columns.droplevel(1)

#    0   1    3
# 0  3   A  3.4
# 1  4   B  3.2
# 2  5  CD  3.6

This gives you more freedom to manipulate the data. However, if you still need the result in dict form, you can do the following:

my_dict = {v[0]: [v[1],v[2]] for v in newdf.to_dict('list').values()}

# {3: [4, 5], 'A': ['B', 'CD'], 3.4: [3.2, 3.5999999999999996]}

# Note: rounding issue on the last part, but that's easier to smooth out.
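One way to smooth out that rounding issue, shown here as a minimal sketch, is to keep the full floating-point sum and round only the final value. The variable names `parts`, `total` and `rounded` are made up for this example.

```python
# Scaled values of the two rows whose column 0 is 5, as in the question.
parts = [4.0 * 0.3, 8.0 * 0.3]

total = sum(parts)         # binary floats may not land exactly on 3.6
rounded = round(total, 1)  # round once, at the end, for output

print(total, rounded)
```

Rounding once at the end, rather than after every multiplication, avoids accumulating rounding steps when many rows share the same key.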

If necessary, I'll update this answer with a dict variant.

Checking for duplicate values in a column
