As input, I have a .csv file like this:
user, withdraw, date
50D8BF0DA22D6C914777D8F59DAAB4D8, -125, 01-02-2015
674BCF0CD236621E5680073334A73C32, -5, 01-02-2015
E17E1691D35FB2FB675E3B787B8BEDF1, -845, 01-02-2015
50D8BF0DA22D6C914777D8F59DAAB4D8, -250, 01-02-2015
674BCF0CD236621E5680073334A73C32, -98, 01-02-2015
50D8BF0DA22D6C914777D8F59DAAB4D8, -17, 01-02-2015
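I want to identify all identical hash codes and replace them with labels such as 'user1', 'user2', 'user3', and so on.
I have been trying to do this with pandas, without success. Any idea what I can do?
Answer 0 (score: 4)
First read the CSV into a pandas DataFrame: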
import pandas as pd
df = pd.read_csv('/path/to/file.csv', skipinitialspace=True)
which yields:
In [84]: df
Out[84]:
                               user  withdraw        date
0  50D8BF0DA22D6C914777D8F59DAAB4D8      -125  01-02-2015
1  674BCF0CD236621E5680073334A73C32        -5  01-02-2015
2  E17E1691D35FB2FB675E3B787B8BEDF1      -845  01-02-2015
3  50D8BF0DA22D6C914777D8F59DAAB4D8      -250  01-02-2015
4  674BCF0CD236621E5680073334A73C32       -98  01-02-2015
5  50D8BF0DA22D6C914777D8F59DAAB4D8       -17  01-02-2015
Now we can factorize the user column:
In [85]: df['user'] = 'user' + pd.Series((pd.factorize(df.user)[0]+1).astype(str))
In [86]: df
Out[86]:
    user  withdraw        date
0  user1      -125  01-02-2015
1  user2        -5  01-02-2015
2  user3      -845  01-02-2015
3  user1      -250  01-02-2015
4  user2       -98  01-02-2015
5  user1       -17  01-02-2015
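To see what the factorize step does on its own, here is a minimal sketch on a small Series. The shortened hash values and the names `users`, `labels`, and `mapping` are made up for illustration and are not part of the question. `pd.factorize` returns 0-based integer codes in order of first appearance together with the distinct values, so the same hash always receives the same number.

```python
import pandas as pd

# Hypothetical shortened hashes, for illustration only.
users = pd.Series(["AAA", "BBB", "CCC", "AAA", "BBB", "AAA"])

# codes:   0-based integers assigned in order of first appearance
# uniques: the distinct values, positioned by their code
codes, uniques = pd.factorize(users)

# Shift to 1-based numbering and prepend the "user" prefix.
labels = ["user{}".format(code + 1) for code in codes]

# Keep the hash -> label mapping in case the labels need to be traced back.
mapping = {value: "user{}".format(i + 1) for i, value in enumerate(uniques)}

print(labels)
print(mapping)
```

Keeping `mapping` around is optional; it is only useful if the relabelled data later has to be matched against the original hashes.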
And write the DataFrame back to CSV:
df.to_csv('/path/to/file_new.csv', index=False)
Answer 1 (score: 3)
You first need to build a dictionary of users, as follows:
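import csv

hashes = {}        # maps each hash to its 'userN' label
user_number = 1    # number to assign to the next unseen hash
entries = []       # relabelled rows, written out at the end

# On Python 3, open CSV files in text mode with newline=''
# (on Python 2, use 'rb' and 'wb' instead).
with open('input.csv', newline='') as f_input:
    csv_input = csv.reader(f_input, skipinitialspace=True)
    header = next(csv_input)

    for row in csv_input:
        user = row[0]
        if user not in hashes:
            hashes[user] = "user{}".format(user_number)
            user_number += 1
        row[0] = hashes[user]
        entries.append(row)

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)
    csv_output.writerows(entries)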
This gives you an output.csv containing:
user,withdraw,date
user1,-125,01-02-2015
user2,-5,01-02-2015
user3,-845,01-02-2015
user1,-250,01-02-2015
user2,-98,01-02-2015
user1,-17,01-02-2015
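The dictionary approach collects every row in a list before writing. If the file is large, a possible variation is to write each row as soon as it has been relabelled. The sketch below assumes Python 3; the function name `anonymize_users` and the in-memory sample data are hypothetical and only serve to make the example self-contained.

```python
import csv
import io


def anonymize_users(f_input, f_output):
    """Replace each distinct value in the first column with user1, user2, ..."""
    hashes = {}
    reader = csv.reader(f_input, skipinitialspace=True)
    writer = csv.writer(f_output, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header row unchanged
    for row in reader:
        if not row:  # skip blank lines
            continue
        # setdefault stores the new label only the first time a hash is seen
        row[0] = hashes.setdefault(row[0], "user{}".format(len(hashes) + 1))
        writer.writerow(row)
    return hashes


# Usage with in-memory text instead of real files (sample data is made up).
sample = io.StringIO(
    "user, withdraw, date\n"
    "AAA, -125, 01-02-2015\n"
    "BBB, -5, 01-02-2015\n"
    "AAA, -250, 01-02-2015\n"
)
result = io.StringIO()
mapping = anonymize_users(sample, result)
print(result.getvalue())
```

With real files, the two arguments would be file objects opened with `newline=''`, as the `csv` module documentation recommends.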