Removing characters and duplicates from a CSV file and writing to a new file

Date: 2018-10-27 06:52:56

Tags: python

I am reading from a CSV file that looks like this:

[152.60115606936415][152.60115606936415, 13181.818181818182][152.60115606936415, 13181.818181818182, 1375055.330634278][152.60115606936415, 13181.818181818182, 1375055.330634278, 89.06882591093118]

What I want to do is remove the characters ([, ] and turn the spaces into newlines) and write the result to a new txt file.

import csv

to_file = open("t_put.txt", "w")
with open("t_put_val.20181026052328.csv", "r") as f:
    for row in list(csv.reader(f)):
        value2 = " ".join(row)[1:-1]  # remove the first and last character
        value = value2.replace("  ", "\n")  # replace double spaces with newlines
        value3 = value.replace("][", " ")  # replace ][
        value4 = value3.replace(" ", "\n")  # replace remaining spaces with newlines
        print(value4)
        # st = str(s)
        to_file.write(value4)  # write to file
to_file.close()

With this code I can remove the characters, but the duplicates still show up. I was thinking of using the set() method, but it does not work as expected, or it only prints out the last four numbers, and it may not work on a larger dataset.
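
For illustration, a minimal sketch (with made-up sample values) of what I mean: a plain set() removes the duplicates but discards the insertion order, whereas dict.fromkeys() keeps the order of first appearance (guaranteed in Python 3.7+).

values = ["152.60115606936415", "13181.818181818182",
          "152.60115606936415", "89.06882591093118"]
print(set(values))                   # duplicates gone, but order is arbitrary
print(list(dict.fromkeys(values)))   # duplicates gone, original order kept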

4 Answers:

Answer 0: (score: 1)

By splitting on ']' you can group each list in the CSV.

# Open up the csv file
with open("t_put_val.20181026052328.csv", "r") as f_h:
    rows = [row.lstrip('[').split(", ")
            # For each line in the file (there's just one)
            for line in f_h.readlines()
            # Don't want a blank line
            if not len(line) == 0
            # Split the line by trailing ']'s
            for row in line.split(']')
            # Don't want the last blank list
            if not len(row) == 0
            ]

# Print out all unique values
unique_values = set(item for row in rows for item in row)
for value in unique_values:
    print(value)

# Output
with open("t_put.txt", 'w') as f_h:
    f_h.writelines('%s\n' % ', '.join(row) for row in rows)
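
Note that the snippet above only prints the unique values and writes the original rows (duplicates included) back out. If only the unique values should end up in t_put.txt, a sketch along these lines keeps the first occurrence of each value (it assumes the `rows` list built above):

# Sketch: write each value only once, preserving the order of first appearance.
# `rows` is assumed to be the nested list produced by the comprehension above.
seen = set()
with open("t_put.txt", "w") as f_h:
    for row in rows:
        for item in row:
            if item not in seen:
                seen.add(item)
                f_h.write(item + '\n')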

Answer 1: (score: 1)

set is an unordered data structure.

A better approach is to convert the string output into a list object and then use Python's set() method:

>>> my_int = [152.60115606936415, 13181.818181818182, 152.60115606936415, 13181.818181818182, 1375055.330634278, 152.60115606936415]

You can use set() directly on the list to remove the duplicates.

>>> set(my_int)
{152.60115606936415, 13181.818181818182, 1375055.330634278}

However, if you don't want the set output above but want a list instead, you can do the following...

>>> list(set(my_int))
[152.60115606936415, 13181.818181818182, 1375055.330634278]

Using collections.OrderedDict...

Based on the discussion, the desired output should keep its original order, so use OrderedDict to preserve the order of the dataset.

from collections import OrderedDict
import csv
to_file =open("ttv","w")
with open("tt", "r") as f:
    for row in (list(csv.reader(f))):
         value2= (" ".join(row)[1:-1]) #remove 3 first and last elements
         value = value2.replace("  ","\n")# replace spaces with newline
         value3 = value.replace("]["," ") # replace ][
         value4 = value3.replace(" ","\n")
         value4 = OrderedDict.fromkeys(value4.split())
         #value4 = sorted(set(value4.split()))
         for line in value4:
             line = line.split(',')
             for lines in line:
                 new_val = lines
                 print(new_val)
                 to_file.write(new_val + '\n')#write to file
to_file.close()

Result:

152.60115606936415
13181.818181818182
1375055.330634278
89.06882591093118

Answer 2: (score: 1)

If I am right in assuming that you just want to write each unique value to a new line in the output file, this will also preserve the original order:

from collections import OrderedDict

with open('t_put_val.20181026052328.csv', 'r') as infile, open('t_put.txt', 'w') as outfile:
    data = infile.read()
    # List of characters to replace
    to_replace = ['[', ']', ' ']
    for char in to_replace:
        if char in data:
            data = data.replace(char, '')
    unique_list = list(OrderedDict.fromkeys(data.split(',')))
    for i in unique_list:
        outfile.write(i + '\n')

Giving in the txt file:

152.60115606936415
13181.818181818182
1375055.330634278
89.06882591093118

Answer 3: (score: 0)

You can use your script in combination with the Linux command line as shown below. If you run the script on its own, the output will be:

./yourscript.py

152.60115606936415
152.60115606936415
13181.818181818182
152.60115606936415
13181.818181818182
1375055.330634278
152.60115606936415
13181.818181818182
1375055.330634278
89.06882591093118

However, if you use pipes in the shell and write the output to a file, you can easily remove the duplicates as follows:

./yourscript.py |sort|uniq > yourresultfile

If you look at the resulting file, it shows:

cat yourresultfile
13181.818181818182
1375055.330634278
152.60115606936415
89.06882591093118

In this way you can remove the duplicates from the file.

So if you would rather do this the Pythonic way, the following is a crude way to achieve the desired output:

#!/usr/bin/python
import json

with open('input_file.txt', 'r') as myfile:
    data = myfile.read().replace('\n', '')

str1 = data.replace('[', '')
str2 = str1.replace(']', ',')
list1 = str2.split(',')
list2 = list(set(list1))  # remove duplicates (order not preserved)
list3 = [x.strip() for x in list2 if x.strip()]
list4 = [float(i) for i in list3]

with open('out_put_file.txt', 'w') as f:
    f.write(json.dumps(list4))

The file out_put_file.txt contains the following output:

[13181.818181818182, 1375055.330634278, 89.06882591093118, 152.60115606936415]