I am reading from a csv file that looks like this:
[152.60115606936415][152.60115606936415, 13181.818181818182][152.60115606936415, 13181.818181818182, 1375055.330634278][152.60115606936415, 13181.818181818182, 1375055.330634278, 89.06882591093118]
What I want to do is remove the characters ('[', ']') and turn the spaces into new lines, then write the result to a new txt file.
import csv

to_file = open("t_put.txt", "w")
with open("t_put_val.20181026052328.csv", "r") as f:
    for row in (list(csv.reader(f))):
        value2 = (" ".join(row)[1:-1])  # remove 3 first and last elements
        value = value2.replace(" ", "\n")  # replace spaces with newline
        value3 = value.replace("][", " ")  # replace ][
        value4 = value3.replace(" ", "\n")
        print(value4)
        # st = str(s)
        to_file.write(value4)  # write to file
to_file.close()
With this code I am able to remove the characters, but it still shows duplicates. I was thinking of using the set() method, but it doesn't work as expected, or it only prints the last four numbers, which probably wouldn't work on a larger data set.
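To illustrate why the duplicates show up (simplified values, not my real data): each bracketed list repeats the values of the one before it, so flattening the row necessarily repeats values:

>>> row = "[1.0][1.0, 2.0][1.0, 2.0, 3.0]"
>>> row.replace("][", " ").strip("[]").replace(",", "").split()
['1.0', '1.0', '2.0', '1.0', '2.0', '3.0']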
Answer 0 (score: 1)
By splitting on ']' you can group each list in the csv.
# Open up the csv file
with open("t_put_val.20181026052328.csv", "r") as f_h:
    rows = [row.lstrip('[').split(", ")
            # For each line in the file (there's just one)
            for line in f_h.readlines()
            # Don't want a blank line
            if not len(line) == 0
            # Split the line by trailing ']'s
            for row in line.split(']')
            # Don't want the last blank list
            if not len(row) == 0
            ]

# Print out all unique values
unique_values = set(item for row in rows for item in row)
[print(value) for value in unique_values]

# Output
with open("t_put.txt", 'w') as f_h:
    f_h.writelines('%s\n' % ', '.join(row) for row in rows)
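If the goal is for the file to contain only the unique values, one per line, a possible variant of the last step (a sketch of mine, not part of the answer above) is to write unique_values instead of the re-joined rows; keep in mind that iterating a plain set gives no guarantee about the order of the values:

# Variant: write each unique value on its own line (order not guaranteed)
with open("t_put.txt", 'w') as f_h:
    f_h.writelines('%s\n' % value for value in unique_values)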
Answer 1 (score: 1)
set is an unordered data structure.
A better approach is to convert the string output into a list object and then use Python's set() method:
>>> my_int = [152.60115606936415, 13181.818181818182, 152.60115606936415, 13181.818181818182, 1375055.330634278, 152.60115606936415]
You can apply set directly to the list to remove the duplicates.
>>> set(my_int)
{152.60115606936415, 13181.818181818182, 1375055.330634278}
However, if you don't want the set output above and would rather have a list, you can do it like this...
>>> list(set(my_int))
[152.60115606936415, 13181.818181818182, 1375055.330634278]
collections.OrderedDict: based on the discussion, the desired output should keep its original order, so OrderedDict is used to preserve the order of the data set.
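As a quick illustration (my own example, reusing the my_int list from above), OrderedDict.fromkeys keeps the first occurrence of each value in its original position:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(my_int))
[152.60115606936415, 13181.818181818182, 1375055.330634278]

Applying the same idea to your code: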
from collections import OrderedDict
import csv

to_file = open("ttv", "w")
with open("tt", "r") as f:
    for row in (list(csv.reader(f))):
        value2 = (" ".join(row)[1:-1])  # remove 3 first and last elements
        value = value2.replace(" ", "\n")  # replace spaces with newline
        value3 = value.replace("][", " ")  # replace ][
        value4 = value3.replace(" ", "\n")
        value4 = OrderedDict.fromkeys(value4.split())
        # value4 = sorted(set(value4.split()))
        for line in value4:
            line = line.split(',')
            for lines in line:
                new_val = lines
                print(new_val)
                to_file.write(new_val + '\n')  # write to file
to_file.close()
Result:
152.60115606936415
13181.818181818182
1375055.330634278
89.06882591093118
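Side note (my addition): on Python 3.7+ the built-in dict also preserves insertion order, so plain dict.fromkeys gives the same result without the extra import:

>>> list(dict.fromkeys(my_int))
[152.60115606936415, 13181.818181818182, 1375055.330634278]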
Answer 2 (score: 1)
If I'm right in assuming you just want to write each unique value to a new line in the output file, this will also preserve the original order:
from collections import OrderedDict

with open('t_put_val.20181026052328.csv', 'r') as infile, open('t_put.txt', 'w') as outfile:
    data = infile.read()
    # List of characters to replace
    to_replace = ['[', ']', ' ']
    for char in to_replace:
        if char in data:
            data = data.replace(char, '')
    unique_list = list(OrderedDict.fromkeys(data.split(',')))
    for i in unique_list:
        outfile.write(i + '\n')
Which gives this in the txt file:
152.60115606936415
13181.818181818182
1375055.330634278
89.06882591093118
Answer 3 (score: 0)
You can use your script in combination with the Linux command line as given below. If you run your script directly, the output will be:
./yourscript.py
152.60115606936415
152.60115606936415
13181.818181818182
152.60115606936415
13181.818181818182
1375055.330634278
152.60115606936415
13181.818181818182
1375055.330634278
89.06882591093118
However, if you use pipes in the shell and write the output to a file, you can easily remove the duplicates as follows:
./yourscript.py |sort|uniq > yourresultfile
If you look at the contents of the result file, it shows:
cat yourresultfile
13181.818181818182
1375055.330634278
152.60115606936415
89.06882591093118
This way you can remove the duplicates from the file.
And if you would rather do it in a pythonic way, the following is a dumb way to achieve the desired output:
#!/usr/bin/python
import json

with open('input_file.txt', 'r') as myfile:
    data = myfile.read().replace('\n', '')

str1 = data.replace('[', '')
str2 = str1.replace(']', ',')
list1 = str2.split(',')
list2 = list(set(list1))  # drop duplicates (order is not preserved)
list3 = [x.strip() for x in list2 if x.strip()]
list4 = [float(i) for i in list3]

with open('out_put_file.txt', 'w') as f:
    f.write(json.dumps(list4))
The file out_put_file.txt contains the following output:
[13181.818181818182, 1375055.330634278, 89.06882591093118, 152.60115606936415]
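Note that because set() is unordered, the values in out_put_file.txt do not keep the order in which they first appear in the input. If that matters, one possible tweak (my suggestion, untested against your data) is to deduplicate with dict.fromkeys instead, which on Python 3.7+ keeps first-seen order:

list2 = list(dict.fromkeys(list1))  # keeps the order of first appearance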