Removing duplicate records from a CSV file using Python Pandas

Time: 2018-10-03 18:47:20

Tags: python pandas csv grouping distinct-values

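I want to remove duplicate records from a CSV file using Python Pandas. The CSV contains records with three attributes: Scale, minzoom, and maxzoom. I want a resulting data frame that has only minzoom and maxzoom, with every record unique.

Input CSV file (lookup_scales.csv)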

 Scale, minzoom, maxzoom
 2000, 0, 15
 3000, 0, 15
 10000, 8, 15
 20000, 8, 15
 200000, 15, 18
 250000, 15, 18

Required distinct_lookup_scales.csv (without the Scale column)

minzoom, maxzoom
0,15
8,15
15,18 

My code so far:

lookup_scales_df = pd.read_csv('C:/Marine/lookup/lookup_scales.csv', names = ['minzoom','maxzoom'])
lookup_scales_df = lookup_scales_df.set_index([2, 3])
file_name = "C:/Marine/lookup/distinct_lookup_scales.csv"
lookup_scales_df.groupby('minzoom', 'maxzoom').to_csv(file_name, sep=',')

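Any help is much appreciated. I am new to pandas and to working with data frames.

3 answers:

Answer 0: (score: 2)

You don't need numpy or anything else to unique-ify; it can be done in one line while importing the csv with pandas: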

import pandas as pd
# drop=True discards the old row labels instead of adding them as an extra "index" column
df = pd.read_csv('lookup_scales.csv', usecols=['minzoom', 'maxzoom']).drop_duplicates(keep='first').reset_index(drop=True)

Output:

   minzoom  maxzoom
0        0       15
1        8       15
2       15       18

Then write it to csv:

df.to_csv(file_name, index=False) # you don't need to set sep in this because to_csv makes it comma delimited.

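The whole code:

import pandas as pd
# drop=True discards the old row labels instead of adding them as an extra "index" column
df = pd.read_csv('lookup_scales.csv', usecols=['minzoom', 'maxzoom']).drop_duplicates(keep='first').reset_index(drop=True)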
file_name = "C:/Marine/lookup/distinct_lookup_scales.csv"
df.to_csv(file_name, index=False) # you don't need to set sep in this because to_csv makes it comma delimited.
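
One detail of the chained call is the effect of reset_index. The following is a minimal sketch, assuming a hypothetical in-memory copy of the file with no spaces after the commas (the `csv_text` string and variable names are illustrative, not part of the answer). It shows that a plain `reset_index()` turns the old row labels into an extra `index` column, while `reset_index(drop=True)` discards them:

```python
import io
import pandas as pd

# Hypothetical in-memory copy of lookup_scales.csv (no spaces after commas).
csv_text = """Scale,minzoom,maxzoom
2000,0,15
3000,0,15
10000,8,15
20000,8,15
200000,15,18
250000,15,18
"""

# Read only the two zoom columns and keep the first row of each distinct pair.
deduped = pd.read_csv(
    io.StringIO(csv_text), usecols=["minzoom", "maxzoom"]
).drop_duplicates(keep="first")

# The surviving rows still carry their original labels (0, 2, 4).
with_old_index = deduped.reset_index()   # old labels become an "index" column
clean = deduped.reset_index(drop=True)   # old labels are discarded

print(list(with_old_index.columns))
print(list(clean.columns))
```

With the plain form, `to_csv(file_name, index=False)` would still write the extra `index` column, because by then it is an ordinary data column.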

Answer 1: (score: 1)

You can use pd.read_csv(), DataFrame.drop_duplicates(), and DataFrame.to_csv():

import pandas as pd

df = pd.read_csv('test.csv', sep=', ', engine='python')

new_df = df[['minzoom','maxzoom']].drop_duplicates()

new_df.to_csv('out.csv', index=False)

Output written to out.csv

minzoom,maxzoom
0,15
8,15
15,18

Note the sep=', ' when reading test.csv; if you keep the default sep=',', your column names will carry leading spaces.
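
The leading-space effect can be checked directly. This is a small sketch, assuming a hypothetical in-memory sample with a space after every comma (the `csv_text` string is illustrative). It also shows `skipinitialspace=True`, a `read_csv` option that is not mentioned in the answer but has the same effect on the header without switching to the Python engine:

```python
import io
import pandas as pd

# Hypothetical sample with a space after every comma, as in the question.
csv_text = "Scale, minzoom, maxzoom\n2000, 0, 15\n3000, 0, 15\n10000, 8, 15\n"

# Default separator: the space stays attached to the column names.
default_cols = list(pd.read_csv(io.StringIO(csv_text)).columns)

# Two-character separator, as in the answer (needs the Python engine).
sep_cols = list(pd.read_csv(io.StringIO(csv_text), sep=", ", engine="python").columns)

# Alternative not used in the answer: skip whitespace after the delimiter.
skip_cols = list(pd.read_csv(io.StringIO(csv_text), skipinitialspace=True).columns)

print(default_cols)  # ['Scale', ' minzoom', ' maxzoom']
print(sep_cols)      # ['Scale', 'minzoom', 'maxzoom']
print(skip_cols)     # ['Scale', 'minzoom', 'maxzoom']
```

With the default separator, selecting `df[['minzoom', 'maxzoom']]` would raise a KeyError, since the actual names are `' minzoom'` and `' maxzoom'`.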

Answer 2: (score: 0)

The answer provided by d_kennetz is completely wrong. The correct way, while keeping the other columns intact, is to replace h

#df = pd.read_csv('yourcsvfilehere.csv').drop_duplicates('columnnamehere',keep='first')
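
The text of this answer is cut off, so the following is only one reading of it: a sketch of `drop_duplicates` with a column subset, which keeps every column and retains the first row for each distinct (minzoom, maxzoom) pair. The in-memory `csv_text` and the variable names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of the question's data (no spaces after commas).
csv_text = """Scale,minzoom,maxzoom
2000,0,15
3000,0,15
10000,8,15
20000,8,15
200000,15,18
250000,15,18
"""

df = pd.read_csv(io.StringIO(csv_text))

# Compare rows on the two zoom columns only; all columns are kept in the result.
first_per_pair = df.drop_duplicates(subset=["minzoom", "maxzoom"], keep="first")

print(first_per_pair)
```

This differs from what the question asks for: the Scale column survives here, holding the first Scale value seen for each pair, whereas the question wants an output without Scale.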