I have a CSV file like this:
Date of event Name Date of birth
06.01.1986 John Smit 23.08.1996
18.12.1996 Barbara D 01.08.1965
12.12.2001 Barbara D 01.08.1965
17.10.1994 John Snow 20.07.1965
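I need to find the unique rows by "Name" and "Date of birth" (possibly together with some other columns), keeping for each one the row with the maximum (latest) "Date of event".
So I need to end up with a CSV file like this: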
Date of event Name Date of birth
06.01.1986 John Smit 23.08.1996
12.12.2001 Barbara D 01.08.1965
17.10.1994 John Snow 20.07.1965
How can I do this? I have no idea where to start...
Answer 0: (score: 0)
import pandas as pd
# read the csv with the pandas module; dayfirst=True because dates are dd.mm.yyyy
df = pd.read_csv('pathToCsv.csv', header=0, parse_dates=[0, 2], dayfirst=True)
# rename the columns to be more programming friendly, i.e. no whitespace
df.columns = ['dateOfEvent', 'name', 'DOB']  # and probably some other columns ...
# get the max (Date of event) per group (name, Date of birth)
yourwish = df.groupby(['name', 'DOB'])['dateOfEvent'].max()
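Note that taking `.max()` of a single column returns only the latest date per group, not the whole row. If the file has other columns that should be kept, one possible approach is to look up the row label of the latest event with `idxmax` and select those rows. The following is a minimal sketch, assuming the data is comma-delimited; the inline `csv_text` and the names `latest_idx` and `latest_rows` are illustrative and not part of the original answer.

```python
import io
import pandas as pd

# Sample data from the question, written comma-delimited (an assumption).
csv_text = """Date of event,Name,Date of birth
06.01.1986,John Smit,23.08.1996
18.12.1996,Barbara D,01.08.1965
12.12.2001,Barbara D,01.08.1965
17.10.1994,John Snow,20.07.1965
"""

df = pd.read_csv(io.StringIO(csv_text))
df.columns = ['dateOfEvent', 'name', 'DOB']

# Parse the day-first dates explicitly so 06.01.1986 is read as 6 January.
df['dateOfEvent'] = pd.to_datetime(df['dateOfEvent'], format='%d.%m.%Y')
df['DOB'] = pd.to_datetime(df['DOB'], format='%d.%m.%Y')

# idxmax returns the row label of the latest event in each (name, DOB) group.
latest_idx = df.groupby(['name', 'DOB'])['dateOfEvent'].idxmax()

# Select those rows whole, then restore the original file order.
latest_rows = df.loc[latest_idx].sort_index()
print(latest_rows)
```

Because the rows are selected by label, any additional columns in the file come along unchanged.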
Answer 1: (score: 0)
Since your column names contain spaces, it is best to use a comma as the delimiter.
You can do this with the pandas library:
import tempfile
import pandas
# create a temporary csv file with your data (comma delimited)
temp_file_name = None
with tempfile.NamedTemporaryFile('w', delete=False) as f:
    f.write("""Date of event,Name,Date of birth
06.01.1986,John Smit,23.08.1996
18.12.1996,Barbara D,01.08.1965
12.12.2001,Barbara D,01.08.1965
17.10.1994,John Snow,20.07.1965""")
    temp_file_name = f.name
# read the csv data using the pandas library, specify columns with dates
data_frame = pandas.read_csv(
    temp_file_name,
    parse_dates=[0, 2],
    dayfirst=True,
    delimiter=','
)
# use groupby and max to do the magic
unique_rows = data_frame.groupby(['Name', 'Date of birth']).max()
# write the results
result_csv_file_name = None
with tempfile.NamedTemporaryFile('w', delete=False) as f:
    result_csv_file_name = f.name
    unique_rows.to_csv(f)
# read and show the results
with open(result_csv_file_name, 'r') as f:
    print(f.read())
This results in:
Name,Date of birth,Date of event
Barbara D,1965-08-01,2001-12-12
John Smit,1996-08-23,1986-01-06
John Snow,1965-07-20,1994-10-17
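The result above has a different column order, row order and date format (ISO) than the file requested in the question. If the output should match the input layout, one option is to sort by the event date, keep the last row of each duplicate group with `drop_duplicates`, and pass `date_format` to `to_csv`. This is a sketch under the same comma-delimited assumption; `csv_text`, `latest` and `result_csv` are illustrative names.

```python
import io
import pandas

# Sample data from the question, written comma-delimited (an assumption).
csv_text = """Date of event,Name,Date of birth
06.01.1986,John Smit,23.08.1996
18.12.1996,Barbara D,01.08.1965
12.12.2001,Barbara D,01.08.1965
17.10.1994,John Snow,20.07.1965
"""

data_frame = pandas.read_csv(io.StringIO(csv_text))

# Parse both date columns with an explicit day-first format.
for column in ['Date of event', 'Date of birth']:
    data_frame[column] = pandas.to_datetime(data_frame[column], format='%d.%m.%Y')

latest = (
    data_frame
    .sort_values('Date of event')  # oldest event first
    .drop_duplicates(subset=['Name', 'Date of birth'], keep='last')  # keep the latest
    .sort_index()  # restore the original file order
)

# With no path given, to_csv returns the CSV text; date_format controls
# how the datetime columns are written.
result_csv = latest.to_csv(index=False, date_format='%d.%m.%Y')
print(result_csv)
```

Unlike `groupby(...).max()`, which takes the maximum of every remaining column independently, this keeps each surviving row intact, which matters once the file has more columns than the three shown.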