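I have a large number of simple time series, each stored in its own CSV file. Every file contains a "Date" column and a "Close" column.
I would like to use pandas to read each file into a DataFrame, find the minimum value in the "Close" column, and write that minimum "Close" value and its associated "Date" to a new DataFrame.
Across all the files screened, this would ideally produce a single new DataFrame containing each minimum "Close" value together with the date on which it occurred.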
import pandas as pd
import os

symbol = "LN"
start_year = 2010
end_year = 2014
months = ["G", "J", "M", "N", "Q", "V", "Z"]

def historiclows():
    df1 = pd.read_csv("%s.csv" % (file3))
    df1 = df1.drop(df1.columns[[1,2,3,5,6]], axis = 1)
    targetvalues = df1.loc[df1["Close"].idxmin()]
    df2.append(targetvalues)

for m in months:
    df2 = pd.DataFrame()
    for y in range(start_year, end_year+1):
        if m != "Z":
            if months[months.index(m)+1] != "Z":
                file1 = ("%s%s%s%s%s%s" % (symbol, m, y, symbol, months[months.index(m)+1], y))
                file2 = ("%s%s%s%s%s%s" % (symbol, months[months.index(m)+1], y, symbol, months[months.index(m)+2], y))
                file3 = ("%s%s" % (file1, file2))
                checkfile3 = os.path.isfile("%s.csv" % file3)
                if checkfile3 == True:
                    title = ("%s%s%s" % (m, months[months.index(m)+1], months[months.index(m)+2]))
                    historiclows()
                    print(df2)
                else:
                    pass
            else:
                file1 = ("%s%s%s%s%s%s" % (symbol, m, y, symbol, months[months.index(m)+1], y))
                file2 = ("%s%s%s%s%s%s" % (symbol, months[months.index(m)+1], y, symbol, str(months[0]), y+1))
                file3 = ("%s%s" % (file1, file2))
                checkfile3 = os.path.isfile("%s.csv" % file3)
                if checkfile3 == True:
                    title = ("%s%s%s" % (m, months[months.index(m)+1], str(months[0])))
                    historiclows()
                    print(df2)
                else:
                    pass
        else:
            file1 = ("%s%s%s%s%s%s" % (symbol, m, y, symbol, str(months[0]), y+1))
            file2 = ("%s%s%s%s%s%s" % (symbol, str(months[0]), y+1, symbol, str(months[1]), y+1))
            file3 = ("%s%s" % (file1, file2))
            checkfile3 = os.path.isfile("%s.csv" % file3)
            if checkfile3 == True:
                title = ("%s%s%s" % (m, str(months[0]), str(months[1])))
                historiclows()
                print(df2)
            else:
                pass

print("!!! PROCESS COMPLETE !!!")
Answer 0 (score: 3)
You can do it like this:
>> orig_df
            Close
2015-01-01      4
2015-02-01      1
2015-03-01      3
2015-03-01      1
new_df = orig_df[orig_df['Close'] == orig_df['Close'].min()]
>> new_df
            Close
2015-02-01      1
2015-03-01      1
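For anyone who wants to run the snippet above, here is a minimal self-contained sketch. The construction of `orig_df` is an assumption (the answer only shows its printed form); the filtering step is the same boolean-mask comparison against the column minimum.

```python
import pandas as pd

# Toy data matching the printout above; note that 2015-03-01 appears twice.
orig_df = pd.DataFrame(
    {"Close": [4, 1, 3, 1]},
    index=pd.to_datetime(
        ["2015-01-01", "2015-02-01", "2015-03-01", "2015-03-01"]
    ),
)

# Boolean mask: True for every row whose Close equals the column minimum.
is_min = orig_df["Close"] == orig_df["Close"].min()
new_df = orig_df[is_min]

print(new_df)
```

Because the mask keeps every row that ties for the minimum, `new_df` can hold more than one row.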
Then, if you only want the minimum to appear once in the new DataFrame, you can use drop_duplicates:
new_df = new_df.drop_duplicates(subset=['Close'])
>> new_df
            Close
2015-02-01      1
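If you want the last date rather than the first, pass keep='last' (older pandas versions spelled this take_last=True, which has since been removed):
new_df = new_df.drop_duplicates(subset=['Close'], keep='last')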
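To connect this back to the multi-file case in the question, here is one possible sketch, not a definitive implementation. It assumes every CSV has "Date" and "Close" columns; the function name `collect_lows`, its `files` parameter, and the in-memory demo data are all hypothetical. It uses `idxmin` (which returns the first row holding the minimum) and builds the result from a list of rows, because `DataFrame.append` returned a new frame rather than modifying `df2` in place, and it was removed in pandas 2.0.

```python
import io

import pandas as pd


def collect_lows(files):
    """Return one row per input: the date and value of the lowest Close.

    `files` maps a label to anything pd.read_csv accepts
    (a path or an open file object). Hypothetical helper.
    """
    rows = []
    for label, source in files.items():
        df = pd.read_csv(source, usecols=["Date", "Close"])
        if df.empty:
            continue  # nothing to report for an empty file
        low = df.loc[df["Close"].idxmin()]  # first row holding the minimum
        rows.append({"File": label, "Date": low["Date"], "Close": low["Close"]})
    return pd.DataFrame(rows, columns=["File", "Date", "Close"])


# In-memory stand-ins for real CSV files (made-up data).
demo = {
    "a": io.StringIO("Date,Close\n2015-01-01,4\n2015-02-01,1\n2015-03-01,1\n"),
    "b": io.StringIO("Date,Close\n2015-01-01,7\n2015-02-01,9\n"),
}
lows = collect_lows(demo)
print(lows)
```

With real data, the dictionary values would be file paths, for example the "%s.csv" % file3 names built in the question's loop, filtered with os.path.isfile as before.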