Consider data in the following format:
20180101,10
20180102,20
20180103,15
....
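The first field is the date and the second is the number of products sold. Rather than inserting all of this into a database and using a select max xxxx SQL statement to find the maximum quantity within a period, is there any shorthand or useful library that achieves the same thing? Thanks.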
Answer 0 (score: 1)
pandas is the library you want.
Let me show you an example:
import numpy as np
import pandas as pd
# let's build a dummy dataset
index = pd.date_range(start="2015-01-01", end="2018-12-31")  # ISO dates avoid day/month ambiguity
df = pd.DataFrame(np.random.randint(100, size=len(index)),
                  columns=["sales"], index=index)
>>> df.head()
sales
2015-01-01 32
2015-01-02 0
2015-01-03 12
2015-01-04 77
2015-01-05 86
Now suppose you want to aggregate sales by month:
>>> df["sales"].groupby(pd.Grouper(freq="1M")).sum()
2015-01-31 1441
2015-02-28 1164
2015-03-31 1624
2015-04-30 1629
2015-05-31 1427
[...]
Or by semester (six-month periods):
>>> df["sales"].groupby(pd.Grouper(freq="6M", closed="left", label="right")).sum()
2015-06-30 8921
2015-12-31 9365
2016-06-30 9820
2016-12-31 8881
2017-06-30 8773
2017-12-31 8709
2018-06-30 9481
2018-12-31 9522
2019-06-30 51
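For some reason, binning with Grouper at a six-month frequency has trouble with the sales on 31/12 and puts them into a new bin for 2019. I will look into it and report back if I find anything, or perhaps someone else can comment.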
Or perhaps you want to know which semester was the best:
>>> df["sales"].groupby(pd.Grouper(freq="6M")).sum().idxmax()
Timestamp('2016-06-30 00:00:00', freq='6M')
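A possible explanation for the stray 2019 bin above, offered as a sketch rather than a confirmed diagnosis: with a month-end frequency the bin edges fall on 30/06 and 31/12, and closed="left" makes each bin include its left edge but exclude its right one, so 31/12/2018 would open a new bin. One way to sidestep bin edges entirely is to group on the calendar year and half-year directly. The seeded generator and the names half, per_semester and best_semester are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Dummy dataset comparable to the one above (seeded so it is reproducible).
index = pd.date_range(start="2015-01-01", end="2018-12-31")
rng = np.random.default_rng(0)
df = pd.DataFrame({"sales": rng.integers(0, 100, size=len(index))}, index=index)

# 1 for January-June, 2 for July-December.
half = np.where(df.index.month <= 6, 1, 2)

# Group on (year, half) instead of relying on resampling bin edges.
per_semester = df["sales"].groupby([df.index.year, half]).sum()
per_semester.index.names = ["year", "half"]

# (year, half) tuple of the semester with the highest total.
best_semester = per_semester.idxmax()

print(per_semester)
print(best_semester)
```

Because every row is assigned to exactly one (year, half) pair, the last day of the year stays in the second half of its own year.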
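Answer 1 (score: 1)
This may be a biased answer, but pandas is very well suited to this kind of data. Although you could do this with tuples, lists, and so on, pandas offers much more functionality. For example: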
import pandas as pd
data = [[20180101,15], [20180102,10], [20180103,12],[20180104,10]]
df = pd.DataFrame(data=data, columns=['date', 'products'])
# if your data is in csv, excel, database... whatever... you can easily pull
# df = pd.read_csv('name') || pd.read_excel() || pd.read_sql()
df
Out[2]:
date products
0 20180101 15
1 20180102 10
2 20180103 12
3 20180104 10
# It helps to use datetime format to perform operations on the data
# Operations make reference to an "index" in the dataframe
df.index = pd.to_datetime(df['date'], format="%Y%m%d") #strftime format
df
Out[3]:
date products
date
2018-01-01 20180101 15
2018-01-02 20180102 10
2018-01-03 20180103 12
2018-01-04 20180104 10
# Now we can drop that date column...
df.drop(columns='date', inplace=True)
df
Out[4]:
products
date
2018-01-01 15
2018-01-02 10
2018-01-03 12
2018-01-04 10
# Yes, there are ways to do the above in shorthand... lots of info on pandas on SO
# I want you to see the individual steps we are taking, to keep it simple
# Now is when the fun begins
df.rolling(2).sum() # prints a rolling 2-day sum
Out[5]:
products
date
2018-01-01 NaN
2018-01-02 25.0
2018-01-03 22.0
2018-01-04 22.0
df.rolling(3).mean() # prints a rolling 3-day average
Out[6]:
products
date
2018-01-01 NaN
2018-01-02 NaN
2018-01-03 12.333333
2018-01-04 10.666667
df.resample('W').sum() # Resamples the data so you can look on a weekly basis
Out[7]:
products
date
2018-01-07 47
df.rolling(2).max() # max number of products over a rolling two-day period
Out[9]:
products
date
2018-01-01 NaN
2018-01-02 15.0
2018-01-03 12.0
2018-01-04 12.0
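To tie this back to the original question (the maximum within one chosen period), a minimal sketch along the same lines is shown here. It assumes the same four sample rows; the period boundaries and the names period, period_max and period_argmax are made up for illustration.

```python
import pandas as pd

data = [[20180101, 15], [20180102, 10], [20180103, 12], [20180104, 10]]
df = pd.DataFrame(data, columns=["date", "products"])

# Parse the YYYYMMDD integers into real dates and use them as the index.
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")
df = df.set_index("date")

# Label-based slicing on a DatetimeIndex includes both endpoints.
period = df.loc["2018-01-02":"2018-01-04", "products"]

period_max = period.max()        # largest value in the period
period_argmax = period.idxmax()  # date on which it occurred

print(period_max, period_argmax)
```

This plays the role of a select max query with a date range in its where clause, without needing a database.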
Answer 2 (score: 0)
You should use pandas.
Assuming your date column is called 'date' and has a datetime dtype:
import pandas as pd
df = pd.DataFrame(data)  # data: your records, including a datetime 'date' column
df = df.set_index('date')
df.groupby(pd.Grouper(freq='1M')).max()
This gives you the monthly maximum. The frequency can be changed to whatever you like.
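The snippet above leaves data undefined, so here is a self-contained sketch of the same idea starting from text in the question's format. The sample rows, the column name sold and the use of io.StringIO in place of a real file are assumptions made for illustration.

```python
import io

import pandas as pd

# Stand-in for a CSV file in the "YYYYMMDD,count" format from the question.
raw = "20180101,10\n20180102,20\n20180103,15\n20180201,7\n20180202,9\n"

df = pd.read_csv(io.StringIO(raw), header=None,
                 names=["date", "sold"], dtype={"date": str})

# Convert the date strings to datetimes and index by them.
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")
df = df.set_index("date")

# Group by calendar month and take the maximum in each month.
monthly_max = df["sold"].groupby(df.index.to_period("M")).max()

print(monthly_max)
```

Grouping on a monthly period rather than on a Grouper frequency string is a deliberate choice here, since the accepted frequency aliases have changed between pandas versions.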
Answer 3 (score: 0)
I tried @Patrick Artner's comment:
a = (20180101,10)
b = (20180102,20)
c = (20180103,15)
d = (a,b,c)
maximum = max(d, key=lambda x: x[1])  # tuple with the largest count
minimum = min(d, key=lambda x: x[1])  # tuple with the smallest count
print(minimum)
Maybe this gives you some ideas.
Answer 4 (score: -1)
If this is the result you want:
data = [{'date':1, 'products_sold': 2}, {'date':2, 'products_sold': 5},{'date':5, 'products_sold': 2}]
start_date = 1
end_date = 2
max_value_in_period = max(x['products_sold'] for x in data if start_date <= x['date'] <= end_date)
print(max_value_in_period)
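The same approach can be applied to the question's own format using only the standard library. This is a rough sketch: it relies on the fact that YYYYMMDD integers sort in chronological order, so they can be compared directly. The function name max_in_period and the sample rows are hypothetical.

```python
import csv
import io

# Stand-in for a file in the "YYYYMMDD,count" format from the question.
raw = "20180101,10\n20180102,20\n20180103,15\n20180104,12\n"


def max_in_period(lines, start, end):
    """Return the (date, sold) row with the highest count for start <= date <= end.

    Returns None when no row falls inside the period.
    """
    rows = ((int(date), int(sold)) for date, sold in csv.reader(lines))
    in_period = [row for row in rows if start <= row[0] <= end]
    return max(in_period, key=lambda row: row[1], default=None)


best = max_in_period(io.StringIO(raw), 20180103, 20180104)
print(best)
```

Passing default=None avoids the ValueError that max raises on an empty sequence when the period contains no rows.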