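I'm using Python. I want to group my data by some columns and, within each group, add the missing dates from date1 (the date the event occurred) up to another date2 (a date I choose), filling the missing values in a column I pick with a forward fill.
I tried the code below in R, but I want to do the same thing in Python: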
```
library(data.table)
library(padr)
library(dplyr)

data = fread("path", header = TRUE)
data$ORDERDATE <- as.Date(data$ORDERDATE)
datemax = max(data$ORDERDATE)

# Insert one row per missing day for each Column1/Column2 group, up to datemax,
# then fill missing Column3 values downwards (forward fill)
data2 = data %>%
  group_by(Column1, Column2) %>%
  pad(., group = c('Column1', 'Column2'), end_val = as.Date(datemax),
      interval = "day", break_above = 100000000000) %>%
  tidyr::fill("Column3")
```
I looked for a Python equivalent of the padr package, but couldn't find one.
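As far as I know pandas has no direct padr counterpart, but the pad-then-fill step in the R code above can be approximated with `groupby` plus `reindex`. This is a minimal sketch under stated assumptions: `pad_groups` is a hypothetical helper (not a pandas API), padding runs from each group's first date to `end` (which I believe matches padr's behaviour when `start_val` is not given), and each (group, date) pair is unique, since `reindex` rejects duplicate index labels. The toy data is invented for illustration.

```python
import pandas as pd


def pad_groups(df, group_cols, date_col, end, ffill_cols):
    """Hypothetical helper approximating padr::pad(group = ...) + tidyr::fill().

    Within each group, insert a row for every missing day from the group's
    first date up to `end`, then forward-fill `ffill_cols` inside that group.
    Assumes each (group, date) pair is unique, because reindex rejects
    duplicate index labels.
    """
    pieces = []
    for keys, g in df.groupby(group_cols):
        keys = keys if isinstance(keys, tuple) else (keys,)
        full_range = pd.date_range(g[date_col].min(), end, freq="D")
        padded = g.set_index(date_col).reindex(full_range)
        padded.index.name = date_col
        # Inserted rows have NaN group keys; put the group's values back.
        for col, val in zip(group_cols, keys):
            padded[col] = val
        # Forward fill stays inside this group because we work on one group at a time.
        padded[ffill_cols] = padded[ffill_cols].ffill()
        pieces.append(padded.reset_index())
    return pd.concat(pieces, ignore_index=True)


# Invented toy data using the column names from the R code.
data = pd.DataFrame({
    "Column1": ["a", "a", "b"],
    "Column2": ["x", "x", "y"],
    "ORDERDATE": pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-02"]),
    "Column3": [1.0, 2.0, 5.0],
})
datemax = data["ORDERDATE"].max()
data2 = pad_groups(data, ["Column1", "Column2"], "ORDERDATE", datemax, ["Column3"])
print(data2)
```

If your data has several rows per group and day, aggregate them first (for example with `groupby(...).agg(...)`) before padding.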
Thanks for answering my request. As an example, I have this table:
```
import pandas as pd

dates = ['2014-01-14', '2014-01-14', '2014-01-15', '2014-01-19', '2014-01-18', '2014-01-25', '2014-01-28', '2014-02-05', '2014-02-14']
users = ['User1', 'User1', 'User2', 'User1', 'User2', 'User1', 'User2', 'User1', 'User2']
products = ['product1', 'product1', 'product1', 'product1', 'product1', 'product2', 'product2', 'product2', 'product2']
quantities = [5, 6, 8, 10, 4, 5, 2, 9, 7]
prices = [2, 2, 5, 5, 6, 6, 6, 7, 7]

data = pd.DataFrame({'date': dates, 'user': users, 'product': products, 'quantity': quantities, 'price': prices})
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
# Average quantity and price over duplicate user/product/date rows
data2 = data.groupby(['user', 'product', 'date'], as_index=False).mean()
```

For example, for User1 and product1 I want to insert the missing dates, fill the `quantity` column with 0 and fill the `price` column with the last known (previous) value, over a date range that I choose. Then I want to do the same for every other user/product combination in my data.

The result should look like this:

![Expected result][1]

[1]: https://i.stack.imgur.com/qOOda.png
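The requirement above (0 for missing quantities, last known price, over a chosen date range) could be sketched in pandas with a cross join, assuming pandas 1.2 or later for `how="cross"`. The steps: collapse to one row per user/product/day, build a grid of every existing (user, product) pair crossed with every day of the range, left-merge the data onto it, then fill. `start` and `end` are placeholders for whatever range you pick; rows outside that range are dropped by the merge.

```python
import pandas as pd

dates = ['2014-01-14', '2014-01-14', '2014-01-15', '2014-01-19', '2014-01-18',
         '2014-01-25', '2014-01-28', '2014-02-05', '2014-02-14']
users = ['User1', 'User1', 'User2', 'User1', 'User2', 'User1', 'User2', 'User1', 'User2']
products = ['product1'] * 5 + ['product2'] * 4
quantities = [5, 6, 8, 10, 4, 5, 2, 9, 7]
prices = [2, 2, 5, 5, 6, 6, 6, 7, 7]
data = pd.DataFrame({'date': pd.to_datetime(dates, format='%Y-%m-%d'), 'user': users,
                     'product': products, 'quantity': quantities, 'price': prices})

# Placeholder range: replace with the start/end you want.
start, end = pd.Timestamp('2014-01-14'), pd.Timestamp('2014-02-14')

# One row per user/product/day: total quantity, average price.
daily = data.groupby(['user', 'product', 'date'], as_index=False).agg(
    quantity=('quantity', 'sum'), price=('price', 'mean'))

# Every existing (user, product) pair crossed with every day in the range.
pairs = daily[['user', 'product']].drop_duplicates()
days = pd.DataFrame({'date': pd.date_range(start, end, freq='D')})
grid = pairs.merge(days, how='cross')

# Attach the observed values; days without data come back as NaN.
result = grid.merge(daily, on=['user', 'product', 'date'], how='left')
result['quantity'] = result['quantity'].fillna(0)
# Carry the last known price forward, within each user/product pair only.
result['price'] = result.groupby(['user', 'product'])['price'].ffill()
print(result.head(10))
```

Days before a pair's first sale have no earlier price to carry forward, so they stay NaN. A second pass with `result.groupby(['user', 'product'])['price'].bfill()` would fill them from the first observed price, if that is what "backward values" should mean.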
The R code I used to generate the image is as follows:
```
library(padr)
library(dplyr)

dates = c('2014-01-14','2014-01-14','2014-01-15','2014-01-19','2014-01-18','2014-01-25','2014-01-28','2014-02-05','2014-02-14')
users = c('User1','User1','User2','User1','User2','User1','User2','User1','User2')
products = c('product1','product1','product1','product1','product1','product2','product2','product2','product2')
quantities = c(5,6,8,10,4,5,2,9,7)
prices = c(2,2,5,5,6,6,6,7,7)
data = data.frame(date = dates, user = users, product = products,
                  quantity = quantities, price = prices)
data$date <- as.Date(data$date)
datemax = max(data$date)

# Pad each user/product group with one row per day, up to the latest date
data2 = data %>%
  group_by(user, product) %>%
  pad(., group = c('user', 'product'), end_val = as.Date(datemax),
      interval = "day", break_above = 100000000000)

# One row per user/product/day: total quantity, average price
data3 = data2 %>%
  group_by(user, product, date) %>%
  summarize(quantity = sum(quantity), price = mean(price))

# Fill price downwards; padded days get quantity 0
data4 = data3 %>%
  tidyr::fill("price") %>%
  fill_by_value(quantity, value = 0)
```
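A step-by-step pandas translation of this R pipeline might look like the sketch below. It assumes padr pads each group from its own first date up to `datemax`. The `summarize()` step is done before padding so each user/product/day appears once before the merge; this gives the same result here because padded rows carry no quantity or price. `tidyr::fill("price")` on the still-grouped `data3` becomes a per-group `ffill`, and `fill_by_value(quantity, value = 0)` becomes `fillna(0)`.

```python
import pandas as pd

dates = ['2014-01-14', '2014-01-14', '2014-01-15', '2014-01-19', '2014-01-18',
         '2014-01-25', '2014-01-28', '2014-02-05', '2014-02-14']
users = ['User1', 'User1', 'User2', 'User1', 'User2', 'User1', 'User2', 'User1', 'User2']
products = ['product1'] * 5 + ['product2'] * 4
quantities = [5, 6, 8, 10, 4, 5, 2, 9, 7]
prices = [2, 2, 5, 5, 6, 6, 6, 7, 7]
data = pd.DataFrame({'date': pd.to_datetime(dates), 'user': users,
                     'product': products, 'quantity': quantities, 'price': prices})
datemax = data['date'].max()

# summarize(): one row per user/product/day (sum quantity, mean price).
daily = data.groupby(['user', 'product', 'date'], as_index=False).agg(
    quantity=('quantity', 'sum'), price=('price', 'mean'))

# pad(): for each pair, every day from its first date up to the global datemax.
starts = daily.groupby(['user', 'product'])['date'].min()
grid = pd.concat(
    [pd.DataFrame({'user': u, 'product': p,
                   'date': pd.date_range(first, datemax, freq='D')})
     for (u, p), first in starts.items()],
    ignore_index=True)
data4 = grid.merge(daily, on=['user', 'product', 'date'], how='left')

# tidyr::fill("price") within each pair, then fill_by_value(quantity, value = 0).
data4['price'] = data4.groupby(['user', 'product'])['price'].ffill()
data4['quantity'] = data4['quantity'].fillna(0)
print(data4.head(10))
```

Because every pair starts at its own first observation, there is always an earlier price to carry forward, so no `price` values are left missing.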