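I have a data frame like the one shown below:
What I want to do, using numpy and pandas, is check whether days_diff is NaT and, if it is, update it by subtracting "2016-01-01" from outofservicedatetime. After running the following code: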
df[['days_diff']] = np.where(pd.isnull(df[['days_diff']]), df[['outofservicedatetime']] - np.datetime64('2016-01-01'), df[['days_diff']])
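The output I get looks like this:
How can I get the days_diff values expressed in days? Alternatively, if anyone can suggest an easier way to achieve this, that would be just as helpful.
Answer 0: (score: 0)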
By using pd.Series.fillna, you can get roughly a two-fold speed-up over [df.loc[df['days_diff'].isnull()..., optionally passing the parameter 'inplace=True' to replicate the in-place behavior of df.loc.
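A minimal sketch of the fillna idea, under stated assumptions: the sample rows below are hypothetical (the original data frame is not shown), days_diff is assumed to hold timedelta64 values, and the helper column days_diff_days is a name introduced here only for illustration. The masked df.loc assignment at the end is one possible form of that approach, not a reconstruction of the truncated snippet in the answer.

```python
import pandas as pd

# Hypothetical sample data; the real data frame is not shown in the post.
df = pd.DataFrame({
    "outofservicedatetime": pd.to_datetime(
        ["2016-01-11", "2016-02-01", "2016-01-03"]
    ),
    "days_diff": pd.Series(
        [pd.Timedelta(days=5), pd.NaT, pd.NaT], dtype="timedelta64[ns]"
    ),
})
df_loc = df.copy()  # second copy for the df.loc variant below

# Fallback value for every row: time elapsed since 2016-01-01.
fallback = df["outofservicedatetime"] - pd.Timestamp("2016-01-01")

# fillna only touches the NaT rows and keeps the timedelta dtype,
# so the values do not degrade to raw integers.
df["days_diff"] = df["days_diff"].fillna(fallback)

# Express the result as whole days (illustrative column name).
df["days_diff_days"] = df["days_diff"].dt.days

# Assumed form of a masked df.loc assignment, for comparison.
mask = df_loc["days_diff"].isnull()
df_loc.loc[mask, "days_diff"] = (
    df_loc.loc[mask, "outofservicedatetime"] - pd.Timestamp("2016-01-01")
)
df_loc["days_diff_days"] = df_loc["days_diff"].dt.days

print(df)
```

Both variants stay in pandas, so the column remains a timedelta and .dt.days gives the day count directly. Any speed difference between them would need to be measured on the real data.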