I am trying to change dates to a "month year" format without changing the non-date values.
import matplotlib.pyplot as plt
import nltk # Natural Language ToolKit
nltk.download('stopwords')
from nltk.corpus import stopwords # to get rid of StopWords
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator # to create a Word Cloud
from PIL import Image # Pillow with WordCloud to image manipulation
text = 'New stop words are bad for this text.'
# Adding the stopwords
stop_words = stopwords.words('english')  # NLTK names the corpus 'english', not 'en'
new_stopwords = ['new', 'stop', 'words']
stop_words.extend(new_stopwords)
stop_words = set(stop_words)
# Getting rid of the stopwords
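# Compare in lowercase, since the stop word list is lowercase and the text starts with 'New'
clean_text = [word for word in text.split() if word.lower() not in stop_words]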
# Converting the list to string
text = ' '.join([str(elem) for elem in clean_text])
# Generating a wordcloud
wordcloud = WordCloud(background_color = "black").generate(text)
# Display the generated image:
plt.figure(figsize = (15, 10))
plt.imshow(wordcloud, interpolation = 'bilinear')
plt.axis("off")
plt.show()
input_df is:
The expected output is:
I tried the following code, which does not work:
import pandas as pd
input_df = pd.DataFrame({'Period': ['2017-11-01 00:00:00', '2019-02-01 00:00:00', 'Mar 2020', 'Pre-Nov 2017', '2019-10-01 00:00:00', 'Nov 17-Nov 18']})
Please help.
Answer 0 (score: 4)
You can use errors='coerce' together with fillna:
input_df['new_period'] = (pd.to_datetime(input_df['Period'], errors='coerce')
.dt.strftime('%b %Y')
.fillna(input_df['Period'])
)
Output:
                Period     new_period
0  2017-11-01 00:00:00       Nov 2017
1  2019-02-01 00:00:00       Feb 2019
2             Mar 2020       Mar 2020
3         Pre-Nov 2017   Pre-Nov 2017
4  2019-10-01 00:00:00       Oct 2019
5        Nov 17-Nov 18  Nov 17-Nov 18
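To see why this works, it can help to look at the intermediate result before fillna is applied. The following is only a sketch, assuming the same input_df as in the question; the names parsed and unparsed are illustrative and not part of the original answer. Exactly which rows come back as NaT may differ between pandas versions, so no expected output is listed.

```python
import pandas as pd

input_df = pd.DataFrame({'Period': ['2017-11-01 00:00:00', '2019-02-01 00:00:00',
                                    'Mar 2020', 'Pre-Nov 2017',
                                    '2019-10-01 00:00:00', 'Nov 17-Nov 18']})

# With errors='coerce', values that cannot be parsed become NaT instead of raising.
parsed = pd.to_datetime(input_df['Period'], errors='coerce')

# These are the rows whose original text fillna puts back afterwards.
unparsed = input_df.loc[parsed.isna(), 'Period']
print(unparsed.tolist())
```

Any value listed in unparsed keeps its original string in new_period, which is why entries such as 'Pre-Nov 2017' pass through unchanged.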
Update: a second, safer option:
import numpy as np
s = pd.to_datetime(input_df['Period'], errors='coerce')
# Keep the original text where parsing failed, otherwise use the formatted date
input_df['new_period'] = np.where(s.isna(), input_df['Period'],
                                  s.dt.strftime('%b %Y'))
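If the same conversion is needed for several columns, the np.where variant could be wrapped in a small helper. This is a minimal sketch: the function name to_month_year is hypothetical and not from the original answer, and the month abbreviations produced by '%b' assume an English locale.

```python
import numpy as np
import pandas as pd


def to_month_year(series: pd.Series) -> pd.Series:
    """Format parseable dates as 'Mon YYYY' and leave all other values as they are."""
    parsed = pd.to_datetime(series, errors='coerce')
    formatted = parsed.dt.strftime('%b %Y')
    # Where parsing failed, fall back to the original value.
    result = np.where(parsed.isna(), series, formatted)
    return pd.Series(result, index=series.index, name=series.name)


periods = pd.Series(['2017-11-01 00:00:00', '2019-02-01 00:00:00', 'Mar 2020',
                     'Pre-Nov 2017', '2019-10-01 00:00:00', 'Nov 17-Nov 18'],
                    name='Period')
new_period = to_month_year(periods)
print(new_period.tolist())
```

Returning a Series with the original index keeps the result aligned when it is assigned back to a DataFrame column.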