Question

GitHub Link to the notebook

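I'm currently working on a project that analyzes hate crime trends in Austin, Texas, and I've run into a problem with my data. I want to split the 'incident_number' column into two parts. The number before the '-' clearly represents the year, and I want to merge it into the 'month' column. The number after the '-' is what I want to keep in the 'incident_number' column.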

Does anyone know how I can achieve this?

I initially tried:

aus_final['incident_number'] = pd.to_datetime(aus_final['incident_number'], format='%d%m%Y')

Which produced the error:

ValueError: time data '2017-241137' does not match format '%d%m%Y' (match)

I kind of knew this would happen, but I had to try anyway. :P Needless to say, I'm still new to Python. Any help would be greatly appreciated.
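For context, the format '%d%m%Y' expects a day, a month, and a four-digit year with no separators, while the values look like '2017-241137' (a year, a hyphen, then a number), so the parse cannot succeed. A minimal sketch of the split described in the question might look like the following. The DataFrame `df` and its sample rows are hypothetical; only the 'incident_number' layout is taken from the question.

```python
import pandas as pd

# Hypothetical sample data; only the "year-number" layout comes from the question.
df = pd.DataFrame({"incident_number": ["2017-241137", "2018-1234"]})

# Split on the first hyphen only: the left part is the year,
# the right part is the number that should stay in 'incident_number'.
parts = df["incident_number"].str.split("-", n=1, expand=True)
df["year"] = parts[0].astype(int)
df["incident_number"] = parts[1]

print(df)
```

Using `n=1` keeps any later hyphens inside the second part, which is a cautious choice if the numbering scheme is not fully known.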

Answer 1

Link to the referenced notebook

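It took a few attempts, but I finally got it. Honestly, it was a matter of trial and error. I read several Stack Overflow questions about pandas covering things like structure and column formatting, e.g. splitting columns, handling categorical data, and another aid on categorical data, to name just a few. I finally hit the jackpot with the following code: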

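# Split values such as "2017-241137" on the first hyphen: year on the left, the rest on the right.
new = aus_final["incident_number"].str.split("-", n=1, expand=True)
aus_final["year"] = new[0]
aus_final["occurence_number"] = new[1]
aus_final.drop(columns=["incident_number"], inplace=True)
# Join month and year into a single "month-year" string column.
aus_final["date"] = aus_final[["month", "year"]].agg("-".join, axis=1)
aus_final.drop(["month", "occurence_number", "year"], axis=1, inplace=True)
# Keep only the columns needed for the analysis and shorten the victims column name.
aus_final = aus_final[["date", "bias", "number_of_victims_over_18", "offense_location"]]
aus_final = aus_final.rename(columns={"number_of_victims_over_18": "victims"})
# Parse the combined string as a datetime and use it as the index.
aus_final["date"] = pd.to_datetime(aus_final["date"])
aus_final.set_index("date", inplace=True)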
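As a possible variation on the code above, the date can be built in one step while the number after the hyphen stays in 'incident_number', as the question originally asked. This is only a sketch under assumptions: the frame `aus` and its rows are made up, and the 'month' column is assumed to hold full English month names, which the source does not confirm.

```python
import pandas as pd

# Hypothetical data; the 'month' values (full English month names) are an assumption.
aus = pd.DataFrame({
    "incident_number": ["2017-241137", "2017-300001", "2018-1234"],
    "month": ["January", "March", "February"],
})

# Split once on the first hyphen: column 0 is the year, column 1 the remaining number.
year_and_number = aus["incident_number"].str.split("-", n=1, expand=True)

# Combine month name and year, then parse with an explicit format.
aus["date"] = pd.to_datetime(aus["month"] + " " + year_and_number[0], format="%B %Y")

# Keep only the number after the hyphen in 'incident_number'.
aus["incident_number"] = year_and_number[1]

# Drop the now-redundant month column and index by date.
aus = aus.drop(columns=["month"]).set_index("date")

print(aus)
```

Passing an explicit `format` avoids relying on pandas to guess the layout. If the real 'month' column uses abbreviations or numbers, the format string would need to change accordingly (for example '%b %Y' or '%m %Y').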

I may be a slow learner, but once I've tried something a few times myself, it definitely sticks. :) Thanks for pointing me in the right direction!

Strings and objects in a column (pandas)
