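I have a bit of a problem with my pandas DataFrame.
As the picture shows, the released_date value in the first row is "Released 2006", while every other value in that column has the format "Released MMM DD".
I would like to split the first cell under released_date into "Released" and "2006", copy "2006" into the year column, and then shift everything after it one column to the right. Any ideas?
Current format: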
...|**released_date**| **year** | **genre** | ...
...| Released 2006 | Arcade | Comic |...
Desired output format:
...|**released_date**| **year** | **genre** | ...
...| Released | 2006 | Arcade |...
Thanks in advance!!
Here is the code used to read the file:
import pandas as pd
df = pd.read_csv("IndieGameCSV/page_1.csv", \
names=["Windows","Mac","Linux","engine","release_date","year","genre1",\
"theme","players","score_final","rating", "link" ], index_col=False)
Here is the data shown in the picture:
True, False, True,Custom Built,Released 2006,Arcade,Comic,Single Player, 10,1 v, http://indiedb.com/games/tux-climber,
True, True, True,Custom Built,Released Oct 20, 2014,Role Playing,Fantasy,MMO, 7.3,45 , http://indiedb.com/games/pokemon-planet,
True, True, True,Ren'py,Released May 16, 2015,Turn Based Strategy,Noire,Single Player, 9,1 v, http://indiedb.com/games/black-closet,
True, True, False,ShiVa3D,Released Jan 2, 2015,First Person Shooter,Sci-Fi,Single Player, 7.8,4 v, http://indiedb.com/games/kumoon,
Answer 0 (score: 0)
You can use the str.extract method to pull out the year:
In [11]: df["release_date"].str.extract(r"(\d{4})", expand=False)
Out[11]:
0 2006
1 2014
2 2015
3 2015
Name: release_date, dtype: object
If you want to split the DataFrame, you can also look at .str.match, which checks whether each value in the column matches a regular expression:
In [12]: df["release_date"].str.match(r"Released \d{4}")
Out[12]:
0 True
1 False
2 False
3 False
Name: release_date, dtype: bool
Index df with this boolean Series to get the malformed rows, and with its negation (~) to get the rest.
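Putting the two pieces together, a minimal sketch of the repair might look like the following. The small frame is made up from the sample rows above, not read from the real file, and it assumes that the misaligned row has an empty last column after loading, so nothing is lost when its values move right. The names mask, year and cols are only illustrative.

```python
import pandas as pd

# Hypothetical miniature of the loaded frame: row 0 is misaligned,
# and its last column is assumed to be empty.
df = pd.DataFrame(
    {
        "release_date": ["Released 2006", "Released Oct 20", "Released May 16"],
        "year": ["Arcade", "2014", "2015"],
        "genre1": ["Comic", "Role Playing", "Turn Based Strategy"],
        "theme": [None, "Fantasy", "Noire"],
    }
)

# True for rows whose release_date is exactly "Released YYYY".
mask = df["release_date"].str.match(r"Released \d{4}$")

# Pull the four-digit year out of the malformed rows only.
year = df.loc[mask, "release_date"].str.extract(r"(\d{4})", expand=False)

# Every column from "year" to the end takes part in the shift.
cols = list(df.columns[df.columns.get_loc("year"):])

# Move the values one column to the right in the masked rows.
# to_numpy() copies by position, so no label alignment gets in the way.
df.loc[mask, cols[1:]] = df.loc[mask, cols[:-1]].to_numpy()

# Fill in the freed-up cells.
df.loc[mask, "year"] = year
df.loc[mask, "release_date"] = "Released"

print(df)
```

Rows selected by ~mask are left as they were. On the real file, check first that the last column of the malformed rows really is empty, since the shift overwrites it.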