Filtering pandas DataFrame rows does not work

Time: 2016-12-25 12:17:32

Tags: python string pandas dataframe

I have the following pandas DataFrame:

In [23]: df
Out[23]: 
                                                 names
0                                        Alabama[edit]
1                        Auburn (Auburn University)[1]
2               Florence (University of North Alabama)
3      Jacksonville (Jacksonville State University)[2]
4           Livingston (University of West Alabama)[2]
5             Montevallo (University of Montevallo)[2]
6                            Troy (Troy University)[2]
7    Tuscaloosa (University of Alabama, Stillman Co...
8                    Tuskegee (Tuskegee University)[5]
9                                         Alaska[edit]
10       Fairbanks (University of Alaska Fairbanks)[2]
11                                       Arizona[edit]
12          Flagstaff (Northern Arizona University)[6]
13                    Tempe (Arizona State University)
14                      Tucson (University of Arizona)

As you can see, some entries in names contain the string [edit]. I want to keep only those entries and build a new DataFrame from them. So I tried:

In [24]: df1 = df[df['names'].str.contains("[edit]")]

However, the new DataFrame df1 is not what I wanted; it still contains every entry from the original DataFrame:

In [25]: df1.head()
Out[25]: 
                                             names
0                                    Alabama[edit]
1                    Auburn (Auburn University)[1]
2           Florence (University of North Alabama)
3  Jacksonville (Jacksonville State University)[2]
4       Livingston (University of West Alabama)[2]

What exactly am I missing, and how can I fix it?

1 answer:

Answer 0: (score: 1)

You can use str.extract to parse the names, name the resulting column, and drop everything else in one step:

df.names.str.extract(r'(?P<names>.+)\[edit\]', expand=True).dropna()
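
For context on why the original filter kept every row: str.contains treats its pattern as a regular expression by default, so "[edit]" is read as a character class that matches any single e, d, i, or t. The sketch below uses a small sample frame modelled on the question (an assumption, not the full data) to compare that behaviour with two ways of matching the literal text, alongside the str.extract approach above. The variable names are illustrative only.

```python
import pandas as pd

# Small sample modelled on the question's data (not the full frame).
df = pd.DataFrame({
    "names": [
        "Alabama[edit]",
        "Auburn (Auburn University)[1]",
        "Florence (University of North Alabama)",
        "Alaska[edit]",
        "Fairbanks (University of Alaska Fairbanks)[2]",
    ]
})

# As a regex, "[edit]" is a character class: it matches any single
# 'e', 'd', 'i' or 't', so every row in this sample matches.
all_rows = df[df["names"].str.contains("[edit]")]

# Option 1: treat the pattern as a literal substring.
literal = df[df["names"].str.contains("[edit]", regex=False)]

# Option 2: keep regex matching but escape the brackets.
escaped = df[df["names"].str.contains(r"\[edit\]")]

# The answer's approach: capture the text before "[edit]" and
# drop the rows where nothing was captured.
states = df["names"].str.extract(r"(?P<names>.+)\[edit\]", expand=True).dropna()

print(literal)
print(states)
```

Options 1 and 2 keep the matching rows unchanged, including the "[edit]" suffix, while str.extract returns only the captured part of each name. Which one fits depends on whether the suffix should survive in the new DataFrame.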
