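I'm new to Python and pandas, so I ran into problems quickly. I'm currently using Spyder.
I'm trying to find a phrase (not the full string) in a column and pull out all rows that contain that phrase. Here is my code so far: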
import pandas as pd
df2 = pd.read_csv("C:\...\Desktop\publiccomments.csv")
print(df2["Document_Title"].str.contains("King"))
When I do this, I get a boolean Series:
0 True
1 False
2 False
3 False
4 False
etc.
When I try to use it as a mask, I get a number of errors.
print(df2[df2["Document_Title"].str.contains("King")])
returns
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:...Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/.../untitled1.py", line 15, in <module>
print(df2[df2["Document_Title"].str.contains("King")])
File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2053, in __getitem__
return self._getitem_array(key)
File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2080, in _getitem_array
if com.is_bool_indexer(key):
File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\common.py", line 201, in is_bool_indexer
raise ValueError('cannot index with vector containing '
ValueError: cannot index with vector containing NA / NaN values
I tried adding
df2 = df1.dropna(subset=df1.columns[[1]], how='any')
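to resolve the "cannot index with vector containing NA / NaN values" error, but no dice.
Any help would be greatly appreciated! Here is a sample of my data: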
Document_Title Document Type \
0 Comment submitted by J. King PUBLIC SUBMISSIONS
1 Comment submitted by N. Ghani PUBLIC SUBMISSIONS
2 Comment submitted by M. Srobode PUBLIC SUBMISSIONS
3 Comment submitted by D. Hovey PUBLIC SUBMISSIONS
4 Comment submitted by B. Sweigert PUBLIC SUBMISSIONS
5 Comment submitted by M. Lundgen PUBLIC SUBMISSIONS
6 Comment submitted by Craig (no surname provided) PUBLIC SUBMISSIONS
7 Comment submitted by R. Marshall PUBLIC SUBMISSIONS
8 Comment submitted by A. Greig PUBLIC SUBMISSIONS
9 Comment submitted by J. B. Anderson PUBLIC SUBMISSIONS
Posted Date Received Date Comment Start Date Comment Due Date \
0 10/16/2014 9/8/2014 6/18/2014 12/1/2014
1 8/6/2014 6/7/2014 6/18/2014 10/16/2014
2 10/16/2014 9/15/2014 6/18/2014 12/1/2014
3 8/6/2014 6/7/2014 6/18/2014 10/16/2014
4 12/18/2014 11/8/2014 6/18/2014 12/1/2014
5 10/16/2014 9/15/2014 6/18/2014 12/1/2014
6 8/6/2014 6/7/2014 6/18/2014 10/16/2014
7 8/15/2014 6/7/2014 6/18/2014 10/16/2014
8 12/18/2014 11/8/2014 6/18/2014 12/1/2014
9 10/16/2014 9/15/2014 6/18/2014 12/1/2014
Document Detail
0 [hyperlink]
1 [hyperlink]
2 [hyperlink]
3 [hyperlink]
4 [hyperlink]
6 [hyperlink]
7 [hyperlink]
8 [hyperlink]
9 [hyperlink]
Answer 0 (score: 0)
You're looking for something like
df2 = df[df["Column"].str.contains("King")]
print(df2)
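Basically, what your code does is retrieve a boolean Series for the condition. If you filter the DataFrame with that Series (i.e., pass it as the row selector for the DataFrame, as in the code here), it gives you what you need.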
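The ValueError in the question suggests that some values in the column are missing, so the mask contains NaN as well as True/False. A minimal sketch of one way around that, assuming missing titles should simply count as "no match": `str.contains` accepts an `na` argument that sets the result for missing values. The small DataFrame below is a hypothetical stand-in for the CSV, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV; one title is missing on purpose.
df = pd.DataFrame({
    "Document_Title": [
        "Comment submitted by J. King",
        "Comment submitted by N. Ghani",
        np.nan,
    ],
})

# na=False treats missing titles as "no match", so the mask is purely boolean.
mask = df["Document_Title"].str.contains("King", na=False)

# Use the mask to select the matching rows.
kings = df[mask]
print(kings)
```

With this approach the rows with missing titles stay in the original DataFrame; they are just excluded from the filtered result.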
Answer 1 (score: -1)
I think your df2 is messed up somehow. Here is an example df:
title tractsOfLand
King 100
Duke 50
Dutchess 4
Baron 5
Princess 5000
Rey 90
Roi 23
Make a boolean mask and index the df with it:
m = df2["title"].str.contains("King")
df2[m]
gives:
title tractsOfLand
King 100
Take a look at boolean indexing in the cookbook.
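For reference, here is a self-contained sketch of the mask-and-index step using the example table from this answer. The extra row with a missing title is a hypothetical addition, included only to show how dropping NaN rows by column name (rather than by column position, as in the question's `dropna` attempt) can be combined with the mask. Whether the question's data actually has missing titles is an assumption based on the error message.

```python
import numpy as np
import pandas as pd

# The example table from this answer, plus one hypothetical row with a missing title.
df = pd.DataFrame({
    "title": ["King", "Duke", "Dutchess", "Baron", "Princess", "Rey", "Roi", np.nan],
    "tractsOfLand": [100, 50, 4, 5, 5000, 90, 23, 7],
})

# Drop rows whose title is missing, naming the column explicitly.
clean = df.dropna(subset=["title"])

# Build the boolean mask and index the cleaned frame with it.
m = clean["title"].str.contains("King")
result = clean[m]
print(result)
```

Naming the column in `subset` avoids depending on its position in the file, which may differ from what the printed sample suggests.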