我正在尝试从pandas Dataframe中的列中过滤值,但我似乎在接收布尔值而不是实际值。 我试图按月和年过滤我们的数据。 在下面的代码中,您将看到我只按年度过滤,但我已经多次尝试过不同的月份和年份:
In [1]: import requests
In [2]: import pandas as pd # pandas
In [3]: import datetime as dt # module for manipulating dates and times
In [4]: url = "http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv"
In [5]: source = requests.get(url).text
In [6]: from io import StringIO, BytesIO
In [7]: s = StringIO(source)
In [8]: election_data = pd.DataFrame.from_csv(s, index_col=None).convert_objects(convert_dates="coerce", convert_numeric=True)
In [9]: election_data.head(n=3)
Out[9]:
Pollster Start Date End Date Entry Date/Time (ET) \
0 Politico/GWU/Battleground 2012-11-04 2012-11-05 2012-11-06 08:40:26
1 YouGov/Economist 2012-11-03 2012-11-05 2012-11-26 15:31:23
2 Gravis Marketing 2012-11-03 2012-11-05 2012-11-06 09:22:02
Number of Observations Population Mode Obama Romney \
0 1000.0 Likely Voters Live Phone 47.0 47.0
1 740.0 Likely Voters Internet 49.0 47.0
2 872.0 Likely Voters Automated Phone 48.0 48.0
Undecided Other Pollster URL \
0 6.0 NaN http://elections.huffingtonpost.com/pollster/p...
1 3.0 NaN http://elections.huffingtonpost.com/pollster/p...
2 4.0 NaN http://elections.huffingtonpost.com/pollster/p...
Source URL Partisan Affiliation \
0 http://www.politico.com/news/stories/1112/8338... Nonpartisan None
1 http://cdn.yougov.com/cumulus_uploads/document... Nonpartisan None
2 http://www.gravispolls.com/2012/11/gravis-mark... Nonpartisan None
Question Text Question Iteration
0 NaN 1
1 NaN 1
2 NaN 1
In [10]: start_date = pd.Series(election_data["Start Date"])
...: start_date.head(n=3)
...:
Out[10]:
0 2012-11-04
1 2012-11-03
2 2012-11-03
Name: Start Date, dtype: datetime64[ns]
In [11]: filtered = start_date.map(lambda x: x.year == 2012)
In [12]: filtered
Out[12]:
0 True
1 True
2 True
...
587 False
588 False
589 False
Name: Start Date, dtype: bool
答案 0 :(得分:3)
我认为首先需要read_csv
url
地址,然后boolean indexing
需要year
和month
创建的掩码:
election_data = pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv', parse_dates=[1,2,3])
print (election_data.head(3))
Pollster Start Date End Date Entry Date/Time (ET) \
0 Politico/GWU/Battleground 2012-11-04 2012-11-05 2012-11-06 08:40:26
1 YouGov/Economist 2012-11-03 2012-11-05 2012-11-26 15:31:23
2 Gravis Marketing 2012-11-03 2012-11-05 2012-11-06 09:22:02
Number of Observations Population Mode Obama Romney \
0 1000.0 Likely Voters Live Phone 47.0 47.0
1 740.0 Likely Voters Internet 49.0 47.0
2 872.0 Likely Voters Automated Phone 48.0 48.0
Undecided Other Pollster URL \
0 6.0 NaN http://elections.huffingtonpost.com/pollster/p...
1 3.0 NaN http://elections.huffingtonpost.com/pollster/p...
2 4.0 NaN http://elections.huffingtonpost.com/pollster/p...
Source URL Partisan Affiliation \
0 http://www.politico.com/news/stories/1112/8338... Nonpartisan None
1 http://cdn.yougov.com/cumulus_uploads/document... Nonpartisan None
2 http://www.gravispolls.com/2012/11/gravis-mark... Nonpartisan None
Question Text Question Iteration
0 NaN 1
1 NaN 1
2 NaN 1
print (election_data.dtypes)
Pollster object
Start Date datetime64[ns]
End Date datetime64[ns]
Entry Date/Time (ET) datetime64[ns]
Number of Observations float64
Population object
Mode object
Obama float64
Romney float64
Undecided float64
Other float64
Pollster URL object
Source URL object
Partisan object
Affiliation object
Question Text float64
Question Iteration int64
dtype: object
election_data[election_data["Start Date"].dt.year == 2012]
election_data[(election_data["Start Date"].dt.year == 2012) & (election_data["Start Date"].dt.month== 10)]
答案 1 :(得分:3)