Question

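I am trying to filter values from a column in a pandas DataFrame, but I seem to be getting boolean values back instead of the actual values. I want to filter our data by month and year. In the code below you will see that I only filter by year, but I have tried month and year in several different ways: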

    In [1]: import requests

    In [2]: import pandas as pd # pandas

    In [3]: import datetime as dt # module for manipulating dates and times

    In [4]: url = "http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv"

    In [5]: source = requests.get(url).text

    In [6]: from io import StringIO, BytesIO

    In [7]: s = StringIO(source)

    In [8]: election_data = pd.DataFrame.from_csv(s, index_col=None).convert_objects(convert_dates="coerce", convert_numeric=True)


    In [9]: election_data.head(n=3)
    Out[9]:
                Pollster Start Date   End Date Entry Date/Time (ET)  \
    0  Politico/GWU/Battleground 2012-11-04 2012-11-05  2012-11-06 08:40:26
    1           YouGov/Economist 2012-11-03 2012-11-05  2012-11-26 15:31:23
    2           Gravis Marketing 2012-11-03 2012-11-05  2012-11-06 09:22:02

       Number of Observations     Population             Mode  Obama  Romney  \
    0                  1000.0  Likely Voters       Live Phone   47.0    47.0
    1                   740.0  Likely Voters         Internet   49.0    47.0
    2                   872.0  Likely Voters  Automated Phone   48.0    48.0

       Undecided  Other                                       Pollster URL  \
    0        6.0    NaN  http://elections.huffingtonpost.com/pollster/p...
    1        3.0    NaN  http://elections.huffingtonpost.com/pollster/p...
    2        4.0    NaN  http://elections.huffingtonpost.com/pollster/p...

                                              Source URL     Partisan Affiliation  \
    0  http://www.politico.com/news/stories/1112/8338...  Nonpartisan        None
    1  http://cdn.yougov.com/cumulus_uploads/document...  Nonpartisan        None
    2  http://www.gravispolls.com/2012/11/gravis-mark...  Nonpartisan        None

       Question Text  Question Iteration
    0            NaN                   1
    1            NaN                   1
    2            NaN                   1

    In [10]: start_date = pd.Series(election_data["Start Date"])
        ...: start_date.head(n=3)
        ...:
    Out[10]:
    0   2012-11-04
    1   2012-11-03
    2   2012-11-03
    Name: Start Date, dtype: datetime64[ns]

    In [11]: filtered = start_date.map(lambda x: x.year == 2012)

    In [12]: filtered
    Out[12]:
    0       True
    1       True
    2       True
    ...
    587    False
    588    False
    589    False
    Name: Start Date, dtype: bool

Answer 1

I think you first need to pass the URL to read_csv, and then use boolean indexing with a mask built from the year and month:

election_data = pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv', parse_dates=[1,2,3])

print (election_data.head(3))
                    Pollster Start Date   End Date Entry Date/Time (ET)  \
0  Politico/GWU/Battleground 2012-11-04 2012-11-05  2012-11-06 08:40:26   
1           YouGov/Economist 2012-11-03 2012-11-05  2012-11-26 15:31:23   
2           Gravis Marketing 2012-11-03 2012-11-05  2012-11-06 09:22:02   

   Number of Observations     Population             Mode  Obama  Romney  \
0                  1000.0  Likely Voters       Live Phone   47.0    47.0   
1                   740.0  Likely Voters         Internet   49.0    47.0   
2                   872.0  Likely Voters  Automated Phone   48.0    48.0   

   Undecided  Other                                       Pollster URL  \
0        6.0    NaN  http://elections.huffingtonpost.com/pollster/p...   
1        3.0    NaN  http://elections.huffingtonpost.com/pollster/p...   
2        4.0    NaN  http://elections.huffingtonpost.com/pollster/p...   

                                          Source URL     Partisan Affiliation  \
0  http://www.politico.com/news/stories/1112/8338...  Nonpartisan        None   
1  http://cdn.yougov.com/cumulus_uploads/document...  Nonpartisan        None   
2  http://www.gravispolls.com/2012/11/gravis-mark...  Nonpartisan        None   

   Question Text  Question Iteration  
0            NaN                   1  
1            NaN                   1  
2            NaN                   1

print (election_data.dtypes)
Pollster                          object
Start Date                datetime64[ns]
End Date                  datetime64[ns]
Entry Date/Time (ET)      datetime64[ns]
Number of Observations           float64
Population                        object
Mode                              object
Obama                            float64
Romney                           float64
Undecided                        float64
Other                            float64
Pollster URL                      object
Source URL                        object
Partisan                          object
Affiliation                       object
Question Text                    float64
Question Iteration                 int64
dtype: object


election_data[election_data["Start Date"].dt.year == 2012]

election_data[(election_data["Start Date"].dt.year == 2012) & (election_data["Start Date"].dt.month== 10)]
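
The `map` call in the question printed True/False because it only builds the mask and never applies it. As a minimal sketch, here is the same idea on a small made-up frame (the `polls` name and its three rows are illustrative assumptions, not the real poll file): build the mask first, then index with it to get the rows themselves.

```python
import pandas as pd

# Hypothetical stand-in for the poll data; the values are made up for illustration.
polls = pd.DataFrame({
    "Pollster": ["A", "B", "C"],
    "Start Date": pd.to_datetime(["2012-10-15", "2012-11-03", "2011-10-20"]),
})

# Step 1: the comparison yields a boolean Series (this is what the question printed).
mask = (polls["Start Date"].dt.year == 2012) & (polls["Start Date"].dt.month == 10)

# Step 2: indexing the frame with the mask returns the matching rows.
october_2012 = polls[mask]
print(october_2012)
```

The parentheses around each comparison matter, because `&` binds more tightly than `==` in Python.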

Answer 2

If you set Start Date as the index, you can use pandas' date-based filtering (partial string indexing on a DatetimeIndex).

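Get all of 2012 (`.loc` is used here because recent pandas versions no longer accept a date string in plain `[]` for row selection)
election_data.set_index('Start Date').loc['2012']

Get all of Jan, 2012
election_data.set_index('Start Date').loc['2012-01']

Get everything between Jan 1, 2012 and Jan 13, 2012
election_data.set_index('Start Date').loc['2012-01-01':'2012-01-13']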
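
To see the three selections side by side, here is a rough sketch on made-up data (the `polls` frame and its four rows are assumptions for illustration only). It sorts the index first, which is an addition on my part: slicing by date strings is most reliable on a sorted DatetimeIndex.

```python
import pandas as pd

# Hypothetical stand-in for the poll data; the values are made up for illustration.
polls = pd.DataFrame({
    "Pollster": ["A", "B", "C", "D"],
    "Start Date": pd.to_datetime(
        ["2012-01-05", "2012-01-20", "2012-02-01", "2011-12-30"]
    ),
})

# Move the dates into the index and sort them (sorting is an added precaution).
indexed = polls.set_index("Start Date").sort_index()

all_2012 = indexed.loc["2012"]                          # every row in 2012
jan_2012 = indexed.loc["2012-01"]                       # every row in Jan 2012
early_jan = indexed.loc["2012-01-01":"2012-01-13"]      # inclusive date range

print(all_2012, jan_2012, early_jan, sep="\n\n")
```

Note that, unlike ordinary Python slices, the date range here includes both endpoints.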

Filtering values in a Pandas DataFrame or Series
