Question

I actually need to plot only the changes that occurred in October 2012, so I want to count the 30 rows involved so that I can use them in xlim for plotting.

import pandas as pd
from pandas import Series,DataFrame
import numpy as np
poll_df=pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv')
row_in=0
xlimit=[]
poll_df=poll_df[poll_df['Start Date'].str[:7] == '2012-10']
for date in poll_df['Start Date']:
    if date[0:7] == '2012-10':
        xlimit.append(row_in)
        row_in += 1
    else:
        row_in+=1
print(min(xlimit))
print(max(xlimit))

But I don't understand why xlimit ends up empty despite the operations above.

Answer 1

After downloading the file from that URL, I can load it with np.genfromtxt:

In [232]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-obama.csv', dtype=None, delimiter=',', names=True, invalid_raise=False, encoding=None)
/usr/local/bin/ipython3:1: ConversionWarning: Some errors were detected !
    Line #77 (got 13 columns instead of 17)
    Line #238 (got 13 columns instead of 17)
    Line #460 (got 18 columns instead of 17)
    Line #488 (got 18 columns instead of 17)
    Line #493 (got 13 columns instead of 17)
    Line #507 (got 18 columns instead of 17)
    Line #515 (got 18 columns instead of 17)
    Line #538 (got 18 columns instead of 17)
    Line #550 (got 18 columns instead of 17)
  #!/usr/bin/python3

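genfromtxt is not as forgiving as pandas when it meets lines that are shorter or longer than expected; with invalid_raise=False it skips them and reports them in the warning above.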
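To see that skipping behaviour in isolation, here is a minimal sketch. The three-line CSV is made up for illustration (it is not the real poll file), and the variable names csv_text and sample are my own.

```python
import io
import warnings

import numpy as np

# Hypothetical miniature CSV: the header has 3 columns, the second data line has only 2.
csv_text = """Pollster,Start Date,Obama
A,2012-10-01,48
B,2012-10-02
C,2012-11-01,50
"""

with warnings.catch_warnings():
    # The short line triggers a ConversionWarning; silence it for this demo.
    warnings.simplefilter("ignore")
    sample = np.genfromtxt(io.StringIO(csv_text), dtype=None, delimiter=',',
                           names=True, invalid_raise=False, encoding=None)

print(sample.shape)        # one entry per line that had the expected column count
print(sample.dtype.names)  # the space in 'Start Date' is replaced with an underscore
```

The short line is dropped rather than padded, which is why the full file yields fewer records than it has data lines.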

In [233]: data.shape
Out[233]: (577,)
In [234]: data.dtype
Out[234]: dtype([('Pollster', '<U56'), ('Start_Date', '<U10'), ('End_Date', '<U10'), ('Entry_DateTime_ET', '<U20'), ('Number_of_Observations', '<i8'), ('Population', '<U26'), ('Mode', '<U15'), ('Obama', '<f8'), ('Romney', '<f8'), ('Undecided', '<f8'), ('Other', '<f8'), ('Pollster_URL', '<U113'), ('Source_URL', '<U189'), ('Partisan', '<U11'), ('Affiliation', '<U5'), ('Question_Text', '?'), ('Question_Iteration', '<i8')])

The Start_Date field looks like this:

In [235]: data['Start_Date'][:10]
Out[235]: 
array(['2012-11-04', '2012-11-03', '2012-11-03', '2012-11-03',
       '2012-11-03', '2012-11-03', '2012-11-03', '2012-11-01',
       '2012-11-02', '2012-11-02'], dtype='<U10')

I can search it with np.where. I'm using astype('U7') to truncate each string to its first 7 characters (the year and month).

In [236]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[236]: 
array([18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
       36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
       53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
       70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
       87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
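The same idea on a small hand-written array (the dates below are invented for the example, not taken from the file) may make the truncate-and-compare step easier to follow:

```python
import numpy as np

# Hypothetical sample of ISO-formatted date strings.
dates = np.array(['2012-11-04', '2012-10-30', '2012-10-28', '2012-09-30'])

# Casting to a shorter unicode dtype keeps only the first 7 characters: 'YYYY-MM'.
months = dates.astype('U7')

# np.where returns a tuple of index arrays; [0] picks the one for the only axis.
idx = np.where(months == '2012-10')[0]
print(idx)
```

idx holds the positions of the October rows, so idx.min() and idx.max() would be the kind of values the question wants for xlim.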

I can use usecols to get around the variable line length, assuming the "bad" lines only differ in the later fields.

In [237]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-obama.csv', dtype=None, delimiter=',', names=True, invalid_raise=False, encoding=None, usecols=range(10))
In [238]: data.shape
Out[238]: (586,)
In [239]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[239]: 
array([ 18,  19,  21,  22,  23,  24,  25,  26,  27,  28,  29,  30,  31,
        32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
        45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,
        58,  59,  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,
        71,  72,  73,  74,  75,  76,  77,  78,  79,  80,  81,  82,  83,
        84,  85,  86,  87,  88,  89,  90,  91,  92,  93,  94,  95,  96,
        97,  98,  99, 100])
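As a rough sketch of why usecols helps, here is a made-up CSV (again, not the real data) with one short and one long line. Restricting the read to the leading columns lets every line through, as long as each line has at least that many fields.

```python
import io

import numpy as np

# Hypothetical CSV: 4 header columns, one line with 3 fields and one with 5.
csv_text = """Pollster,Start Date,Obama,Romney
A,2012-10-01,48,47
B,2012-10-02,49
C,2012-11-01,50,46,extra
"""

# Only the first two columns are read, so the ragged tails are never looked at.
sample = np.genfromtxt(io.StringIO(csv_text), dtype=None, delimiter=',',
                       names=True, invalid_raise=False, encoding=None,
                       usecols=range(2))

print(sample.shape)
print(sample['Start_Date'])
```

The trade-off is that the later columns are not loaded at all, so this only suits cases like this one where the fields of interest come first.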

With an iterative search like yours, I get the same list:

In [244]: alist = []
In [245]: for i,date in enumerate(data['Start_Date']):
     ...:     if date[:7] == '2012-10':
     ...:         alist.append(i)
     ...:         
In [246]: len(alist)
Out[246]: 82
In [247]: np.array(alist)
Out[247]: 
array([ 18,  19,  21,  22,  23,  24,  25,  26,  27,  28,  29,  30,  31,
        32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
        45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,
        58,  59,  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,
        71,  72,  73,  74,  75,  76,  77,  78,  79,  80,  81,  82,  83,
        84,  85,  86,  87,  88,  89,  90,  91,  92,  93,  94,  95,  96,
        97,  98,  99, 100])
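For comparison, a small sketch that runs the loop and a vectorized search side by side on invented dates. Using np.char.startswith with np.flatnonzero is my own choice of alternative here, not something the session above used.

```python
import numpy as np

# Hypothetical sample of date strings.
dates = np.array(['2012-11-04', '2012-10-30', '2012-10-28',
                  '2012-09-30', '2012-10-01'])

# Loop version: enumerate supplies the row position, so no manual counter is needed.
loop_idx = [i for i, date in enumerate(dates) if date[:7] == '2012-10']

# Vectorized version: boolean mask of matching rows, then the positions of the True entries.
vec_idx = np.flatnonzero(np.char.startswith(dates, '2012-10'))

print(loop_idx)
print(vec_idx.tolist())
```

Both give the positions in the original, unfiltered array. If the data is filtered down to October first and then looped over, the counter only sees the surviving rows and the positions no longer refer to the original file.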
