Question

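I'm trying to analyze US polling data; specifically, I'm trying to work out which states are safe, marginal, or tight (their 'closeness'). I have a DataFrame that contains poll results over time along with their 'closeness'. I'm using this Pandas statement to summarize the 'closeness' entries.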

s=self.daily.groupby('State')['closeness'].unique()

This gives me the following Series (a selection is shown for brevity):

State
AK                     [safe]
AL                     [safe]
CA                     [safe]
CO    [safe, tight, marginal]
FL          [marginal, tight]
IA    [safe, tight, marginal]
ID                     [safe]
IL                     [safe]
IN              [tight, safe]
Name: closeness, dtype: object

Each row is an array, so, for example, s[0] gives:

array(['safe'], dtype=object)

I'm trying to select from this Series, but I can't get the syntax right. For example, I tried to select only the 'safe' states using this syntax:

ipdb> s[s == 'safe']
*** ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This doesn't work either:

test[test == ['safe']]

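Here is what I'd like to do: select the states that are 'marginal' or 'tight', select the states that are 'safe' and only 'safe', and so on. Does anyone know the syntax I should use, or a better approach in the first place?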

============ Here is a sample of the data before the groupby:

ipdb> self.daily.head(3)
        Date  Democratic share    Margin    Method  Other share  \

0 2008-11-04          0.378894 -0.215351  Election     0.026861   
1 2008-11-04          0.387404 -0.215765  Election     0.009427   
2 2008-11-04          0.388647 -0.198512  Election     0.024194   

   Republican share State closeness      winner  
0          0.594245    AK      safe  Republican  
1          0.603169    AL      safe  Republican

Answer 1

Suppose you have a DataFrame with a column of lists, say:

df = pd.DataFrame({'a': [['safe'], ['safe', 'tight'], []]})

Then, to see which rows are exactly ['safe'], you can use:

In [7]: df.a.apply(lambda x: x == ['safe'])
Out[7]: 
0     True
1    False
2    False
Name: a, dtype: bool

To find the rows that contain 'safe', you can use:

In [9]: df.a.apply(lambda x: 'safe' in x)
Out[9]: 
0     True
1     True
2    False
Name: a, dtype: bool

And so on.
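Applied to the grouped Series from the question, where each row is a NumPy array rather than a list, the same idea might look like the sketch below. Converting each row to a set makes the comparison independent of the order of the values. The sample data and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Illustrative data (an assumption, modelled on the question's output)
daily = pd.DataFrame({
    'State': ['AK', 'CO', 'CO', 'CO', 'FL', 'FL', 'IN', 'IN'],
    'closeness': ['safe', 'safe', 'tight', 'marginal',
                  'marginal', 'tight', 'tight', 'safe'],
})
s = daily.groupby('State')['closeness'].unique()

# States that are 'safe' and only 'safe'
only_safe = s[s.apply(lambda x: set(x) == {'safe'})]

# States that are 'marginal' or 'tight' (at least one of the two)
marginal_or_tight = s[s.apply(lambda x: bool(set(x) & {'marginal', 'tight'}))]

print(only_safe.index.tolist())          # ['AK']
print(marginal_or_tight.index.tolist())  # ['CO', 'FL', 'IN']
```

The lambda returns a plain bool for every row, so the result of apply is a boolean Series that can be used directly as a mask on s.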

Answer 2

A sample of the DataFrame given by the OP:

In[66]:df
Out[66]: 
         Date  Democratic share    Margin    Method  Other share  
0  2008-11-04          0.378894 -0.215351  Election     0.026861   
1  2008-11-04          0.387404 -0.215765  Election     0.009427   
2  2008-11-04          0.388647 -0.198512  Election     0.024194   
3  2008-11-04          0.384547 -0.194545  Election     0.024194   
4  2008-11-04          0.345330 -0.194512  Election     0.024459   

   Republican share State closeness      winner  
0          0.594245    AK      safe  Republican  
1          0.603169    AL      safe  Republican  
2          0.454545    CA     tight  Democratic  
3          0.453450    CO  marginal  Democratic  
4          0.454545    FL     tight    Republic

Then use groupby:

In[67]:s=df.groupby('State')['closeness'].unique()

In[68]:s
Out[68]: 
State
AK        [safe]
AL        [safe]
CA       [tight]
CO    [marginal]
FL       [tight]

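Then use np.where. The comparison works here because every group holds a single value. Note that .ix has been removed from recent pandas versions, where .iloc or a boolean mask takes its place: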

In[69]:s.ix[np.where(s=='safe')]
Out[69]: 
State
AK    [safe]
AL    [safe]
Name: closeness, dtype: object
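For groups that hold several values, one option is to build the boolean mask row by row and then index with it. This is a minimal sketch rather than the answerer's method: the sample data is assumed, and has_only is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption); CO has two distinct values
df = pd.DataFrame({
    'State': ['AK', 'AL', 'CA', 'CO', 'CO', 'FL'],
    'closeness': ['safe', 'safe', 'tight', 'marginal', 'safe', 'tight'],
})
s = df.groupby('State')['closeness'].unique()

def has_only(values, label):
    """Return True when the row holds exactly one distinct value, equal to label."""
    return len(values) == 1 and values[0] == label

# One boolean per row of s
mask = np.array([has_only(v, 'safe') for v in s])

safe_only = s[mask]                          # boolean-mask indexing
safe_only_pos = s.iloc[np.where(mask)[0]]    # positional indexing via np.where

print(safe_only.index.tolist())  # ['AK', 'AL']
```

Both forms select the same rows; the np.where variant simply converts the mask to integer positions first.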

Answer 3

I don't think building the Series s with .unique() is the best way to approach this problem. Try pd.crosstab instead.

import pandas as pd

daily = pd.DataFrame({'State': ['AK', 'AL', 'CA', 'CO', 'CO', 'CO', 'FL',
                                'FL', 'IA', 'IA', 'IA', 'ID', 'IL', 'IN', 'IN'],
                      'closeness': ['safe', 'safe', 'safe', 'safe', 'tight',
                                    'marginal', 'marginal', 'tight', 'safe',
                                    'tight', 'marginal', 'safe', 'safe',
                                    'tight', 'safe']})
ct = pd.crosstab(daily['State'], daily['closeness'])
print(ct)

Output:

closeness  marginal  safe  tight
State                           
AK                0     1      0
AL                0     1      0
CA                0     1      0
CO                1     1      1
FL                1     0      1
IA                1     1      1
ID                0     1      0
IL                0     1      0
IN                0     1      1

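On the one hand, ct records the same information as s about which values occur in each state; on the other hand, selecting states the way you want becomes trivial. The two examples you gave: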

# states that are 'marginal' or 'tight'
print(ct.loc[(ct['marginal'] > 0) | (ct['tight'] > 0)]
        .index.values)
# => ['CO', 'FL', 'IA', 'IN']

# States that are 'safe' and only 'safe'
print(ct.loc[(ct['safe'] > 0) & (ct['marginal'] == 0) & (ct['tight'] == 0)]
        .index.values)
# => ['AK', 'AL', 'CA', 'ID', 'IL']

Alternatively, using .query(), which is arguably more readable:

# states that are 'marginal' or 'tight'
print(ct.query('marginal > 0 | tight > 0').index.values)
# => ['CO', 'FL', 'IA', 'IN']

# States that are 'safe' and only 'safe'
print(ct.query('safe > 0 & marginal == 0 & tight == 0')
        .index.values)
# => ['AK', 'AL', 'CA', 'ID', 'IL']

However, if you insist on using s, you can build ct from it like this:

ct = s.str.join(' ').str.get_dummies(sep=' ')
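To see how this construction lines up with the crosstab, the sketch below builds both from the same sample data and compares them. One difference to keep in mind: get_dummies yields 0/1 indicators while crosstab yields counts, so the comparison is against presence (count > 0). It also assumes that no label contains a space, because a space is used as the separator. The names ct_counts and ct_dummies are introduced here for illustration.

```python
import pandas as pd

daily = pd.DataFrame({'State': ['AK', 'AL', 'CA', 'CO', 'CO', 'CO', 'FL',
                                'FL', 'IA', 'IA', 'IA', 'ID', 'IL', 'IN', 'IN'],
                      'closeness': ['safe', 'safe', 'safe', 'safe', 'tight',
                                    'marginal', 'marginal', 'tight', 'safe',
                                    'tight', 'marginal', 'safe', 'safe',
                                    'tight', 'safe']})

# The Series of arrays from the question
s = daily.groupby('State')['closeness'].unique()

# Counts per state and label, straight from the raw data
ct_counts = pd.crosstab(daily['State'], daily['closeness'])

# 0/1 indicators rebuilt from s
ct_dummies = s.str.join(' ').str.get_dummies(sep=' ')

# The indicators should match "count > 0" cell by cell
same = bool((ct_dummies.values == (ct_counts > 0).astype(int).values).all())
print(same)

# Selection works the same way on the indicator table
only_safe = ct_dummies.loc[(ct_dummies['safe'] == 1)
                           & (ct_dummies['marginal'] == 0)
                           & (ct_dummies['tight'] == 0)].index.tolist()
print(only_safe)
```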

Selecting rows from a Pandas Series where the rows are arrays
