How to make a function subset a DataFrame using Python

Posted: 2019-08-30 23:25:31

Tags: python pandas dataframe

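I'm new to pandas. I have a df that contains demographic information for each state. I'm trying to create a function that takes the df and a list of rows (in this case a list of states) and returns a df for each state. I think my logic is wrong, because I'm getting an error. Thanks in advance.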

Here is a sample of my df:

     State   Year   Deaths          
0   Alabama 1999    39  
1   Alabama 2000    46  
2   Alabama 2001    67  
3   Alabama 2002    75  

Here is the function I tried:

def subseting(df, list_of_states):
    df_copy = df.copy()
    for i in list_of_states:
        if i == df_copy.State:
            df_copy = df[df.State == i]
            df_copy = df_copy[['Year', 'Deaths']]
    return df_copy

a = ['Alabama' , 'Alaska' , 'Arizona ']

print(subseting(df, a))

Here is the error I get:

ValueError  Traceback (most recent call last)
<ipython-input-304-3528e6a59ccf> in <module>
      1 a = ['Alabama' , 'Alaska' , 'Arizona ']
      2 
----> 3 print(subseting_44(df, a))

<ipython-input-303-faa8c8e91e86> in subseting_44(df, list_of)
      2     df_copy = df.copy()
      3     for i in list_of:
----> 4         if i == df.State:
      5 #     df_copy= df[df.State == list_of]
      6 

/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
   1476         raise ValueError("The truth value of a {0} is ambiguous. "
   1477                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1478                          .format(self.__class__.__name__))
   1479 
   1480     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
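The error comes from the `if` line: `i == df_copy.State` compares the string with every row and returns a boolean Series, and `if` cannot reduce a whole Series to a single True/False. The loop also overwrites `df_copy` on each pass, so at most the last state would survive. A minimal sketch of a corrected loop, assuming the goal is one `Year`/`Deaths` frame per state; the function name `subset_states` and returning a dict keyed by state are illustrative choices, not from the question:

```python
import pandas as pd


def subset_states(df, list_of_states):
    """Return a dict mapping each requested state to its Year/Deaths rows."""
    result = {}
    for state in list_of_states:
        # df["State"] == state is a boolean Series (one value per row);
        # use it as a row mask instead of putting it in an if-statement.
        mask = df["State"] == state
        result[state] = df.loc[mask, ["Year", "Deaths"]]
    return result


df = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alaska"],
    "Year": [1999, 2000, 2001],
    "Deaths": [39, 46, 1],
})

# Reproduce the original failure: a Series has no single truth value.
try:
    if "Alabama" == df["State"]:
        pass
except ValueError as err:
    print("ValueError:", err)

subsets = subset_states(df, ["Alabama", "Alaska", "Arizona"])
for state, sub in subsets.items():
    print(state, len(sub))
```

A state with no rows gets an empty frame rather than an error. Note that `'Arizona '` in the question's list has a trailing space, so it would not match an `Arizona` row even with a working filter.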

1 answer:

Answer 0 (score: 2)

Use pandas `query` to filter, and `groupby` to create separate subsets. The output shown after the code is what `print(subsetting1(...))` prints:

from io import StringIO
import pandas as pd
df = pd.read_fwf(StringIO(
"""i   State   Year   Deaths          
0   Alabama 1999    39  
1   Alabama 2000    46  
2   Alabama 2001    67  
3   Alabama 2002    75  
4   Alaska  2001     1
5   Alaska  2002     2   
6   Maine   2002     3   
7   Maine   2002     5   
"""
))

# single filtered DataFrame containing all requested states
def subsetting1(df, list_of_states):
    # @list_of_states refers to the local Python variable inside the query string
    return df.query('State in @list_of_states')

print(subsetting1(df, ["Alaska", "Alabama"]))

# list of DataFrames, one per requested state
def subsetting2(df, list_of_states):
    grouped = df.query('State in @list_of_states').groupby("State")
    # get_group raises KeyError if a requested state has no rows
    return [grouped.get_group(d) for d in list_of_states]

subsets = subsetting2(df, ["Alaska", "Alabama"])
for s in subsets:
    print(s)

       i    State  Year  Deaths
    0  0  Alabama  1999      39
    1  1  Alabama  2000      46
    2  2  Alabama  2001      67
    3  3  Alabama  2002      75
    4  4   Alaska  2001       1
    5  5   Alaska  2002       2
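As an alternative sketch (not part of the original answer), the same filtering can be done with `Series.isin`, and iterating over `groupby` yields one sub-frame per state. Building a dict keyed by state name avoids the `KeyError` that `get_group` raises when a requested state has no rows. The function name `subsetting3` is hypothetical, and a small hand-built frame stands in for the `read_fwf` data above:

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alaska", "Alaska", "Maine"],
    "Year": [1999, 2000, 2001, 2002, 2002],
    "Deaths": [39, 46, 1, 2, 3],
})


def subsetting3(df, list_of_states):
    # Keep only rows whose State appears in the list (boolean mask from isin).
    filtered = df[df["State"].isin(list_of_states)]
    # Iterating a groupby yields (state, sub-DataFrame) pairs;
    # sort=False keeps the order in which states first appear in df.
    return {state: group for state, group in filtered.groupby("State", sort=False)}


subsets = subsetting3(df, ["Alaska", "Alabama", "Arizona"])
for state, group in subsets.items():
    print(state)
    print(group)
```

Requested states with no rows (here `Arizona`) are simply absent from the result, so check with `in` before indexing if that matters.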