Getting values from a pandas Series object

Time: 2017-04-22 04:04:20

Tags: python python-3.x pandas

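I have a set of data files with the columns 'Names', 'Gender' and 'Count', one file per year. I need to concatenate all the files for a given period, sum the counts for every unique name, and add a new column holding the number of consonants in each name. I cannot extract the string values from 'Names'. How can I do this? Here is my code: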

import os
import re
import pandas as pd

PATH = ...
def consonants_dynamics (years):
    names_by_year = {}
    for year in years:
        names_by_year[year] = pd.read_csv(PATH+"\\yob{}.txt".format(year), names =['Names', 'Gender', 'Count'])
    names_all = pd.concat(names_by_year, names=['Year', 'Pos'])
    dynamics = names_all.groupby('Names').sum().sort_values(by='Count', ascending=False).unstack('Names')
    dynamics['Consonants'] = dynamics.apply(count_vowels(dynamics.Names), axis = 1)
    return dynamics.head(10)

def count_vowels (name):
    vowels = re.compile('A|E|I|O|U|a|e|i|o|u')
    return len(name) - len (vowels.findall(name))

If I run something like
a = consonants_dynamics(i for i in range (1900, 2001, 10))

I get the following error message:

<ipython-input-9-942fc155267e> in consonants_dynamcis(years)
...
---> 12     dynamics['Consonants'] = dynamics.apply(count_vowels(dynamics.Names), axis = 1)

AttributeError: 'Series' object has no attribute 'Names'

I have tried various approaches, but they all failed. What should I do?

2 answers:

Answer 0 (score: 1)

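After the unstack, dynamics is a Series object that no longer has a Names column, so dynamics.Names does not exist. I think this can be solved by removing .unstack('Names') and then using dynamics.index: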

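# Map over the index directly; assigning reset_index()['Names'] would align
# on a new integer index and fill the column with NaN.
dynamics['Consonants'] = dynamics.index.map(count_vowels)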
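To see why the attribute lookup fails, here is a minimal sketch on a small made-up table (the data and the helper name count_consonants are illustrative assumptions, not the asker's files). It shows that unstack on a single-level index returns a Series, and that the names are still available through the index of the grouped frame.

```python
import re
import pandas as pd

# Illustrative data only, standing in for the concatenated yearly files.
names_all = pd.DataFrame({
    'Names': ['James', 'James', 'John', 'Robert'],
    'Count': [1, 1, 3, 10],
})

# After groupby, 'Names' is the index of the result, not a column.
totals = names_all.groupby('Names').sum().sort_values(by='Count', ascending=False)

# With a single-level index, unstack returns a Series instead of a DataFrame.
unstacked = totals.unstack('Names')
is_series = isinstance(unstacked, pd.Series)

def count_consonants(name):
    # Hypothetical helper: every character that is not a vowel is counted.
    vowels = re.compile('[AEIOUaeiou]')
    return len(name) - len(vowels.findall(name))

# Work from the index of the grouped frame; no unstack is needed.
totals['Consonants'] = totals.index.map(count_consonants)
print(is_series)
print(totals)
```

Index.map is only one way to work from the index; converting it with to_series and calling apply gives the same values.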

Answer 1 (score: 1)

Convert the index with to_series and apply the function:

print (dynamics)
        Count
Names        
James       2
John        3
Robert     10

def count_vowels (name):
    vowels = re.compile('A|E|I|O|U|a|e|i|o|u')
    return len(name) - len (vowels.findall(name))


dynamics['Consonants'] = dynamics.index.to_series().apply(count_vowels)

A solution without a custom function, using str.len and str.count:

pat = 'A|E|I|O|U|a|e|i|o|u'
s = dynamics.index.to_series()
dynamics['Consonants_new'] = s.str.len() - s.str.count(pat)
print (dynamics)
        Count  Consonants_new  Consonants
Names                                    
James       2               3           3
John        3               3           3
Robert     10               4           4
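As a possible variation on the pattern above, the vowel alternation can be written as a character class with a case-insensitive flag. This is a sketch on made-up names, assuming that any character that is not a vowel should count as a consonant (so hyphens or spaces would be counted too).

```python
import re
import pandas as pd

# Illustrative names only.
s = pd.Series(['James', 'John', 'Robert'])

# '[aeiou]' with IGNORECASE matches the same characters as 'A|E|I|O|U|a|e|i|o|u'.
consonants = s.str.len() - s.str.count('[aeiou]', flags=re.IGNORECASE)
print(consonants.tolist())
```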

Edit:

A solution without to_series: add as_index=False to groupby so that it returns a DataFrame in which Names is a regular column:

names_all = pd.DataFrame({
'Names':['James','James','John','John', 'Robert', 'Robert'],
'Count':[10,20,10,30, 80,20]
})

# Parentheses allow the method chain to continue on the next line.
dynamics = (names_all.groupby('Names', as_index=False).sum()
                     .sort_values(by='Count', ascending=False))

pat = 'A|E|I|O|U|a|e|i|o|u'
dynamics['Consonants'] = dynamics['Names'].str.len() - dynamics['Names'].str.count(pat)

print (dynamics)
    Names  Count  Consonants
2  Robert    100           4
1    John     40           3
0   James     30           3
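Putting the pieces together, the function from the question could be reworked roughly as follows. This is a sketch, not the asker's final code: the function name summarize_names and its top parameter are hypothetical, and small in-memory frames stand in for the yearly files that the question loads with pd.read_csv.

```python
import pandas as pd

def summarize_names(names_by_year, top=10):
    """Sum counts per name across years and add a consonant count column."""
    # names_by_year maps a year to a DataFrame with Names, Gender, Count.
    names_all = pd.concat(names_by_year, names=['Year', 'Pos'])
    # as_index=False keeps Names as a regular column of the result.
    dynamics = (names_all.groupby('Names', as_index=False)['Count'].sum()
                         .sort_values(by='Count', ascending=False))
    pat = 'A|E|I|O|U|a|e|i|o|u'
    dynamics['Consonants'] = dynamics['Names'].str.len() - dynamics['Names'].str.count(pat)
    return dynamics.head(top)

# Made-up stand-ins for two yearly files.
names_by_year = {
    1900: pd.DataFrame({'Names': ['James', 'John'],
                        'Gender': ['M', 'M'],
                        'Count': [10, 10]}),
    1910: pd.DataFrame({'Names': ['James', 'Robert'],
                        'Gender': ['M', 'M'],
                        'Count': [20, 100]}),
}

result = summarize_names(names_by_year)
print(result)
```

Selecting only the Count column before summing keeps the Gender strings out of the aggregation.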