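I have a Pandas DataFrame that contains an integer column. I am trying to use a for loop to extract only the Count values of the rows in that DataFrame where the Artist column matches 'The Beatles'.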
tot=[]
for art in df:
    for df['Artist'] in art:
        if art['Artist'] == 'The Beatles':
            tot.append(artist['Count'])
The Dataframe format is as such:
'''
   Rank  Album                      Artist                                             Count
1  1     The Beatles [White Album]  The Beatles                                        1634
2  2     Rubber Soul                The Beatles                                        1497
3  3     Revolver                   The Beatles                                        1489
4  4     Abbey Road                 The Beatles                                        1468
5  5     Meet Me in St. Louis       Judy Garland with Georgie Stoll and His Orchestra  1399
'''
I receive the "TypeError: string indices must be integers".
Answer 0 (score: 1)
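If you want to count the The Beatles entries in the Artist column of the DataFrame, you don't need a loop.
Use pandas.DataFrame.groupby with .transform('count') instead. It gives you the count of each entry in the Artist column.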
df['Count'] = df.groupby('Artist')['Artist'].transform('count')
Which gives:
>>> import pandas as pd
>>> data = ['The Beatles', 'Some Artist', 'Some Artist', 'The Beatles','The Beatles','The Beatles']
>>> df = pd.DataFrame(data,columns = ['Artist'])
>>> df
        Artist
0  The Beatles
1  Some Artist
2  Some Artist
3  The Beatles
4  The Beatles
5  The Beatles
>>> df['Count'] = df.groupby('Artist')['Artist'].transform('count')
>>> df
        Artist  Count
0  The Beatles      4
1  Some Artist      2
2  Some Artist      2
3  The Beatles      4
4  The Beatles      4
5  The Beatles      4
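This is helpful if you want to plot the results. Just create a dictionary whose keys are the Artist column values and whose values are the Count column values.
Since Python dictionaries do not allow duplicate keys, each artist appears only once, so the repeated rows are not a problem. Do this: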
>>> artist_count_dict = dict(zip(df['Artist'],df['Count']))
>>> artist_count_dict
{'The Beatles': 4, 'Some Artist': 2}
You can now access these values for plotting.
You can also use df['Column Name'].value_counts() to get the statistics you need.
>>> df['Artist'].value_counts()
The Beatles 4
Some Artist 2
Name: Artist, dtype: int64
If you need to store the result in a new DataFrame, create one:
>>> df2 = df['Artist'].value_counts()
>>> df2 = pd.DataFrame(df2)
>>> df2.index.name = 'Artist'
>>> df2.columns = ['Count']
>>> df2
Count
Artist
The Beatles 4
Some Artist 2
Answer 1 (score: 1)
If you want a list of all the Count values where Artist is The Beatles, use:
df.loc[df['Artist'] == 'The Beatles', 'Count'].tolist()
If you need to sum all the counts, use .sum():
df.loc[df['Artist'] == 'The Beatles', 'Count'].sum()
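To see both calls end to end, here is a small self-contained sketch. It rebuilds the five rows shown in the question by hand; the names is_beatles, beatles_counts and beatles_total are just illustrative:

```python
import pandas as pd

# The five rows from the question, typed in by hand.
df = pd.DataFrame({
    'Rank': [1, 2, 3, 4, 5],
    'Album': ['The Beatles [White Album]', 'Rubber Soul', 'Revolver',
              'Abbey Road', 'Meet Me in St. Louis'],
    'Artist': ['The Beatles', 'The Beatles', 'The Beatles', 'The Beatles',
               'Judy Garland with Georgie Stoll and His Orchestra'],
    'Count': [1634, 1497, 1489, 1468, 1399],
})

# Boolean mask: True for every row whose Artist is exactly 'The Beatles'.
is_beatles = df['Artist'] == 'The Beatles'

# .loc[mask, column] selects the Count values of the matching rows only.
beatles_counts = df.loc[is_beatles, 'Count'].tolist()
beatles_total = df.loc[is_beatles, 'Count'].sum()

print(beatles_counts)
print(beatles_total)
```

Note that the comparison is an exact string match, so an entry such as 'the beatles' or one with trailing spaces would not be selected.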
Answer 2 (score: 0)
I guess you are looking for this:
tot = df.loc[df['Artist']=='The Beatles','Count'].tolist()
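As for why the original loop fails: iterating directly over a DataFrame yields its column labels, not its rows, so art in the question is a string such as 'Rank', and art['Artist'] indexes a string with a string, which raises the TypeError. A rough sketch on a two-row sample (the names column_labels and row are my own) shows this, along with a row-wise loop in case you really want one:

```python
import pandas as pd

# Two rows taken from the question's data, enough to show the behaviour.
df = pd.DataFrame({
    'Artist': ['The Beatles',
               'Judy Garland with Georgie Stoll and His Orchestra'],
    'Count': [1634, 1399],
})

# Iterating a DataFrame gives column labels (strings), not rows.
column_labels = [label for label in df]
print(column_labels)

# If an explicit loop is wanted, itertuples() yields one namedtuple per row.
tot = []
for row in df.itertuples(index=False):
    if row.Artist == 'The Beatles':
        tot.append(row.Count)
print(tot)
```

The .loc one-liner above is usually the better choice; the loop is shown only to mirror the structure of the original attempt.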