How do I append values from a Pandas DataFrame column using a for loop?

Time: 2019-09-05 16:13:17

Tags: python pandas dataframe

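I have a Pandas DataFrame that contains an integer column. I am trying to use a for loop to extract only the Count values from the rows of that DataFrame where the "Artist" column matches "The Beatles".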

tot=[]
for art in df:
    for df['Artist'] in art:
        if art['Artist'] == 'The Beatles':
            tot.append(artist['Count'])

The DataFrame looks like this:

   Rank  Album                      Artist                                             Count
1  1     The Beatles [White Album]  The Beatles                                        1634
2  2     Rubber Soul                The Beatles                                        1497
3  3     Revolver                   The Beatles                                        1489
4  4     Abbey Road                 The Beatles                                        1468
5  5     Meet Me in St. Louis       Judy Garland with Georgie Stoll and His Orchestra  1399

I receive the "TypeError: string indices must be integers".

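3 Answers:

Answer 0 (score: 1)

Method 1:

If you want to count how many entries in the DataFrame's Artist column are The Beatles, you don't need a loop.

Instead, use pandas.DataFrame.groupby together with .transform('count'). It gives you the count for each entry in the Artist column.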

df['Count'] = df.groupby('Artist')['Artist'].transform('count')

Which gives:

>>> import pandas as pd
>>> data = ['The Beatles', 'Some Artist', 'Some Artist', 'The Beatles','The Beatles','The Beatles']
>>> df = pd.DataFrame(data,columns = ['Artist'])
>>> df
        Artist
0  The Beatles
1  Some Artist
2  Some Artist
3  The Beatles
4  The Beatles
5  The Beatles
>>> df['Count'] = df.groupby('Artist')['Artist'].transform('count')
>>> df
        Artist  Count
0  The Beatles      4
1  Some Artist      2
2  Some Artist      2
3  The Beatles      4
4  The Beatles      4
5  The Beatles      4

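This is helpful if you want to plot the results. Just create a dictionary whose keys are the values of the Artist column and whose values are the values of the Count column.

Because Python dictionaries do not allow duplicate keys, the repeated rows collapse into a single entry per artist. Do this: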

>>> artist_count_dict = dict(zip(df['Artist'],df['Count']))
>>> artist_count_dict
{'The Beatles': 4, 'Some Artist': 2}

You can now access these values for plotting.
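
One caveat: the DataFrame in the question already has a Count column, so assigning to df['Count'] as above would overwrite it. Below is a minimal sketch of the same groupby/transform idea that writes to a separate column instead. The column name "Occurrences" and the three-row sample are assumptions made for illustration; they do not come from the question.

```python
import pandas as pd

# Small sample modelled on the question's table (only two columns kept).
df = pd.DataFrame({
    "Artist": ["The Beatles", "The Beatles", "Judy Garland"],
    "Count": [1634, 1497, 1399],
})

# "Occurrences" is a hypothetical column name, chosen so the existing
# "Count" column is left untouched.
df["Occurrences"] = df.groupby("Artist")["Artist"].transform("count")

# One entry per artist, because dictionary keys are unique.
occurrences_by_artist = dict(zip(df["Artist"], df["Occurrences"]))
print(occurrences_by_artist)
```

The original Count values stay available for filtering or summing afterwards.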

Method 2:

You can also use df['Column Name'].value_counts() to get the statistics you need.

>>> df['Artist'].value_counts()
The Beatles    4
Some Artist    2
Name: Artist, dtype: int64

If you need to store the result in a new DataFrame, create one:

>>> df2 = df['Artist'].value_counts()
>>> df2 = pd.DataFrame(df2)
>>> df2.index.name = 'Artist'
>>> df2.columns = ['Count']
>>> df2
             Count
Artist
The Beatles      4
Some Artist      2
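
A more compact variant of the same steps, offered as a sketch rather than the answer's own method: rename_axis names the index and reset_index(name=...) turns the counts into an ordinary column, so Artist ends up as a regular column instead of the index. The variable name counts is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Artist": ["The Beatles", "Some Artist", "Some Artist",
                              "The Beatles", "The Beatles", "The Beatles"]})

# Count each artist, name the index "Artist", then move it into a column
# next to a column called "Count".
counts = (
    df["Artist"]
    .value_counts()
    .rename_axis("Artist")
    .reset_index(name="Count")
)
print(counts)
```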

Answer 1 (score: 1)

If you want to store all the Count values where Artist is The Beatles in a list, use:

df.loc[df['Artist'] == 'The Beatles', 'Count'].tolist()

If you need the sum of all those counts, use .sum():

df.loc[df['Artist'] == 'The Beatles', 'Count'].sum()
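
To make both calls concrete, here is a self-contained sketch that rebuilds the table from the question and applies them. The variable names is_beatles, tot and total are chosen for illustration.

```python
import pandas as pd

# The five rows shown in the question.
df = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Album": ["The Beatles [White Album]", "Rubber Soul", "Revolver",
              "Abbey Road", "Meet Me in St. Louis"],
    "Artist": ["The Beatles", "The Beatles", "The Beatles", "The Beatles",
               "Judy Garland with Georgie Stoll and His Orchestra"],
    "Count": [1634, 1497, 1489, 1468, 1399],
})

# Boolean mask: True for rows whose Artist is exactly "The Beatles".
is_beatles = df["Artist"] == "The Beatles"

tot = df.loc[is_beatles, "Count"].tolist()  # matching Count values as a list
total = df.loc[is_beatles, "Count"].sum()   # their sum

print(tot)    # [1634, 1497, 1489, 1468]
print(total)  # 6088
```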

Answer 2 (score: 0)

I think this is what you are looking for:

tot = df.loc[df['Artist']=='The Beatles','Count'].tolist()
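
If an explicit loop is still wanted, it helps to know why the original one fails: iterating over a DataFrame directly yields its column labels, which are strings, so art['Artist'] indexes a string and raises the TypeError. A minimal sketch of a loop that works pairs the two columns with zip instead. The two-column sample and the name column_labels are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Artist": ["The Beatles", "The Beatles", "Judy Garland"],
    "Count": [1634, 1497, 1399],
})

# Iterating over the DataFrame itself yields column labels, not rows.
column_labels = [label for label in df]
print(column_labels)

# Walk the two columns in parallel and keep the matching counts.
tot = []
for artist, count in zip(df["Artist"], df["Count"]):
    if artist == "The Beatles":
        tot.append(count)
print(tot)
```

The boolean-mask version with df.loc is usually shorter and faster on large frames; the loop is shown only because the question asked for one.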