Question

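I have a dataframe 'genres' in which each row of a column holds values separated by ','. I need to count each value, e.g. Comedy 2, Drama 7, and so on. I have tried many approaches, but they all failed.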

I tried genres = trending.groupby(['genre']).size(), but this line treats the value 'Comedy,Crime,CriticallyAcclaimed' as a single value. I am new to Python, please help.

genre
Comedy,Crime,CriticallyAcclaimed
Comedy,Drama,Romance
Drama
Drama
Drama,Hollywood
Drama,Romance
Drama,Romance
Drama,Romance,Classic

Answer 1

I found the answer:

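import pandas as pd

# Split each row on ',' and stack the pieces into one long column
genres = pd.DataFrame(genres.genre.str.split(',', expand=True).stack(), columns=['genre'])
genres = genres.reset_index(drop=True)
# Count the rows per genre and turn the result back into a dataframe
genre_count = pd.DataFrame(genres.groupby(by=['genre']).size(), columns=['count'])
genre_count = genre_count.reset_index()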
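
A shorter sketch of the same split-then-count idea, assuming pandas 0.25 or newer (where Series.explode is available). The trending dataframe below is rebuilt from the sample in the question purely for illustration:

```python
import pandas as pd

# Sample data copied from the question (illustration only)
trending = pd.DataFrame({'genre': [
    'Comedy,Crime,CriticallyAcclaimed',
    'Comedy,Drama,Romance',
    'Drama',
    'Drama',
    'Drama,Hollywood',
    'Drama,Romance',
    'Drama,Romance',
    'Drama,Romance,Classic',
]})

# Split each row into a list, give every list item its own row, then count
genre_count = trending['genre'].str.split(',').explode().value_counts()
print(genre_count)
```

On this sample, Drama is counted 7 times and Comedy 2 times, which matches the numbers mentioned in the question.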

Answer 2

If you are using pandas, which I am guessing since the OP does not actually say so, you could do something like this:

from collections import Counter

# Code where you get the trending variable

genreCount = Counter()
for row in trending.itertuples():
    genreCount.update(row.genre.split(","))  # row[0] is the index, so read the genre column by name

print(genreCount)  # It works like a dict where keys are the genres and values are the number of appearances
print(dict(genreCount))  # You can also turn it into a dict, but the Counter already works as one
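
To see what this produces, here is a self-contained sketch of the same Counter approach run on the sample rows from the question. It uses a plain list (the name rows is made up here) in place of the trending dataframe, so it needs nothing beyond the standard library:

```python
from collections import Counter

# Sample rows copied from the question; `rows` is a stand-in for trending['genre']
rows = [
    'Comedy,Crime,CriticallyAcclaimed',
    'Comedy,Drama,Romance',
    'Drama',
    'Drama',
    'Drama,Hollywood',
    'Drama,Romance',
    'Drama,Romance',
    'Drama,Romance,Classic',
]

genre_count = Counter()
for value in rows:
    # Each row contributes one count per comma-separated genre
    genre_count.update(value.split(','))

# most_common() lists (genre, count) pairs from most to least frequent
print(genre_count.most_common(3))  # [('Drama', 7), ('Romance', 4), ('Comedy', 2)]
```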

Answer 3

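The following code assumes you already know the maximum number of items in a row. This means you need to read the file once to find this out (here we assume the number is 3, based on your example).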

import numpy as np
import pandas as pd

max_num_of_items_in_one_row = 3
cols = range(max_num_of_items_in_one_row)
df = pd.read_csv('genre.txt', names=cols, engine='python', skiprows=1)
# Short rows are padded with NaN (not None), so replace those with a placeholder
df = df.fillna('NA')
all_ = df.values.flatten()
genres = np.unique(all_)
for y in genres:
    # Count the cells equal to this genre; 'NA' counts the padding cells
    print(y, (df == y).values.sum())

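The code reads the file into a dataframe, replaces the missing values with 'NA', finds all the unique values in the dataframe, and counts how many times each one occurs.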
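
If the data is already in a dataframe column, one possible sketch that avoids needing the maximum number of items per row is Series.str.get_dummies, which builds one 0/1 column per distinct value. Note the assumption: it counts the rows that contain each genre, so a genre repeated within a single row would be counted once. The trending dataframe is rebuilt from the question's sample for illustration only:

```python
import pandas as pd

# Sample data copied from the question (illustration only)
trending = pd.DataFrame({'genre': [
    'Comedy,Crime,CriticallyAcclaimed',
    'Comedy,Drama,Romance',
    'Drama',
    'Drama',
    'Drama,Hollywood',
    'Drama,Romance',
    'Drama,Romance',
    'Drama,Romance,Classic',
]})

# One indicator column per genre: 1 if the row contains it, 0 otherwise
indicators = trending['genre'].str.get_dummies(sep=',')

# Summing each column gives the number of rows containing that genre
counts = indicators.sum()
print(counts)
```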

Getting counts of comma-separated values from a dataframe
