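I have a DataFrame 'genres' in which each value of the column is a list of genres separated by ','. I need to count each individual genre, e.g. Comedy 2, Drama 7, and so on. I have tried many approaches, but they all failed.
I tried genres = trending.groupby(['genre']).size(), but this line treats the value 'Comedy,Crime,CriticallyAcclaimed' as a single genre. I'm new to Python, please help.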
genre
Comedy,Crime,CriticallyAcclaimed
Comedy,Drama,Romance
Drama
Drama
Drama,Hollywood
Drama,Romance
Drama,Romance
Drama,Romance,Classic
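To reproduce the problem, the sample above can be loaded into a DataFrame. This sketch assumes the DataFrame is called `trending` with a column `genre`, as in the question's code. It shows that `groupby` keys on the whole cell value, so `'Comedy,Crime,CriticallyAcclaimed'` ends up as one group instead of three genres.

```python
import pandas as pd

# Sample data copied from the question; the name `trending` follows the question's code
trending = pd.DataFrame({'genre': [
    'Comedy,Crime,CriticallyAcclaimed',
    'Comedy,Drama,Romance',
    'Drama',
    'Drama',
    'Drama,Hollywood',
    'Drama,Romance',
    'Drama,Romance',
    'Drama,Romance,Classic',
]})

# groupby compares whole strings, so each distinct comma-joined string is its own group
sizes = trending.groupby(['genre']).size()
print(sizes)
```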
Answer 0 (score: 1)
I found a solution:
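import pandas as pd

# Split each comma-separated cell into separate columns, then stack them into one long column
genres = pd.DataFrame(genres.genre.str.split(',', expand=True).stack(), columns=['genre'])
genres = genres.reset_index(drop=True)
# Count how many times each genre occurs; result has the columns 'genre' and 'count'
genre_count = genres.groupby('genre').size().reset_index(name='count')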
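The same split-then-count idea can be written more compactly with `Series.str.split`, `Series.explode` (available since pandas 0.25) and `value_counts`. This is a sketch assuming the column is named `genre`; the sample data is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question
trending = pd.DataFrame({'genre': [
    'Comedy,Crime,CriticallyAcclaimed',
    'Comedy,Drama,Romance',
    'Drama',
    'Drama',
    'Drama,Hollywood',
    'Drama,Romance',
    'Drama,Romance',
    'Drama,Romance,Classic',
]})

# Turn each comma-separated string into a list, then give every list element its own row
exploded = trending['genre'].str.split(',').explode()

# Count how often each genre occurs (sorted from most to least frequent)
genre_count = exploded.value_counts()
print(genre_count)
```

Unlike the `expand=True` + `stack()` version, `explode` does not need to create a column for every possible position in a row.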
Answer 1 (score: 0)
If you are using pandas (which one can guess, even though the OP never said so), you can do something like this:
from collections import Counter

# ... code that creates the `trending` DataFrame ...

genreCount = Counter()
for row in trending.itertuples():
    # itertuples() puts the index at position 0, so read the genre column by name
    genreCount.update(row.genre.split(","))

print(genreCount)  # Works like a dict: keys are the genres, values are the number of appearances
print(dict(genreCount))  # You can also turn it into a plain dict, but a Counter already behaves like one
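Another pandas-only option that avoids the explicit loop is `Series.str.get_dummies(sep=',')`, which builds one 0/1 indicator column per distinct genre; summing those columns gives the counts. A sketch on the question's sample data, assuming the column is named `genre`. Note that it counts rows containing a genre, so a genre repeated inside the same cell would be counted only once.

```python
import pandas as pd

# Sample data from the question
trending = pd.DataFrame({'genre': [
    'Comedy,Crime,CriticallyAcclaimed',
    'Comedy,Drama,Romance',
    'Drama',
    'Drama',
    'Drama,Hollywood',
    'Drama,Romance',
    'Drama,Romance',
    'Drama,Romance,Classic',
]})

# One indicator column per distinct genre, one row per original row
dummies = trending['genre'].str.get_dummies(sep=',')

# Column sums = number of rows that contain each genre
genre_count = dummies.sum()
print(genre_count)
```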
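Answer 2 (score: 0)
The following code assumes you already know the maximum number of items in one row. This means you need to read the file once to find this number (here we assume it is 3, based on your example).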
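import numpy as np
import pandas as pd

max_num_of_items_in_one_row = 3
cols = range(max_num_of_items_in_one_row)
# Read each comma-separated item into its own column; skip the 'genre' header line
df = pd.read_csv('genre.txt', names=cols, engine='python', skiprows=1)
# Shorter rows are padded with NaN (not None); replace it with a placeholder string
df = df.fillna('NA')
all_ = df.values.flatten()
# Unique genres, ignoring the placeholder
genres = np.unique(all_[all_ != 'NA'])
for y in genres:
    # Count the cells that equal this genre
    print(y, (df == y).values.sum())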
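The code reads the file into a DataFrame, replaces the missing values, finds all unique values in the DataFrame, and counts how many times each one occurs.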
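If the maximum number of items per row is not known in advance, one option is to skip `read_csv` and count while reading the lines, using `collections.Counter`. This is a sketch that assumes `genre.txt` has a `genre` header line followed by one comma-separated row per line, as in the question's sample; the function name `count_genres` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_genres(lines: Iterable[str]) -> Counter:
    """Count comma-separated genres in `lines`, skipping the first (header) line."""
    counts = Counter()
    it = iter(lines)
    next(it, None)  # skip the 'genre' header line
    for line in it:
        line = line.strip()
        if line:  # ignore blank lines
            counts.update(item.strip() for item in line.split(','))
    return counts


# Usage with the file name assumed in the answer above:
# with open('genre.txt') as f:
#     print(count_genres(f))
```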