Question

I have a question regarding the structure of my code. I have the following CSV:

name product country
 A     game1   USA
 A     game2   USA 
 B     bis     World
 .
 .

Basically, each vendor's name appears once per product, so it repeats as many times as the vendor has products. My goal is to create a CSV that contains the vendor name, the number of products, and a country value (5 if the country is "World", otherwise 1). So far I have not managed to do this in a more algorithmic way. Instead I have used the following code:

import pandas as pd

df = pd.read_csv("testtest.csv")

num_listings = df['vendor_name'].value_counts().to_dict()

print(num_listings)

and then I converted the dictionary to a CSV file. I assume a for loop could make my code simpler, since I could keep a counter and increment it as long as the name stays the same. I do not know how I should approach it. I already tried the following, but it did not work:

ds = pd.read_csv("testtest.csv", index_col = 'vendor_name') 

x=0
for index in ds:
  if ds['index'] == ds['index']:
    x=x+1
print(x)

Any help?

Answer 1

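Use groupby.agg with a dictionary that maps each column to its aggregation function. The column names below follow the sample data in the question (name, product, country); the question's own code reads 'vendor_name', so adjust the names if the real file differs.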

import pandas as pd

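# Count distinct products per vendor; score the country 5 if any row is 'World', else 1.
d = {'product': pd.Series.nunique,
     'country': lambda x: 5 if (x == 'World').any() else 1}
result = df.groupby('name').agg(d).reset_index()
print(result)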

Output:

  name  product  country
0    A        2        1
1    B        1        5
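
Since the end goal is a CSV file, here is a minimal self-contained sketch of the whole pipeline. It assumes the column names from the sample (name, product, country) and builds the three sample rows inline instead of reading testtest.csv. The output column names num_products and country_score are my own labels, not something from the question. It uses named aggregation, which needs pandas 0.25 or newer.

```python
import io
import pandas as pd

# Hypothetical stand-in for testtest.csv, using the sample rows from the question.
raw = io.StringIO(
    "name,product,country\n"
    "A,game1,USA\n"
    "A,game2,USA\n"
    "B,bis,World\n"
)
df = pd.read_csv(raw)

# One row per vendor: distinct product count, and 5 if any row is 'World', else 1.
summary = (
    df.groupby("name")
      .agg(
          num_products=("product", "nunique"),
          country_score=("country", lambda s: 5 if (s == "World").any() else 1),
      )
      .reset_index()
)

# With no path, to_csv returns the CSV text; pass a filename to write a file instead.
csv_text = summary.to_csv(index=False)
print(csv_text)
```

To write the result to disk, pass a path instead, for example summary.to_csv("vendors.csv", index=False), where vendors.csv is just a placeholder name. No explicit for loop or counter is needed, because groupby collects the rows for each vendor before aggregating.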
