Question

I have the following code:

businessdata = ['Name of Location','Address','City','Zip Code','Website','Yelp',
'# Reviews', 'Yelp Rating Stars','BarRestStore','Category',
'Price Range','Alcohol','Ambience','Latitude','Longitude']

business = pd.read_table('FL_Yelp_Data_v2.csv', sep=',', header=1, names=businessdata)
print '\n\nBusiness\n'
print business[:6]

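It reads my file and creates a pandas table I can work with. What I need is to count how many categories are in each row of the 'Category' variable and store that number in a new column called '# Categories'. Here is a sample of the target column: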

Category                                         
French                                               
Adult Entertainment , Lounges , Music Venues         
American (New) , Steakhouses                        
American (New) , Beer, Wine & Spirits , Gastropubs 
Chicken Wings , Sports Bars , American (New)         
Japanese

Desired output:

Category                                        # Categories  
French                                               1           
Adult Entertainment , Lounges , Music Venues         3         
American (New) , Steakhouses                         2        
American (New) , Beer, Wine & Spirits , Gastropubs   4         
Chicken Wings , Sports Bars , American (New)         3         
Japanese                                             1

Edit 1:

The raw input is a CSV file. Target column: "Category". I can't post a screenshot yet. I don't think the values to be counted are lists.

Here is my code:

business = pd.read_table('FL_Yelp_Data_v2.csv', sep=',', header=1, names=businessdata, skip_blank_lines=True)
#business = pd.read_csv('FL_Yelp_Data_v2.csv')

business['Category'].str.split(',').apply(len)
#not sure where to declare the df part in the suggestions that use it.

print business[:6]

But I keep getting the following error:

TypeError: object of type 'float' has no len()

Edit 2:

I give up. Thanks for all your help, but I'll have to figure out something else.

Answer 1

Assuming Category is actually a list, you can use apply (per @EdChum's suggestion):

business['# Categories'] = business.Category.apply(len)

If not, you first need to parse it and convert it to a list.

df['Category'] = df.Category.map(lambda x: [i.strip() for i in x.split(",")])

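Could you show sample output of that column properly (including correct quoting)?

P.S. @EdChum, thanks for your suggestions; I appreciate them. I believe the list comprehension approach may be faster, based on some sample text data I tested with 30k+ rows: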

%%timeit
df.Category.str.strip().str.split(',').apply(len)
10 loops, best of 3: 44.8 ms per loop

%%timeit
df.Category.map(lambda x: [i.strip() for i in x.split(",")])
10 loops, best of 3: 28.4 ms per loop

Even accounting for the len function call:

%%timeit
df.Category.map(lambda x: len([i.strip() for i in x.split(",")]))
10 loops, best of 3: 30.3 ms per loop

Answer 2

This works:

business['# Categories'] = business['Category'].apply(lambda x: len(x.split(',')))

If you need to handle NA and the like, you can pass a more elaborate function instead of the lambda.
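
A minimal sketch of what such a function might look like. The name `count_categories` and the sample rows are made up for illustration, and counting a missing value as 0 categories is an assumption, not something the question specifies:

```python
import pandas as pd

def count_categories(value):
    """Count comma-separated entries in one cell.

    Missing values (None/NaN) are counted as 0; that choice is an assumption.
    """
    if pd.isna(value):
        return 0
    return len(str(value).split(','))

# Hypothetical sample data standing in for the real CSV
business = pd.DataFrame({'Category': ['French',
                                      'American (New) , Steakhouses',
                                      None]})
business['# Categories'] = business['Category'].apply(count_categories)
```

A named function like this avoids the `TypeError` from the question, because the missing cell never reaches `split` or `len`.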

Answer 3

business['Categories'] = business.Category.str.count(',')+1

Answer 4

Use pd.read_csv to make reading the input easier:

business = pd.read_csv('FL_Yelp_Data_v2.csv')

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

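Once the DataFrame is created, you can write a function that splits the 'Category' column on ',' and counts the length of the resulting list. Use lambda and apply.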
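
The steps described above could be sketched roughly as follows. The inline CSV text is a made-up stand-in for 'FL_Yelp_Data_v2.csv' (its real layout is not shown in the question), and the sketch assumes multi-category cells are quoted in the file:

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file; cells containing commas are quoted
csv_text = '''Name of Location,Category
Place A,French
Place B,"American (New) , Steakhouses"
Place C,"Adult Entertainment , Lounges , Music Venues"
'''

business = pd.read_csv(io.StringIO(csv_text))

# Split each cell on ',' and count the pieces.
# Note: this lambda raises an error on missing cells (NaN), which have no split().
business['# Categories'] = business['Category'].apply(lambda x: len(x.split(',')))
```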

Answer 5

You can do this...

GROUP BY

Answer 6

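I had a similar question. I counted the number of comma-separated words in each row, and solved it like this:

data['Number_of_Categories'] = data['Category'].apply(lambda x: len(str(x).split(',')))

Basically, I first convert each row to a string, because Python was recognizing it as a float, and then apply the 'len' function. Hope this helps.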
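
One caveat worth illustrating, as a small sketch on made-up data: pandas typically reads an empty cell as NaN, which is a float, and `str(NaN)` is the string 'nan', so the approach above counts a missing row as one category. The contrast with `.str.count` is shown only for comparison:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last row simulates an empty cell
data = pd.DataFrame({'Category': ['Japanese',
                                  'Chicken Wings , Sports Bars',
                                  np.nan]})

# str(NaN) becomes 'nan', so the missing row ends up with a count of 1
data['Number_of_Categories'] = data['Category'].apply(
    lambda x: len(str(x).split(',')))

# For comparison: the vectorized string method leaves the missing row as NaN
counts_with_nan = data['Category'].str.count(',') + 1
```

Whether a missing row should count as 1, 0, or stay missing depends on the data, so it may be worth checking for NaN explicitly before relying on either result.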

Answer 7

df['column_name'].apply(lambda n: len(n.split(',')))

Answer 8

This may be a cobbled-together solution, but I ran into a similar problem and fixed it using the following:

#Create an empty list to store your count in
numCategories=[]
#Create a loop to split each cell separately, then append to a list
i=0
while i <len(df):
#Switch out CategoriesColumnNumber in the below code for the correct column number
    temp_count = len(df.iloc[i,CategoriesColumnNumber].split(";"))
    numCategories.append(temp_count)
    i += 1
#Attach your newly generated list as a new column in your dataframe
df['#Categories'] = numCategories

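Not the prettiest solution, but hopefully it helps someone who is just getting started!

How do I count comma-separated values in a column of a pandas table?

8 answers: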