Count occurrences of values in a Series

Time: 2016-01-25 08:27:57

Tags: python pandas

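I am new to Python. I am reading a csv file with 3 columns (lib, imports, import_tuples). How can I count how many times each value from the second column (imports) occurs in the third column (import_tuples)?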

Example:

lib,imports,import_tuples
lib1,"[0, 1, 2, 3, 4, 5]","[[5, 1, 2], [2,1,3],[2, 4, 1]]"
lib2,"[4, 65, 99, 100]","[[4, 65, 100], [100, 4],[99, 65]]"

Expected output (for lib1):

0 1 2 3 4 5
0 3 3 1 1 1

What I have tried:

import pandas
from collections import Counter

df = pandas.read_csv('temp_data.csv')
myList = second.values.T.tolist()  # `second` is never defined in this snippet

c = df["import_tuples"].str.split(',').apply(Counter)
data = pandas.DataFrame({n: c.apply(lambda x: x.get(n, 0)) for n in myList})
data = c.to_frame()

1 answer:

Answer 0: (score: 0)

You can use pandas.Series.str.findall() to extract the numbers from the strings, and then count them with collections.Counter:

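from collections import Counter

# df is the DataFrame read from the csv file in the question.
# Turn each string such as "[0, 1, 2]" into a list of digit strings.
df['imports'] = df.imports.str.findall(r'\d+')
df['import_tuples'] = df.import_tuples.str.findall(r'\d+')
# For each row, look up every value of imports in a Counter built from import_tuples.
df['imports_counted'] = df.apply(lambda x: {i: Counter(x.import_tuples).get(i) for i in x.imports}, axis=1)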

    lib             imports                 import_tuples  \
0  lib1  [0, 1, 2, 3, 4, 5]   [5, 1, 2, 2, 1, 3, 2, 4, 1]   
1  lib2    [4, 65, 99, 100]  [4, 65, 100, 100, 4, 99, 65]   

                                     imports_counted  
0  {'2': 3, '5': 1, '0': None, '3': 1, '4': 1, '1...  
1               {'99': 1, '4': 2, '100': 2, '65': 2}
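
In the output above, the value '0' maps to None because Counter.get() returns None for a key that was never counted, while the expected output in the question shows 0. The following is a minimal sketch of one way to get zeros and integer keys instead. The helper name count_occurrences is hypothetical and not part of the original answer, and it uses only the standard library (re.findall rather than pandas' str.findall) so that it runs on its own.

```python
import re
from collections import Counter


def count_occurrences(imports_text, tuples_text):
    """Count how often each number in imports_text appears in tuples_text.

    Hypothetical helper for illustration; both arguments are the raw
    strings as they appear in the csv file.
    """
    wanted = re.findall(r"\d+", imports_text)
    counts = Counter(re.findall(r"\d+", tuples_text))
    # Indexing a Counter yields 0 for a key it has never seen,
    # whereas Counter.get(key) would yield None.
    return {int(n): counts[n] for n in wanted}


# Row lib1 from the sample data in the question.
lib1_counts = count_occurrences(
    "[0, 1, 2, 3, 4, 5]",
    "[[5, 1, 2], [2,1,3],[2, 4, 1]]",
)
print(lib1_counts)  # {0: 0, 1: 3, 2: 3, 3: 1, 4: 1, 5: 1}
```

On the DataFrame from the question, this could presumably be applied row by row, for example with df.apply(lambda row: count_occurrences(row.imports, row.import_tuples), axis=1), provided both columns still hold the raw strings rather than the lists produced by str.findall().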