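I'm new to Python. I'm reading a CSV file with three columns: lib, imports, import_tuples. How can I count how many times each value in the second column (imports) occurs in the third column (import_tuples)?
Example: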
lib,imports,import_tuples
lib1,"[0, 1, 2, 3, 4, 5]","[[5, 1, 2], [2,1,3],[2, 4, 1]]"
lib2,"[4, 65, 99, 100]","[[4, 65, 100], [100, 4],[99, 65]]"
Expected Output (for lib1)
0 1 2 3 4 5
0 3 3 1 1 1
import pandas
from collections import Counter
df = pandas.read_csv('temp_data.csv')
myList = second.values.T.tolist()
c = df["import_tuples"].str.split(',').apply(Counter)
data = pandas.DataFrame({n: c.apply(lambda x: x.get(n, 0)) for n in myList})
data = c.to_frame()
Answer 0 (score: 0)
You can use pandas.Series.str.findall() to extract the numbers from the strings, and then use collections.Counter:
from collections import Counter

# Pull every run of digits out of each string, giving a list of digit strings
df['imports'] = df.imports.str.findall(r'\d+')
df['import_tuples'] = df.import_tuples.str.findall(r'\d+')
# Per row: count each import's occurrences in import_tuples; .get(i, 0) yields 0 when absent
df['imports_counted'] = df.apply(lambda x: {i: Counter(x.import_tuples).get(i, 0) for i in x.imports}, axis=1)
lib imports import_tuples \
0 lib1 [0, 1, 2, 3, 4, 5] [5, 1, 2, 2, 1, 3, 2, 4, 1]
1 lib2 [4, 65, 99, 100] [4, 65, 100, 100, 4, 99, 65]
imports_counted
0 {'2': 3, '5': 1, '0': 0, '3': 1, '4': 1, '1...
1 {'99': 1, '4': 2, '100': 2, '65': 2}
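If you would rather have integer keys than digit strings, here is a minimal sketch of a different approach. It swaps the regex for ast.literal_eval, which parses each cell as a Python literal, and it uses only the standard library so it runs without pandas. The helper name count_imports and the CSV_TEXT constant are made up for illustration; the data is the sample from the question, with io.StringIO standing in for temp_data.csv.

```python
import ast
import csv
import io
from collections import Counter

# Sample data copied from the question; io.StringIO stands in for temp_data.csv.
CSV_TEXT = '''lib,imports,import_tuples
lib1,"[0, 1, 2, 3, 4, 5]","[[5, 1, 2], [2,1,3],[2, 4, 1]]"
lib2,"[4, 65, 99, 100]","[[4, 65, 100], [100, 4],[99, 65]]"
'''


def count_imports(imports_str, tuples_str):
    """Hypothetical helper: map each import to its number of occurrences in the tuples."""
    imports = ast.literal_eval(imports_str)  # e.g. [0, 1, 2, 3, 4, 5]
    tuples = ast.literal_eval(tuples_str)    # e.g. [[5, 1, 2], [2, 1, 3], [2, 4, 1]]
    # Flatten the nested lists and count every value once per row.
    counts = Counter(value for group in tuples for value in group)
    # Indexing a Counter with an unseen key returns 0, so absent imports get 0.
    return {value: counts[value] for value in imports}


results = {
    row["lib"]: count_imports(row["imports"], row["import_tuples"])
    for row in csv.DictReader(io.StringIO(CSV_TEXT))
}

for lib, counted in results.items():
    print(lib, counted)
# lib1 {0: 0, 1: 3, 2: 3, 3: 1, 4: 1, 5: 1}
# lib2 {4: 2, 65: 2, 99: 1, 100: 2}
```

The same helper should also work on a DataFrame read with pandas.read_csv, for example df.apply(lambda r: count_imports(r.imports, r.import_tuples), axis=1), as long as both columns still hold the raw strings.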