I have a large amount of data (about 20,000 rows) that looks like this:
Caller1 5:30AM Mexico USA 2-22-19
Caller2 1:30AM Mexico USA 2-22-19
Caller3 2:30AM Mexico USA 2-22-19
Caller1 5:30AM Mexico USA 2-22-19
Caller5 3:30AM Mexico USA 2-22-19
Caller3 4:30AM Mexico USA 2-22-19
Caller2 5:30AM Mexico USA 2-22-19
Caller1 7:30AM Mexico USA 2-22-19
Caller12 9:39AM Mexico USA 2-22-19
Caller14 8:36AM Mexico USA 2-22-19
Caller15 2:39AM Mexico USA 2-22-19
Caller16 3:32AM Mexico USA 2-22-19
I am looking for a way to separate the data by CallerID, like this:
Caller1 5:30AM Mexico USA 2-22-19
Caller1 5:30AM Mexico USA 2-22-19
Caller1 7:30AM Mexico USA 2-22-19
---------------------------------
Caller2 1:30AM Mexico USA 2-22-19
Caller2 5:30AM Mexico USA 2-22-19
---------------------------------
.
.
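Initially I stored this data in a dictionary and then added any new data to that dictionary. Because the initial parameter, CallerID, is also variable, I ran into trouble separating the data.
My code: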
>>> input = [('caller1', 'data....'), ('caller2', 'data,,,,,')]
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for k, v in input: res[k].append(v)
I can't use this because the data set is too large.
Is there a Python package that can separate data based on the first word of each sentence?
Answer 0 (score: 0)
You can try this approach: store the data in a dictionary of lists, where each key is the string you want to group by, i.e. Caller1, Caller2, and so on.
data = ["Caller1 5:30AM Mexico USA 2-22-19",
"Caller2 1:30AM Mexico USA 2-22-19",
"Caller3 2:30AM Mexico USA 2-22-19",
"Caller1 5:30AM Mexico USA 2-22-19",
"Caller5 3:30AM Mexico USA 2-22-19",
"Caller3 4:30AM Mexico USA 2-22-19",
"Caller2 5:30AM Mexico USA 2-22-19",
"Caller1 7:30AM Mexico USA 2-22-19",
"Caller12 9:39AM Mexico USA 2-22-19",
"Caller14 8:36AM Mexico USA 2-22-19",
"Caller15 2:39AM Mexico USA 2-22-19",
"Caller16 3:32AM Mexico USA 2-22-19"]
grouped_data = {}
# Iterate over the input and store each row under its key in a dictionary of lists
for x in data:
    temp: list = []
    # The key is the first space-separated word, e.g. "Caller1"
    key = x.split(' ')[0]
    if key in grouped_data:
        temp = grouped_data.get(key)
    temp.append(x)
    grouped_data[key] = temp
# Print the data, grouped by key
for k, v in grouped_data.items():
    print(f"data for {k}")
    for d in v:
        print(d)
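The same grouping can also be written with collections.defaultdict, which removes the explicit "is the key already there" check. The following is only a minimal sketch under a few assumptions: the helper names group_by_first_word and format_groups, the SEPARATOR constant, and the 33-dash width are illustrative choices, not part of the question or of any library. It prints a dashed line after each group, similar to the output format asked for in the question.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: a dashed separator as wide as one sample row (33 characters).
SEPARATOR = "-" * 33


def group_by_first_word(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group lines by their first whitespace-separated word (hypothetical helper)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key = line.split(maxsplit=1)[0]  # e.g. "Caller1"
        groups[key].append(line)
    return groups


def format_groups(groups: Dict[str, List[str]]) -> str:
    """Render every group's rows followed by a separator line (hypothetical helper)."""
    parts: List[str] = []
    for rows in groups.values():
        parts.extend(rows)
        parts.append(SEPARATOR)
    return "\n".join(parts)


sample = [
    "Caller1 5:30AM Mexico USA 2-22-19",
    "Caller2 1:30AM Mexico USA 2-22-19",
    "Caller1 7:30AM Mexico USA 2-22-19",
]
groups = group_by_first_word(sample)
print(format_groups(groups))
```

Because group_by_first_word accepts any iterable of strings, an open text file can be passed to it directly and is then read one line at a time. Roughly 20,000 short rows should fit comfortably in memory with this approach, so a separate package is probably not needed for a data set of this size.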