I have two lists, and I'm trying to create a matrix (or DataFrame) of how many times each value in list2 appears in each sublist of list1:
list1 = [['texas','california','illinois'],['illinois','montana'],['new york','iowa'],['florida'],['north carolina']]
list2 = ['california','illinois','maine','oregon','wisconsin','florida']
count = 0
countx = 0
i = 0
for item in list1:
    while i < len(list2):
        x = list1[count].count(list2[countx])
        print(list2[countx], x)
        countx = countx + 1
        i = i + 1
Output:
california 1
illinois 1
maine 0
oregon 0
wisconsin 0
florida 0
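The code above loops through the first sublist and prints the output. I'm not sure how to make it move on to the next sublist while still making sure it loops through all of list2 each time.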
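A minimal sketch of the loop the question is after, assuming plain Python and the list1/list2 above. In the original, `i` and `countx` are never reset and `count` never advances, so the `while` loop only ever runs against the first sublist. Nesting two `for` loops restarts the pass over list2 for every sublist without any manual counters. The `results` list is an illustrative addition so the pairs can be reused later.

```python
list1 = [['texas', 'california', 'illinois'], ['illinois', 'montana'],
         ['new york', 'iowa'], ['florida'], ['north carolina']]
list2 = ['california', 'illinois', 'maine', 'oregon', 'wisconsin', 'florida']

# The outer loop visits every sublist; the inner loop starts over on list2
# for each one, so nothing has to be reset by hand.
results = []
for sublist in list1:
    for value in list2:
        x = sublist.count(value)
        results.append((value, x))
        print(value, x)
    print()  # blank line between sublists
```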
My end goal is a matrix with each sublist as a row label on the left and the values of list2 as the column headers:
                                   california illinois maine oregon wisconsin florida
['texas','california','illinois']           1        1     0      0         0       0
['illinois','montana']                      0        1     0      0         0       0
etc.
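Without pandas, the target matrix can be sketched with `collections.Counter`: each sublist is counted once, then each list2 value is read off (a `Counter` returns 0 for keys it has not seen). The row labels are simply `str(sublist)`, and the names `counters`, `matrix` and `labels` are illustrative, not from the question.

```python
from collections import Counter

list1 = [['texas', 'california', 'illinois'], ['illinois', 'montana'],
         ['new york', 'iowa'], ['florida'], ['north carolina']]
list2 = ['california', 'illinois', 'maine', 'oregon', 'wisconsin', 'florida']

# Count the items of each sublist once; a Counter returns 0 for absent keys.
counters = [Counter(sublist) for sublist in list1]

# One row per sublist, one column per list2 value.
matrix = [[c[value] for value in list2] for c in counters]

# Print with each sublist as the row label and list2 as the column headers,
# right-aligning each count under its header.
labels = [str(sublist) for sublist in list1]
width = max(len(label) for label in labels)
print(" " * width, *list2)
for label, row in zip(labels, matrix):
    cells = (str(n).rjust(len(h)) for n, h in zip(row, list2))
    print(label.ljust(width), *cells)
```

Counting each sublist once and then doing dictionary lookups avoids calling `list.count` once per list2 value.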
Answer 0 (score: 1)
Use pandas.Series.str.contains:
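import pandas as pd

s = pd.Series(list1)
# For each value in list2, test whether it appears in each sublist.
# With regex=False, each sublist is checked with `k in sublist` (exact element match).
df = pd.DataFrame({k: s.str.contains(k, regex=False)
                   for k in list2},
                  dtype=int).set_index(s)
print(df)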
Output:
california illinois maine oregon wisconsin \
[texas, california, illinois] 1 1 0 0 0
[illinois, montana] 0 1 0 0 0
[new york, iowa] 0 0 0 0 0
[florida] 0 0 0 0 0
[north carolina] 0 0 0 0 0
florida
[texas, california, illinois] 0
[illinois, montana] 0
[new york, iowa] 0
[florida] 1
[north carolina] 0
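`str.contains` records presence (1) or absence (0), not a count, so a sublist that repeated a state would still show 1. If true counts are wanted, one pandas-only sketch (a different technique from the answer above; it assumes pandas 0.25 or later for `Series.explode`) is to explode the sublists into one row per item and cross-tabulate. The column names `row` and `value` are illustrative.

```python
import pandas as pd

list1 = [['texas', 'california', 'illinois'], ['illinois', 'montana'],
         ['new york', 'iowa'], ['florida'], ['north carolina']]
list2 = ['california', 'illinois', 'maine', 'oregon', 'wisconsin', 'florida']

# One row per (sublist position, item) pair.
pairs = (pd.Series(list1).explode()
         .rename_axis('row').rename('value').reset_index())

# Count how often each item occurs in each sublist.
counts = pd.crosstab(pairs['row'], pairs['value'])

# Keep only the list2 columns, in list2 order, and give every sublist a row
# (an empty sublist explodes to NaN, which crosstab would drop).
counts = counts.reindex(index=range(len(list1)), columns=list2, fill_value=0)

# Readable row labels built from each sublist's contents.
counts.index = [", ".join(sublist) for sublist in list1]
print(counts)
```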
Answer 1 (score: 1)
It isn't clear what you gain from using lists as index values, but the code below is simple and does exactly what you asked for:
import pandas as pd

list1 = [['texas', 'california', 'illinois'],
         ['illinois', 'montana'],
         ['new york', 'iowa'],
         ['florida'],
         ['north carolina']]
list2 = ['california', 'illinois', 'maine', 'oregon', 'wisconsin', 'florida']

# One row per sublist: the sublist itself, then how often each list2 value occurs in it.
# DataFrame.append was removed in pandas 2.0, so all rows are built first and the
# frame is created in one step.
rows = [[x1, *[x1.count(x2) for x2 in list2]] for x1 in list1]
df = pd.DataFrame(rows, columns=['index', *list2]).set_index('index')
print(df)
Result:
california illinois ... wisconsin florida
index ...
[texas, california, illinois] 1 1 ... 0 0
[illinois, montana] 0 1 ... 0 0
[new york, iowa] 0 0 ... 0 0
[florida] 0 0 ... 0 1
[north carolina] 0 0 ... 0 0
[5 rows x 6 columns]
Answer 2 (score: 0)
This one is quick to write but slow to run, because it rescans every sublist of list1 for each value in list2.
for item2 in list2:
    count = 0
    for l in list1:
        for item in l:
            if item == item2:
                count += 1
    print(item2, count)
I realize this doesn't create a matrix; it only prints each list2 value's total count across all the sublists.
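If only those per-value totals are needed, a sketch with `collections.Counter` (a swap-in, not the answer's own code) builds them in a single pass over list1, so the work grows with the total number of items rather than with that number times len(list2). The name `totals` is illustrative.

```python
from collections import Counter

list1 = [['texas', 'california', 'illinois'], ['illinois', 'montana'],
         ['new york', 'iowa'], ['florida'], ['north carolina']]
list2 = ['california', 'illinois', 'maine', 'oregon', 'wisconsin', 'florida']

# One pass over every item of every sublist builds all the totals at once.
totals = Counter(item for sublist in list1 for item in sublist)

for value in list2:
    print(value, totals[value])  # Counter returns 0 for values never seen
```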