I have the following code snippet, which creates a DataFrame and inserts values into it.
import string
from numpy import nan
from pandas import DataFrame
NRFINAL = ['I am a boy', 'He is a boy', 'ram is a boy']
TERM_COLUMN = []
#SENTENCE_ROW = []
for i in NRFINAL:
    for j in i.split():
        if j not in TERM_COLUMN:
            TERM_COLUMN.append(j)
FREQUENCY = {}
DF = DataFrame(index= [i for i in NRFINAL], columns=TERM_COLUMN)
for index, row in DF.iterrows():
    for j in index.split():
        for k in TERM_COLUMN:
            if j == k:
                count = FREQUENCY.get(k, 0)
                FREQUENCY[k] = count + 1
                DF.set_value(index, k, FREQUENCY[k])
    FREQUENCY.clear()
DF.replace(nan, 0, inplace=True) # To replace nan value in dataframe cell
DF = DF.loc[~DF.apply(lambda row: (row == 0).all(), axis=1)]
MATRIX = DF.values.tolist() #dataframe to list
print(MATRIX)
I get an empty matrix as output, as shown below.
[]
But when I replace that code with the following:
import string
from numpy import nan
from pandas import DataFrame
NRFINAL = ['I am a boy', 'He is a boy', 'ram is a boy']
TERM_COLUMN = []
SENTENCE_ROW = []
for i in NRFINAL:
    SENTENCE_ROW.append(i)
    for j in i.split():
        if j not in TERM_COLUMN:
            TERM_COLUMN.append(j)
print(TERM_COLUMN)
FREQUENCY = {}
DF = DataFrame(index= (SENTENCE_ROW), columns=TERM_COLUMN)
for index, row in DF.iterrows():
    for j in index.split():
        for k in TERM_COLUMN:
            if j == k:
                count = FREQUENCY.get(k, 0)
                FREQUENCY[k] = count + 1
                DF.set_value(index, k, FREQUENCY[k])
    FREQUENCY.clear()
DF.replace(nan, 0, inplace=True) # To replace nan value in dataframe cell
DF = DF.loc[~DF.apply(lambda row: (row == 0).all(), axis=1)]
MATRIX = DF.values.tolist() #dataframe to list
print(MATRIX)
I get the desired output, as shown below:
[[1, 1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1, 0], [0, 0, 1, 1, 0, 1, 1]]
What is wrong with the first version of the code? Also, is there a way to optimize it?
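Answer 0 (score: 1)
I'm not sure why the first and second versions behave differently, because when I tried them they both gave the same result. Here is a version that does the same thing in fewer lines of code.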
from collections import Counter
from pandas import DataFrame
NRFINAL = ['I am a boy', 'He is a boy', 'ram is a boy']
TERM_COLUMN = list(set(' '.join(NRFINAL).split()))
print(TERM_COLUMN)
DF = DataFrame(index= (NRFINAL), columns=TERM_COLUMN)
for index, row in DF.iterrows():
    for k in TERM_COLUMN:
        DF.set_value(index, k,(Counter(index.split()))[k])
MATRIX = DF.values.tolist()
print(MATRIX)
print(DF)
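Two caveats about the code above. `set` does not preserve order, so the columns may come out in a different order than in the question. Also, `DataFrame.set_value` was deprecated in pandas 0.21 and removed in pandas 1.0, so on a recent pandas the loop needs `DF.at[index, k] = ...` instead. As a rough sketch of one alternative (not necessarily the best approach), the counts can be computed first and passed to the `DataFrame` constructor in one go, which avoids cell-by-cell assignment. It assumes Python 3.7+ so that `dict.fromkeys` keeps first-seen order; the name `rows` is just illustrative.

```python
from collections import Counter

import pandas as pd

NRFINAL = ['I am a boy', 'He is a boy', 'ram is a boy']

# Unique terms in first-seen order (dict keys keep insertion order in Python 3.7+).
TERM_COLUMN = list(dict.fromkeys(' '.join(NRFINAL).split()))

# One Counter per sentence; a Counter returns 0 for a term it has not seen.
rows = [Counter(sentence.split()) for sentence in NRFINAL]

# Build the whole term-frequency table at once instead of filling it cell by cell.
DF = pd.DataFrame(
    [[counts[term] for term in TERM_COLUMN] for counts in rows],
    index=NRFINAL,
    columns=TERM_COLUMN,
)

MATRIX = DF.values.tolist()  # dataframe to list
print(MATRIX)
```

With the three sentences from the question this should print the same matrix the question lists as the desired output. If all-zero rows still need to be dropped, as in the question, something like `DF.loc[(DF != 0).any(axis=1)]` should do it.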