I have this code:
from sklearn.feature_extraction.text import TfidfVectorizer
titles = open("user1_titles.txt",'r')
vectorizer = TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(titles)
idf = vectorizer.idf_
print(dict(zip(vectorizer.get_feature_names(), idf)), file = open("user1_tf.csv",'a'))
But this gives me the following output:
{'00': 7.8987145343299883, '007': 9.6034626265684135, '01': 9.6034626265684135, '012': 9.197997518460248, '01273': 9.6034626265684135, '02': 9.6034626265684135, '020': 9.6034626265684135, '026514': 9.6034626265684135,... etc
The output I need is:
00 7.8987145343299883
007 9.6034626265684135
etc.
My goal is to remove the curly braces {} from the output and keep only two columns of data: name and value.
Answer 0 (score: 0)
Use pprint, a "data pretty printer":
from pprint import pprint
d = {'00': 7.8987145343299883, '007': 9.6034626265684135, '01': 9.6034626265684135, '012': 9.197997518460248, '01273': 9.6034626265684135, '02': 9.6034626265684135, '020': 9.6034626265684135, '026514': 9.6034626265684135}
pprint(d)
Output:
{'00': 7.898714534329988,
 '007': 9.603462626568414,
 '01': 9.603462626568414,
 '012': 9.197997518460248,
 '01273': 9.603462626568414,
 '02': 9.603462626568414,
 '020': 9.603462626568414,
 '026514': 9.603462626568414}
Or use format for a hand-made solution:
for key, value in d.items():
    print('{:>6} {}'.format(key, value))
Result:
026514 9.603462626568414
   012 9.197997518460248
    01 9.603462626568414
    00 7.898714534329988
   020 9.603462626568414
   007 9.603462626568414
    02 9.603462626568414
 01273 9.603462626568414
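The hard-coded width of 6 in the format string only fits keys up to six characters long. As a minimal sketch (the helper name `format_pairs` and the sample values are hypothetical, not from the question), the width can be computed from the longest key and the pairs sorted by name so the order is predictable:

```python
def format_pairs(d):
    """Return one 'name value' line per item, with names right-aligned."""
    # Width of the longest key, so every name fits in the first column.
    width = max((len(str(key)) for key in d), default=0)
    # Sort by key so the output order does not depend on the dict's ordering.
    return [f'{key:>{width}} {value}' for key, value in sorted(d.items())]


# Hypothetical sample data standing in for the real idf values.
d = {'007': 9.25, '00': 7.5, '01273': 9.5}
lines = format_pairs(d)
print('\n'.join(lines))
```

To send the lines to a file instead of the console, each line could be passed to print with the file argument, as in the question.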
Answer 1 (score: 0)
You can do it as follows:
with open("user1_tf.csv", 'a') as out:
    for key, value in dict(zip(vectorizer.get_feature_names(), idf)).items():
        print(key, value, file=out)
Or you can build the dictionary first, store it in a variable, and then print its items, like:
data = dict(zip(vectorizer.get_feature_names(), idf))
for key, value in data.items():
    print(key, value)
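One caveat for newer scikit-learn releases: get_feature_names was deprecated in version 1.0 and removed in 1.2 in favour of get_feature_names_out. A small sketch of a helper that works with either is shown here; the names `feature_names` and `_OldStyleVectorizer` are hypothetical, and the stub class only stands in for a fitted vectorizer so the snippet runs without scikit-learn installed:

```python
def feature_names(vectorizer):
    """Return the vocabulary terms as a list, whichever method is available."""
    if hasattr(vectorizer, "get_feature_names_out"):
        # Newer scikit-learn: returns an array of strings.
        return list(vectorizer.get_feature_names_out())
    # Older scikit-learn: returns a plain list.
    return list(vectorizer.get_feature_names())


class _OldStyleVectorizer:
    """Hypothetical stand-in for a fitted vectorizer from an old release."""

    def get_feature_names(self):
        return ["00", "007"]


names = feature_names(_OldStyleVectorizer())
print(names)
```

With a real fitted TfidfVectorizer, `dict(zip(feature_names(vectorizer), vectorizer.idf_))` would then build the same dictionary as above.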
Answer 2 (score: 0)
This is basically the same as Saurabh's answer, but it prints the values.
def splitPrint(data):
    for key, value in data.items():
        print(key, value)
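Since the target file in the question is a .csv, the standard csv module may be a better fit than print. The following is only a sketch under that assumption: the function name `write_pairs`, the space delimiter, and the sample values are illustrative, and an in-memory buffer replaces the real file so the snippet is self-contained:

```python
import csv
import io


def write_pairs(data, out, delimiter=" "):
    """Write each (name, value) item of data as one delimited row to out."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for key, value in data.items():
        writer.writerow([key, value])


# Hypothetical sample data; an in-memory buffer stands in for the output file.
buf = io.StringIO()
write_pairs({"00": 1.5, "007": 2.25}, buf)
print(buf.getvalue(), end="")
```

For a real file, the csv documentation recommends opening it with newline="", for example `with open("user1_tf.csv", "a", newline="") as out: write_pairs(data, out)`. Passing delimiter="," would give a conventional comma-separated file instead.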