I have a column of blurbs (in text format), and I want to use tf-idf to assign a weight to each word. Here is the code:
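import string
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

def remove_punctuations(text):
    # Strip every ASCII punctuation character from the text
    for punctuation in string.punctuation:
        text = text.replace(punctuation, '')
    return text

df["punc_blurb"] = df["blurb"].apply(remove_punctuations)
df = pd.DataFrame(df["punc_blurb"])
vectoriser = TfidfVectorizer()
# Keep the sparse matrix in x so it can be reused below
x = vectoriser.fit_transform(df["punc_blurb"])
df["blurb_Vect"] = list(x.toarray())
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
df_vectoriser = pd.DataFrame(x.toarray(),
                             columns=vectoriser.get_feature_names_out())
print(df_vectoriser)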
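All I get is a huge list of numbers, and I can't even tell whether it is giving me TF or TF-IDF, because common words (the, and, etc.) all have scores greater than 0.
The goal is to see the weights in the tf-idf column shown below, and I'm not sure whether I'm doing this in the most efficient way: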
Answer 0 (score: 0)
If you use TfidfVectorizer, you don't need a punctuation remover. Through its default token_pattern parameter, it handles punctuation automatically.
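To make the point about punctuation concrete, here is a minimal sketch that uses only Python's `re` module. It applies the regular expression that scikit-learn documents as the default `token_pattern` (`(?u)\b\w\w+\b`) to lowercased text, which is roughly what `TfidfVectorizer` does by default before counting terms. The helper name `tokenize_like_default` and the sample blurb are made up for illustration and are not part of scikit-learn.

```python
import re

# Default token_pattern documented for scikit-learn's TfidfVectorizer:
# a token is a run of two or more word characters between word boundaries.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def tokenize_like_default(text):
    """Lowercase the text and extract tokens with the default pattern.

    Hypothetical helper: it only imitates the default lowercasing and
    tokenization step, not the full vectorizer.
    """
    return re.findall(DEFAULT_TOKEN_PATTERN, text.lower())


# Hypothetical blurb, made up for illustration
blurb = "Hello, world! It's a test-case: TF-IDF (really)."
tokens = tokenize_like_default(blurb)
print(tokens)
# ['hello', 'world', 'it', 'test', 'case', 'tf', 'idf', 'really']
```

Punctuation never reaches the vocabulary, and single-character tokens such as "a" are dropped as well, so the raw blurb column can likely be passed to `fit_transform` without the `remove_punctuations` step.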