How to apply tf-idf to rows of text

Time: 2020-10-23 08:16:05

Tags: python machine-learning scikit-learn nlp tf-idf

I have a column of blurbs (as text), and I want to use tf-idf to determine the weight of each word. Here is the code:

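import string

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

def remove_punctuations(text):
    # Strip every ASCII punctuation character from the text.
    for punctuation in string.punctuation:
        text = text.replace(punctuation, '')
    return text
df["punc_blurb"] = df["blurb"].apply(remove_punctuations)

df = pd.DataFrame(df["punc_blurb"])

vectoriser = TfidfVectorizer()
# Keep the sparse matrix so it can be reused below (the original referenced an undefined x).
x = vectoriser.fit_transform(df["punc_blurb"])
df["blurb_Vect"] = list(x.toarray())

# get_feature_names_out() needs scikit-learn 1.0+; older releases use get_feature_names().
df_vectoriser = pd.DataFrame(x.toarray(),
                             columns=vectoriser.get_feature_names_out())
print(df_vectoriser)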

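All I get is a huge list of numbers, and I can't even tell whether it is giving me TF or TF-IDF, because common words ("the", "and", etc.) all have scores greater than 0.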

The goal is to see the weights in a tf-idf column, as shown below. I am also not sure whether I am doing this in the most efficient way:

Goal Output table

1 answer:

Answer 0: (score: 0)

If you use TfidfVectorizer, you don't need the punctuation remover. With its default tokenization settings, it takes care of punctuation automatically:

TfidfVectorizer