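I have this code that I have been struggling to optimize.
My dataframe is a CSV file with 2 columns, where the second column contains text. It looks like the picture:
I have a function summarize(text, n) that takes a text and an integer as input.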
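To summarize() all the texts, I first iterate over my dataframe and build a list of all the texts, then I iterate again and send them one by one to the summarize() function to get the summary text. These for loops make my code really, really slow, but I haven't found a way to make it more efficient. I would greatly appreciate any suggestions.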
Edit: the other two functions are:
from collections import defaultdict
from nltk.tokenize import sent_tokenize, word_tokenize

def summarize(text, n):
    sents = sent_tokenize(text)  # split the text into sentences
    # Make sure the review has at least as many sentences as the requested summary length
    assert n <= len(sents)
    list_sentences = [word_tokenize(s.lower()) for s in sents]  # word-tokenized sentences
    frequency = calculate_freq(list_sentences)  # word frequency across all the sentences
    ranking = defaultdict(int)
    # Score each sentence by summing the frequencies of its words
    for i, sent in enumerate(list_sentences):
        for w in sent:
            if w in frequency:
                ranking[i] += frequency[w]
    # Call the rank function to get the indices of the n highest-ranking sentences
    sents_idx = rank(ranking, n)
    # Return the best choices
    return [sents[j] for j in sents_idx]
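Input text: The recipes are easy and dogs love them. I would buy this book again and again. The only problem is that the recipes don't tell you how many treats they make, but I think that's because you can make them in all sorts of different sizes. Great!
Output text: I would buy this book again and again.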
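The question does not show calculate_freq() or rank(), so summarize() cannot be run as posted. Purely as an illustration, here is a guess at what such helpers might look like: both bodies below are assumptions (normalized word counts, and heapq.nlargest for picking the top sentences), not the asker's actual code.

```python
from collections import defaultdict
from heapq import nlargest

def calculate_freq(list_sentences):
    """Hypothetical helper: relative frequency of each word across all sentences."""
    frequency = defaultdict(int)
    for sentence in list_sentences:
        for word in sentence:
            frequency[word] += 1
    max_count = max(frequency.values(), default=1)
    # Normalize so the most common word scores 1.0
    return {word: count / max_count for word, count in frequency.items()}

def rank(ranking, n):
    """Hypothetical helper: indices of the n highest-scoring sentences, best first."""
    return nlargest(n, ranking, key=ranking.get)

# Tiny made-up example with already-tokenized sentences
sentences = [["great"], ["dogs", "love", "recipes"], ["recipes", "easy"]]
freq = calculate_freq(sentences)
scores = {i: sum(freq[w] for w in s) for i, s in enumerate(sentences)}
best = rank(scores, 2)
print(best)
```

With helpers of this shape, the per-review cost is dominated by tokenization, so the speed-up has to come from how summarize() is applied across rows rather than from the helpers themselves.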
Answer 0: (score: 1)
Have you tried something like this?
import pandas as pd

# Test data
df = pd.DataFrame({'ASIN': [0, 1], 'Summary': ['This is the first text', 'Second text']})

# Example function
def summarize(text, n=5):
    """A very basic summary"""
    return (text[:n] + '..') if len(text) > n else text

# Applying the function to the text
df['Result'] = df['Summary'].map(summarize)

#    ASIN                 Summary   Result
# 0     0  This is the first text  This ..
# 1     1             Second text  Secon..
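If you need a value of n other than the default, one option is Series.apply, which forwards extra keyword arguments to the function. A minimal sketch, reusing the toy summarize() from this answer as a stand-in for the real one (the column names are the test data's, not necessarily yours):

```python
import pandas as pd

def summarize(text, n=5):
    """Toy stand-in for the real summarize(): truncate to n characters."""
    return (text[:n] + '..') if len(text) > n else text

df = pd.DataFrame({'ASIN': [0, 1], 'Summary': ['This is the first text', 'Second text']})

# Series.apply passes extra keyword arguments through to the function,
# so n can be set without wrapping the call in a lambda.
df['Result'] = df['Summary'].apply(summarize, n=10)
print(df)
```

Note that the summarize() in the question asserts n <= len(sents), so with the real function a review that has fewer than n sentences would raise an AssertionError partway through the column.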
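Answer 1: (score: 0)
So, long story short...
I'm going to assume that you are performing a text frequency analysis and that the order of the reviewText entries doesn't matter. If that is the case: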
Mega_String = ' '.join(data['reviewText'])
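This should concatenate all the strings in the reviewText column into one big string, with each review separated by a space.
You can then pass this result to your function.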
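As a rough illustration of what the join buys you, a single pass over the combined string yields corpus-wide word counts. In this sketch the reviews list is a made-up stand-in for data['reviewText'], and the lowercase whitespace split with a Counter is an assumption standing in for the question's tokenizer and calculate_freq(), which are not shown.

```python
from collections import Counter

# Hypothetical stand-in for data['reviewText']
reviews = ["The recipes are easy", "The dogs love the recipes"]

# Join once, as in the line above
mega_string = ' '.join(reviews)

# Assumption: simple lowercase whitespace tokenization, not NLTK's word_tokenize
frequency = Counter(mega_string.lower().split())
print(frequency.most_common(2))
```

Keep in mind that this produces one result for the whole corpus. If you need a separate summary for each review, the joined string loses the boundaries between reviews.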