I am trying to count how many times each word from a subset of words of interest appears in a word-count column.
First I import my data:
import graphlab

products = graphlab.SFrame('amazon_baby.gl/')
# Build a {word: count} dictionary for every review
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
products.head(5)
The data can be found here: https://drive.google.com/open?id=0BzbhZp-qIglxM3VSVWRsVFRhTWc
Then I create the list of words I am interested in:
words = ['awesome', 'great', 'fantastic']
I want to count how many times each word in words occurs in products['word_count'].
I am not married to using graphlab. It was just suggested to me by a colleague.
Answer 0 (score: 1)
Well, I am not quite sure what you mean by 'in a dictionary column'. If it is a list:
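import operator

# Sample data: a list of texts stored under one dictionary key
dictionary = {'texts': ['red blue blue', 'red black', 'blue white white', 'red', 'white', 'black', 'blue red']}
words = ['red', 'white', 'blue']
freqs = dict()
for t in dictionary['texts']:
    tokens = t.split()  # count whole tokens, so 'red' does not match inside a longer word
    for w in words:
        freqs[w] = freqs.get(w, 0) + tokens.count(w)
# Sort the (word, count) pairs by count, highest first
top_words = sorted(freqs.items(), key=operator.itemgetter(1), reverse=True)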
If it is just a single text:
import operator

dictionary = {'text': 'red blue blue red black blue white white red white black blue red'}
words = ['red', 'white', 'blue']
tokens = dictionary['text'].split()  # whole tokens rather than substrings
# Count each word of interest once over the whole text
freqs = {w: tokens.count(w) for w in words}
# Sort the (word, count) pairs by count, highest first
top_words = sorted(freqs.items(), key=operator.itemgetter(1), reverse=True)
Answer 1 (score: 1)
If you want to count word occurrences, a quick way to do it is to use the Counter object from collections.
For example:
In [3]: from collections import Counter
In [4]: c = Counter(['hello', 'world'])
In [5]: c
Out[5]: Counter({'hello': 1, 'world': 1})
Could you show the output of the products.head(5) command?
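To tie the Counter suggestion back to the question, here is a minimal sketch that counts only the words of interest in raw review text. The sample reviews, the count_selected helper, and the punctuation-stripping rule are assumptions made for illustration; they are not part of the original data or of any library.

```python
from collections import Counter

# Hypothetical sample reviews standing in for products['review']
reviews = [
    "This is awesome, really awesome",
    "Great product",
    "not great, not fantastic",
]
words = ['awesome', 'great', 'fantastic']

def count_selected(text, selected):
    """Return {word: count} for the selected words in one text (hypothetical helper)."""
    # Lowercase, split on whitespace, and strip common punctuation from each token
    tokens = [tok.strip('.,!?;:"\'()') for tok in text.lower().split()]
    counts = Counter(tokens)
    # A Counter returns 0 for missing keys, so no try/except is needed
    return {w: counts[w] for w in selected}

# One dictionary per review
per_review = [count_selected(r, words) for r in reviews]

# Sum the per-review dictionaries into overall totals
totals = Counter()
for d in per_review:
    totals.update(d)
```

With a real SFrame or DataFrame, the same helper could in principle be passed to the column's apply method, though that is untested here.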
Answer 2 (score: 1)
If you want to stick with graphlab (or SFrame), use the SArray.dict_trim_by_keys method.
The documentation is here: https://dato.com/products/create/docs/generated/graphlab.SArray.dict_trim_by_keys.html
import graphlab as gl
sf = gl.SFrame({'review': ['what a good book', 'terrible book']})
sf['word_bag'] = gl.text_analytics.count_words(sf['review'])
keywords = ['good', 'book']
sf['key_words'] = sf['word_bag'].dict_trim_by_keys(keywords, exclude=False)
print(sf)
+------------------+---------------------+---------------------+
|      review      |       word_bag      |      key_words      |
+------------------+---------------------+---------------------+
| what a good book | {'a': 1, 'good':... | {'good': 1, 'boo... |
|  terrible book   | {'book': 1, 'ter... |     {'book': 1}     |
+------------------+---------------------+---------------------+
[2 rows x 3 columns]
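If a single total per keyword across all rows is wanted, the trimmed dictionaries can be summed afterwards. The following is a plain-Python sketch, assuming the key_words column has been pulled out as a list of dictionaries; key_words_rows is hypothetical sample data mirroring the two rows above, not actual graphlab output.

```python
from collections import Counter

# Hypothetical stand-in for the rows of sf['key_words']
key_words_rows = [
    {'good': 1, 'book': 1},
    {'book': 1},
]
keywords = ['good', 'book']

# Counter.update adds counts key by key instead of overwriting them
totals = Counter()
for row in key_words_rows:
    totals.update(row)

# Keywords that never occur still get an explicit 0
result = {w: totals[w] for w in keywords}
```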
Answer 3 (score: 0)
Do you want each count in a separate column? In that case, this might work:
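keywords = ['keyword1', 'keyword2']

def word_counter(dict_cell, word):
    # Return the stored count, or 0 if the word is absent from this row's dictionary
    return dict_cell.get(word, 0)

# df is assumed to be a table whose 'word_count' column holds {word: count} dictionaries
for word in keywords:
    # Bind word as a default argument so each lambda keeps its own keyword
    df[word] = df['word_count'].apply(lambda x, word=word: word_counter(x, word))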
Answer 4 (score: 0)
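def count_words(x, w):
    # str.count returns 0 when w is absent. Note that it matches substrings,
    # so 'bad' is also counted inside 'badly'.
    return x.count(w)

selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
for word in selected_words:
    # One new column per selected word, counted over the raw review text
    products[word] = products['review'].apply(lambda x, word=word: count_words(x, word))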
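Because str.count matches substrings, counting on the raw review text can over-count (for example 'love' inside 'loved'). A possible alternative is a word-boundary regular expression; this is only a sketch, and count_whole_word and the sample review are hypothetical names and data introduced for illustration.

```python
import re

def count_whole_word(text, word):
    """Count case-insensitive whole-word occurrences of word in text (hypothetical helper)."""
    # \b anchors the match at word boundaries; re.escape guards against special characters
    pattern = r'\b' + re.escape(word) + r'\b'
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Hypothetical sample review
review = "Great stroller, the greatest! I love it, my kids loved it. great."

substring_count = review.count('love')               # also matches inside 'loved'
whole_word_count = count_whole_word(review, 'love')  # matches only the standalone word
```

The helper could replace count_words in the loop above if whole-word, case-insensitive matching is what is needed.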