I'm very new to NLTK.
NLTK's `pos_tag` lets me tag a sentence by part of speech. But what steps are involved in doing this for other languages?
Update
I'm interested in starting with Spanish.
Update 2
import nltk
sentence = "I'm not sure!"
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
For Spanish, I tried:
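import nltk
from nltk.tokenize import word_tokenize
# Needs the conll2002 corpus and the punkt tokenizer data (see nltk.download)
# Lowercase the words of the Spanish CoNLL-2002 training sentences, keeping their POS tags
training_set = [[(w.lower(),t) for w,t in s] for s in nltk.corpus.conll2002.tagged_sents('esp.train')]
# The unigram tagger is the fallback for bigram contexts never seen in training
unigram_tagger = nltk.UnigramTagger(training_set)
bigram_tagger = nltk.BigramTagger(training_set, backoff=unigram_tagger)
tokens = [token.lower() for token in word_tokenize("El Congreso no podrá hacer ninguna ley con respecto al establecimiento de la religión, ni prohibiendo la libre práctica de la misma; ni limitando la libertad de expresión, ni de prensa; ni el derecho a la asamblea pacífica de las personas, ni de solicitar al gobierno una compensación de agravios.")]
# Tag the lowercased tokens with the trained tagger
tagged = bigram_tagger.tag(tokens)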
Answer 0 (score: 2)
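AFAIK, NLTK does not ship ready-to-use taggers or parsers for any language other than English. Such tools do exist outside NLTK, and you can download and use them.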
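NLTK does provide the tools to train your own Spanish tagger, using one of its tagged Spanish corpora as training material. For example, you can follow the NLTK instructions for "building a tagger", but use conll2002.tagged_sents("esp.train") as the training data. It has only about 250K words, so you won't get great performance, but it should get you started. (Of course, you could find a larger tagged corpus to train on.)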
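To make the training suggestion concrete, here is a minimal sketch of a backoff chain (bigram, then unigram, then a default tag) with a simple token-level accuracy check. It is an illustration under assumptions, not the method from the NLTK instructions: `TOY_TRAIN`, `build_tagger`, `accuracy`, and the choice of "NC" as the default tag are hypothetical names and values, and the tiny inline corpus only stands in for real data so the code runs without any downloads.

```python
from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger

# Hypothetical toy corpus (tags are illustrative). For real use, replace it
# with nltk.corpus.conll2002.tagged_sents("esp.train").
TOY_TRAIN = [
    [("el", "DA"), ("gato", "NC"), ("come", "VMI")],
    [("la", "DA"), ("casa", "NC"), ("es", "VSI"), ("grande", "AQ")],
    [("el", "DA"), ("perro", "NC"), ("come", "VMI")],
]


def build_tagger(train_sents, default_tag="NC"):
    """Chain bigram -> unigram -> default taggers through NLTK's backoff."""
    # Assumption: unknown words are most often common nouns, hence "NC".
    default = DefaultTagger(default_tag)
    unigram = UnigramTagger(train_sents, backoff=default)
    return BigramTagger(train_sents, backoff=unigram)


def accuracy(tagger, gold_sents):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold_sents:
        words = [word for word, _ in sent]
        for (_, predicted), (_, gold) in zip(tagger.tag(words), sent):
            correct += predicted == gold
            total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    tagger = build_tagger(TOY_TRAIN)
    # "duerme" never appears in training, so it falls through to the default tag.
    print(tagger.tag(["el", "gato", "duerme"]))
    held_out = [[("la", "DA"), ("gata", "NC"), ("come", "VMI")]]
    print(accuracy(tagger, held_out))
```

With the real corpus, the held-out file "esp.testa" from the same conll2002 corpus is a natural choice for the `gold_sents` argument. Keep the preprocessing identical on both sides: if the training words are lowercased, lowercase the words being tagged as well.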