How to print specific tokenized entities

Time: 2019-02-11 17:44:12

Tags: python nltk

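I have a problem with my code. I have a .txt file named test.txt that contains sentences, and a DataFrame that contains all the tokenized words from those sentences. I want to find and print only specific tokens while keeping their position numbers. I tried a few if statements, but they seem to overwrite the index count of every word.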

My code is below, with its current output and the output I want shown as comments:

def output():
    currCount = 0
    for words in read():
        add = len(words)
        # Look up each token's tag in df (indexed by token); keep the token itself if it has no entry
        word_new = [' '.join(df.loc[t].values.tolist()) if t in df.index else t
                    for t in word_tokenize(words)]
        tag = ' '.join(word_new)
        print('First:' + str(currCount) + '\n' + 'Last:' + str(currCount + add)
              + '\n' + 'Tag: ' + tag + '\n' + 'word: ' + words + '\n')
        currCount += add + 1
        # Restart the position count at the end of each sentence
        if words == ".":
            currCount = 0

#Sample output                                       #Output that I want
#First:0                                             #Assume that I only want
#Last:1                                              #PERSON tags
#Tag: PERSON                                         
#word: I                                             #First:0 
                                                     #Last:1
#First:2                                             #Tag: PERSON
#Last:6                                              #word: I
#Tag: NOTHING
#word: like                                          #First: 0  
                                                     #Last: 3
#First:7                                             #Tag: PERSON
#Last:12                                             #word: Bob
#Tag: FOOD
#word: pizza

#First:13
#Last:14
#Tag: NOTHING
#word: .

#First:0
#Last:3
#Tag: PERSON
#word: Bob

#First:4
#Last:9
#Tag: NOTHING
#word: likes

#First:10
#Last:15
#Tag: FOOD
#word: pizza

#First:16
#Last:17
#Tag: NOTHING
#word: .

The sample sentences in my test.txt file and the tag sample I made:

   I like pizza .
   Bob likes pizza .

  I      PERSON
  Like   NOTHING
  Pizza  FOOD
  .      NOTHING
  Bob    PERSON
  likes  NOTHING
  pizza  FOOD
  .      NOTHING

1 answer:

Answer 0 (score: 0)

This may be a cleaner and easier approach:

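import nltk
import pandas as pd

# `data` is assumed to be an iterable of sentences, e.g. the lines of test.txt
words = []
for line in data:
    start = 0  # positions restart at 0 for every sentence
    for word in nltk.word_tokenize(line):
        word_tag = {}
        word_tag['First'] = start
        end = start + len(word)
        word_tag['Last'] = end
        word_tag['Word'] = word
        # word_tag['Tag'] = <your statement for tagging>
        words.append(word_tag)
        start = end + 1  # skip the space separating tokens

df = pd.DataFrame(words)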

This is what your DataFrame looks like once the Tag line is filled in:

First   Last    Word    Tag
0       1       I       PERSON
2       6       like    NOTHING
7       12      pizza   FOOD
13      14      .       NOTHING
0       3       Bob     PERSON
4       9       likes   NOTHING
10      15      pizza   FOOD
16      17      .       NOTHING
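
The Tag line in the loop above is left as a placeholder. As a minimal sketch of one way to fill it, assume the tags live in a plain dict; the names `TAGS` and `tag_positions` are hypothetical, and whitespace splitting stands in for `nltk.word_tokenize` because the sample sentences are already space-separated:

```python
# Hypothetical lookup table built from the tag sample in the question.
TAGS = {"I": "PERSON", "Bob": "PERSON", "pizza": "FOOD"}


def tag_positions(lines, tags, default="NOTHING"):
    """Return one dict per token with its First/Last offsets, Word and Tag."""
    rows = []
    for line in lines:
        start = 0  # offsets restart for every sentence
        # str.split() is a stand-in for nltk.word_tokenize in this sketch
        for word in line.split():
            end = start + len(word)
            rows.append({
                "First": start,
                "Last": end,
                "Word": word,
                "Tag": tags.get(word, default),
            })
            start = end + 1  # skip the separating space
    return rows


rows = tag_positions(["I like pizza .", "Bob likes pizza ."], TAGS)
for row in rows:
    print(row["First"], row["Last"], row["Word"], row["Tag"])
```

Passing `rows` to `pd.DataFrame` should give the same four columns as the table above. Tokens missing from the dict fall back to NOTHING, which is an assumption about how untagged words should be labeled.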

You can then keep only the rows with a given tag:

df[df['Tag'] == 'PERSON']

Output:

First   Last    Word    Tag
0       1       I       PERSON
0       3       Bob     PERSON
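
To print the matches in the First/Last/Tag/word layout from the question, the filtered rows can be formatted one by one. A minimal sketch, assuming the rows are available as the list of dicts built before the `pd.DataFrame` step; the helper name `format_rows` is hypothetical and the data is typed in by hand here:

```python
# Rows shaped like those built by the loop in the answer, with Tag filled in.
words = [
    {"First": 0, "Last": 1, "Word": "I", "Tag": "PERSON"},
    {"First": 2, "Last": 6, "Word": "like", "Tag": "NOTHING"},
    {"First": 7, "Last": 12, "Word": "pizza", "Tag": "FOOD"},
    {"First": 0, "Last": 3, "Word": "Bob", "Tag": "PERSON"},
]


def format_rows(rows, wanted_tag):
    """Format only the rows whose Tag equals wanted_tag; offsets are untouched."""
    blocks = []
    for row in rows:
        if row["Tag"] != wanted_tag:
            continue  # skipping a row does not change any stored offset
        blocks.append(
            f"First:{row['First']}\nLast:{row['Last']}\n"
            f"Tag: {row['Tag']}\nword: {row['Word']}\n"
        )
    return "\n".join(blocks)


report = format_rows(words, "PERSON")
print(report)
```

Because First and Last are computed for every token before any filtering happens, the printed positions are the same as in the unfiltered output. Starting from a DataFrame instead, `df[df['Tag'] == 'PERSON'].to_dict('records')` gives rows in the same list-of-dicts shape.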