Getting 'trigrams' in Java

Time: 2016-02-25 14:19:52

Tags: java analysis n-gram trigram

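I'm having some trouble getting trigrams in Java. My program currently produces bigrams correctly, but when I reuse the same structure and change it to produce trigrams, it doesn't work. I want the trigrams to cover every possible combination of words in the ArrayList, for example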

Original = [eye, test, find, free, nhs]
Trigram = [eye test find, 2, eye test free, 3, eye test nhs, 4, eye find free, 3, eye find nhs, 4, eye free nhs, 5, etc...]

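The numbers give the distance between the first word and the last word, and the list should contain every 3-word combination from the ArrayList. This currently works for bigrams ...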

Original = [eye, test, find, free, nhs]
Bigram = [eye test, 1, eye find, 2, eye free, 3, eye nhs, 4, test find, 1, test free, 2, test nhs, 3, find free, 1, etc..]

Here are the methods

public ArrayList<String> bagOfWords;
public ArrayList<String> bigramList = new ArrayList<String>();
public ArrayList<String> trigramList = new ArrayList<String>();


public void trigram() throws FileNotFoundException{
    PrintWriter tg = new PrintWriter(new File(trigramFile));
    // CREATES THE TRIGRAM
    for (int i = 0; i < bagOfWords.size() - 1; i++) {
        for (int j = 1; j < bagOfWords.size() - 1; j++) {
            for(int k = j + 1; k < bagOfWords.size(); k++){
                int distance = (k - i);
                if (distance < 4){
                    trigramList.add(bagOfWords.get(i) + " " + bagOfWords.get(j) + " " + bagOfWords.get(k) + ", " + distance);
                }
            }
        }
    }
}


public void bigram() throws FileNotFoundException{
    // CREATES THE BIGRAM
    PrintWriter bg = new PrintWriter(new File(bigramFile));
    for (int i = 0; i < bagOfWords.size() - 1; i++) {
        for (int j = i + 1; j < bagOfWords.size(); j++) {
            int distance = (j - i);
            if (distance < 4){
                bigramList.add(bagOfWords.get(i) + " " + bagOfWords.get(j) + ", " + distance);
            }
        }
    }
}

Can anyone help me change the trigram() method so that it creates the trigrams I need? Thanks for your help.

2 Answers:

Answer 0: (score: 2)

You want j to start at i+1, don't you? Also, I think you're letting i count too far. It should stop at bagOfWords.size() - 2. I'm not sure why you're checking distance < 4. That will throw out valid groups.
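The points in this answer can be sketched as a small standalone method. This is only an illustration under a few assumptions: the class name TrigramSketch and the method name trigrams are hypothetical, the file writing from the question is left out, and the distance < 4 filter is dropped as the answer suggests (it can be added back if a cap on the distance is really wanted).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the name is not from the question.
class TrigramSketch {

    // Builds every 3-word combination with i < j < k and appends the
    // distance between the first and last word (k - i).
    static List<String> trigrams(List<String> bagOfWords) {
        List<String> result = new ArrayList<>();
        int n = bagOfWords.size();
        for (int i = 0; i < n - 2; i++) {             // i stops before size() - 2
            for (int j = i + 1; j < n - 1; j++) {     // j starts right after i
                for (int k = j + 1; k < n; k++) {     // k starts right after j
                    int distance = k - i;
                    result.add(bagOfWords.get(i) + " " + bagOfWords.get(j) + " "
                            + bagOfWords.get(k) + ", " + distance);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("eye", "test", "find", "free", "nhs");
        for (String trigram : trigrams(words)) {
            System.out.println(trigram);
        }
    }
}
```

For the five-word list from the question this yields 10 entries, starting with "eye test find, 2".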

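Answer 1: (score: 1)

@bradimus's answer is entirely correct. I just want to show another approach. Did you notice that your methods are very similar? So why not try to merge them into one generic method? Something like this: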

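public List<String> anygram(List<String> bagOfWords, int gramCount) {
    List<String> result = new ArrayList<String>();

    // i is the index of the first word of the gram
    for (int i = 0; i <= bagOfWords.size() - gramCount; i++) {
        // the remaining gramCount - 1 words are the consecutive words starting at j + 1
        for (int j = i; j + gramCount <= bagOfWords.size(); j++) {
            StringBuilder builder = new StringBuilder();
            builder.append(bagOfWords.get(i));
            int k = j + 1;
            for (; k < j + gramCount; k++) {
                builder.append(" ");
                builder.append(bagOfWords.get(k));
            }
            // after the loop k is one past the last word, so k - i - 1 is the first-to-last distance
            builder.append(", ").append(k - i - 1);
            result.add(builder.toString());
        }
    }

    return result;
}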

I'm not posting this answer for the score. I just found the task interesting and worked out a solution.
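One thing to be aware of: in anygram the words after the first one are always a consecutive run, so with gramCount = 3 a group such as "eye test free" is not produced. If every combination is needed for any gram size, a recursive variant is one option. The following is a minimal sketch, not part of either answer; the class name NgramCombinations and the helper collect are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names are not from the thread.
class NgramCombinations {

    // Returns every combination of gramCount words (original order kept),
    // each followed by the distance between its first and last word.
    static List<String> ngrams(List<String> words, int gramCount) {
        List<String> result = new ArrayList<>();
        if (gramCount < 1 || gramCount > words.size()) {
            return result; // nothing to build
        }
        collect(words, gramCount, 0, new ArrayList<>(), result);
        return result;
    }

    // Picks word indices in increasing order, one recursion level per word.
    private static void collect(List<String> words, int gramCount, int start,
                                List<Integer> picked, List<String> result) {
        if (picked.size() == gramCount) {
            StringBuilder sb = new StringBuilder();
            for (int p = 0; p < picked.size(); p++) {
                if (p > 0) {
                    sb.append(' ');
                }
                sb.append(words.get(picked.get(p)));
            }
            int distance = picked.get(gramCount - 1) - picked.get(0);
            result.add(sb.append(", ").append(distance).toString());
            return;
        }
        int remaining = gramCount - picked.size();
        // leave enough words to the right for the positions still to fill
        for (int i = start; i <= words.size() - remaining; i++) {
            picked.add(i);
            collect(words, gramCount, i + 1, picked, result);
            picked.remove(picked.size() - 1); // backtrack: drop the last picked index
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("eye", "test", "find", "free", "nhs");
        System.out.println(ngrams(words, 2));
        System.out.println(ngrams(words, 3));
    }
}
```

Calling ngrams(words, 2) gives the bigram list and ngrams(words, 3) the trigram list, so both methods from the question collapse into one. No distance cap is applied here; a filter on distance could be added before result.add if needed.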