Grouping users by age using Java

Time: 2013-11-26 18:39:22

Tags: java

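I have a list of 1,000,000 users of different ages, and I want to search it in Java and output the number of people in a group based on its age range. For example: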

Age Group                 Age Range
1                         6 years old or younger
2                         7 to 18 years old
3                         19 to 26 years old
4                         27 to 49 years old
5                         50 to 64 years old
6                         65 to 79 years old
7                         80 years old or older

If I enter a particular age group, I want the output to show the number of people who belong to that age group. That is:

If I enter 1

The output should be:

**** users found (total number of users that falls within the 
age range 6 years old or younger)

Any kind of data structure is fine.

Here is what I have done so far:

 /**
 A template used to read data lines into java.util.ArrayList data structure.
 Input file: pjData.csv
 Input file must be saved under the same directory/folder as the program.
 Each line contains 5 fields, separated by commas. For example,
 959695171, 64, AZ, M, 1
 355480298, 101, TN, F, 1
 **/
 import java.io.*;
 import java.util.*;
 public class pj3Template2
 {
  public static void main(String args[]) 
 {
String line;
String id, s, g;
Integer a, sa;
StringTokenizer st;
HealthDS2 records = new HealthDS2(); 
try   {
      FileReader f = new FileReader("pjData.csv");
      BufferedReader in = new BufferedReader(f);
      while ((line = in.readLine()) != null)
         {
         st = new StringTokenizer(line, ",");
         id = st.nextToken(",").trim();
         a = Integer.valueOf(st.nextToken(",").trim());
         s = st.nextToken(",").trim().toUpperCase();
         g = st.nextToken(",").trim().toUpperCase();
         sa = Integer.valueOf(st.nextToken().trim());
         records.add(new HealthRec2(id, a, s, g, sa));
         } // loop until the end of file
      in.close(); 
      f.close();
      }
      catch (Exception e) {  e.printStackTrace(); };
System.out.println(records.getSize() + " records processed.");

 // Search by age
System.out.print("Enter 1-character age abbreviation to search: ");
String ui;
Scanner input = new Scanner(System.in);
ui = input.next().trim();
System.out.println("Searching all records in: " + ui);

ArrayList <HealthRec2> al = records.searchByAge(Integer.valueOf(ui.trim()));
System.out.println(al.size() + " records found.");    

     }
 }

 /**
 Data class Sample records:
 5501986, 31, WV, F, 1
 1539057187, 5, UT, M, 2
 **/
 class HealthRec2
 {
    String ID;
    Integer age;
    String state;
    String gender;
    int status;
    public HealthRec2() { }
    public HealthRec2(String i, Integer a, String s, String g, int sa)
       { ID = i; age = a;  state = s; gender = g; status = sa; }
 // Reader methods
 public String getID()     { return ID; }
 public Integer getAge()   { return age; }
 public String getState()  { return state; }
 public String getGender() { return gender; }
 public int getStatus()    { return status; }
 // Writer methods
 public void setAge(Integer a)   { age = a; }
 public void setState(String s)  { state = s; }
 public void setGender(String g) { gender = g; }
 public void setStatus(int sa)   { status = sa; }

 public String toString()
 { return ID + "  " + age + "  " + state + "   " + gender + "  " + status; }
 } // HealthRec


 // Data structure used to implement the requirement
 // This implementation uses java.util.ArrayList
 class HealthDS2
 {
 ArrayList <HealthRec2> rec;
 public HealthDS2() 
  { rec = new ArrayList <HealthRec2>(); }
 public HealthDS2(HealthRec2 r) 
 { 
 rec = new ArrayList <HealthRec2>();
 rec.add(r); 
 }
 public int getSize() { return rec.size(); }
 public void add(HealthRec2 r) { rec.add(r); }

// Search by age
// No data validation is needed -- assuming the 1-character age is valid
// Returns an ArrayList of records
public ArrayList <HealthRec2> searchByAge(Integer a)
{
ArrayList <HealthRec2> temp = new ArrayList <HealthRec2>();
  for (int k=0; k < rec.size(); ++k)
  {
  if (rec.get(k).getAge().equals(a))
     temp.add(rec.get(k));  
  }
  return temp;
  } // searchByAge
  } // HealthDS

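My goal is to search by state, status, gender, and age group. I have already done this for the others, but I am having some trouble with the age group, because it is a range rather than a specific age to look up in the data file. I tried creating seven ArrayLists, one for each group, but I still run into problems when switching between groups.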

4 Answers:

Answer 0: (score: 0)

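This code does the following:

  1. Gets the minimum and maximum age for the selected group
  2. Loops over the ages and increments a counter for any age within the min/max range
  3. Prints out the results

For very large data sets you would need a better data structure, as @kyticka mentions.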

    public static void main (String[] args) throws java.lang.Exception
        {
            int[] groupMin = new int[]{0, 10, 20};
            int[] groupMax = new int[]{10, 20, 9999};
    
            int[] ages = new int[]{ 1, 2, 3, 10, 12, 76, 56, 89 };
    
            int targetGroup = 1;
            int count = 0;
            for( int age : ages ){
                if( age >= groupMin[targetGroup] && age < groupMax[targetGroup] ){
                    count++;
                }
            }
    
            System.out.println("Group " + targetGroup + " range is " + 
                            groupMin[targetGroup] + " - " + groupMax[targetGroup]);
            System.out.println("Count: " + count);
        }
    

    You can play with it here: http://ideone.com/DAWGYX
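
The same min/max idea can be applied to the seven groups from the question. This is only a minimal sketch, assuming inclusive bounds and that group 1 starts at age 0. The names `AgeGroupCounter` and `countInGroup` are made up for illustration.

```java
final class AgeGroupCounter {
    // Inclusive bounds for groups 1..7 from the question; index 0 holds group 1.
    // Assumption: ages are non-negative, so group 1 starts at 0.
    private static final int[] GROUP_MIN = {0, 7, 19, 27, 50, 65, 80};
    private static final int[] GROUP_MAX = {6, 18, 26, 49, 64, 79, Integer.MAX_VALUE};

    /** Counts the ages that fall inside the given 1-based age group. */
    static int countInGroup(int[] ages, int group) {
        if (group < 1 || group > GROUP_MIN.length) {
            throw new IllegalArgumentException("Unknown age group: " + group);
        }
        int min = GROUP_MIN[group - 1];
        int max = GROUP_MAX[group - 1];
        int count = 0;
        for (int age : ages) {
            if (age >= min && age <= max) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] ages = {1, 2, 3, 10, 12, 76, 56, 89};
        for (int group = 1; group <= 7; group++) {
            System.out.println("Group " + group + ": "
                    + countInGroup(ages, group) + " users found");
        }
    }
}
```

Each call is one linear pass over the ages, which should be fine for a single query over a million records.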

Answer 1: (score: 0)

You can initialize your 1,000,000 users however you like; the following code simply generates a random age for each user:

import java.util.ArrayList;
import java.util.Random;
import java.util.Scanner;

public class UserListDemo {
    int age;
    class Users{
        int age=0;
        public Users(int a)
        {
            age=a;
        }
        public void setAge(int age)
        {
            this.age=age;
        }
        public int getAge()
        {
            return this.age;
        }
    }

    public static void main(String a[])
    {
        UserListDemo uld=new UserListDemo();
        ArrayList<Users> data=new ArrayList<Users>();
        uld.initializeUsers(data);
        System.out.println("Enter age group choice"); 
        System.out.println("Enter 1 for age group 0-6");
        System.out.println("Enter 2 for age group 7-18");
        System.out.println("Enter 3 for age group 19-26");
        System.out.println("Enter 4 for age group 27-49");
        System.out.println("Enter 5 for age group 50-64");
        System.out.println("Enter 6 for age group 65-79");
        System.out.println("Enter 7 for age group 80-Older");
        Scanner sc=new Scanner(System.in);
        String choice=sc.nextLine();
        int ch=Integer.valueOf(choice);
        long result=0;
        switch(ch)
        {
        case 1:
            for(Users us:data)
            {
                if(us.age<=6)
                    result++;
            }
            break; // without break, execution falls through into the next case
        case 2:
            for(Users us:data)
            {
                if( us.age>=7 && us.age<=18 )
                    result++;
            }
            break;
        case 3:
            for(Users us:data)
            {
                if( us.age>=19 && us.age<=26 )
                    result++;
            }
            break;
        case 4:
            for(Users us:data)
            {
                if( us.age>=27 && us.age<=49 )
                    result++;
            }
            break;
        case 5:
            for(Users us:data)
            {
                if( us.age>=50 && us.age<=64 )
                    result++;
            }
            break;
        case 6:
            for(Users us:data)
            {
                if( us.age>=65 && us.age<=79 )
                    result++;
            }
            break;
        case 7:
            for(Users us:data)
            {
                if( us.age>=80)
                    result++;
            }
            break;
        default:
            System.out.println("Unknown age group: "+ch);
        }
        System.out.println("For the entered age group :"+ch+" ::"+result+" user has been found");


    }
    public void initializeUsers(ArrayList<Users> data)
    {
        Users us;
        Random rand=new Random();
        for(long l=0;l<1000000L;l++)
        {
            us=new Users(rand.nextInt(100));    
            data.add(us);
        }
    }
}

Answer 2: (score: 0)

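The efficient answer with 1M records would be to use several Maps as indexes, or even an actual database. However, since the exercise explicitly mentions ArrayList, you are probably still learning the basics, so I will stick to those.

First, you need to be able to retrieve the group of a given person. You can do this in two ways.

  • Option A is to add the group as a field, set at initialization time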

    // within HealthRec2
    
    int group;                              // stores group number as an attribute
    
    private static final int[] ageGroups =  // age limits for each group
         new int[]{6, 18, 26, 49, 64, 79};
    
    private void updateGroup() { // <-- called from constructor and from setAge()
       int currentGroup = 1;        // group 1 is "6 or younger"
       for (int limit : ageGroups) {
           if (age <= limit) break; // stop once the age fits under a limit
           currentGroup ++;         // otherwise advance to next group
       }
       group = currentGroup;        // ages above 79 end up in group 7
    }
    
    public int getGroup() { return group; } 
    
  • Option B is to compute it on the fly for each record, instead of storing it as an attribute:

    // within HealthRec2
    
    private static final int[] ageGroups =  // age limits for each group
         new int[]{6, 18, 26, 49, 64, 79};
    
    public int getGroup() { 
       int currentGroup = 1;        // group 1 is "6 or younger"
       for (int limit : ageGroups) {
           if (age <= limit) break; // stop once the age fits under a limit
           currentGroup ++;         // otherwise advance to next group
       }
       return currentGroup;         // ages above 79 end up in group 7
    }
    

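Whichever option you use, you can now find the people in a given age group with logic very similar to what you already use to find records from a given state or of a given gender.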
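
As a rough illustration, a group search could look like the sketch below. `Rec` is a hypothetical, stripped-down stand-in for `HealthRec2`, and `searchByAgeGroup` is an assumed method name mirroring the existing `searchByAge`. The sketch assumes `getGroup()` returns a group number from 1 to 7.

```java
import java.util.ArrayList;
import java.util.List;

final class GroupSearchSketch {
    // Upper age limit (inclusive) of groups 1..6; anything above is group 7.
    private static final int[] AGE_LIMITS = {6, 18, 26, 49, 64, 79};

    /** Hypothetical stand-in for HealthRec2; only the fields needed here. */
    static final class Rec {
        final String id;
        final int age;

        Rec(String id, int age) {
            this.id = id;
            this.age = age;
        }

        /** Returns the 1-based age group, computed on the fly (Option B). */
        int getGroup() {
            int group = 1;
            for (int limit : AGE_LIMITS) {
                if (age <= limit) break;
                group++;
            }
            return group;
        }
    }

    /** Same shape as a search by state or gender: keep the matching records. */
    static List<Rec> searchByAgeGroup(List<Rec> records, int group) {
        List<Rec> matches = new ArrayList<>();
        for (Rec r : records) {
            if (r.getGroup() == group) {
                matches.add(r);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Rec> records = new ArrayList<>();
        records.add(new Rec("959695171", 64));
        records.add(new Rec("355480298", 101));
        records.add(new Rec("1539057187", 5));
        System.out.println(searchByAgeGroup(records, 7).size() + " records found.");
    }
}
```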

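Option A has a higher up-front cost, because even if you never need the age group you still have to compute it and store it in an attribute, just in case. Option B is more expensive if you need to call getGroup several times for the same record, since Option A's getGroup is much faster.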
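
For completeness, the Map-as-index idea mentioned at the start of this answer might look roughly like this when only counts are needed. It is a sketch under assumptions: `GroupIndexSketch`, `groupOf` and `countByGroup` are invented names, and the index is built once so that later queries are a single lookup.

```java
import java.util.HashMap;
import java.util.Map;

final class GroupIndexSketch {
    // Upper age limit (inclusive) of groups 1..6; anything above is group 7.
    private static final int[] AGE_LIMITS = {6, 18, 26, 49, 64, 79};

    /** Maps an age to its 1-based group number. */
    static int groupOf(int age) {
        int group = 1;
        for (int limit : AGE_LIMITS) {
            if (age <= limit) break;
            group++;
        }
        return group;
    }

    /** Builds a group -> count index in one pass over the ages. */
    static Map<Integer, Integer> countByGroup(int[] ages) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int age : ages) {
            counts.merge(groupOf(age), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] ages = {1, 2, 3, 10, 12, 76, 56, 89};
        Map<Integer, Integer> counts = countByGroup(ages);
        // Groups with no users are absent from the map, hence getOrDefault.
        for (int group = 1; group <= 7; group++) {
            System.out.println("Group " + group + ": "
                    + counts.getOrDefault(group, 0) + " users found");
        }
    }
}
```

A similar map keyed by state or gender would serve the other searches, at the cost of extra memory and keeping the indexes up to date when records change.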

Answer 3: (score: -1)

Idea one: sort and use binary search http://en.wikipedia.org/wiki/Binary_search
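
One possible reading of idea one, as a sketch: sort the ages once, then count a range as the distance between two binary-search positions. The class `SortedAgeCounter` and its methods are hypothetical names. The lower-bound search is hand-written because `Arrays.binarySearch` does not say which index it returns when a value occurs more than once.

```java
import java.util.Arrays;

final class SortedAgeCounter {
    private final int[] sorted;

    /** Copies and sorts the ages once: O(n log n) up front. */
    SortedAgeCounter(int[] ages) {
        sorted = ages.clone();
        Arrays.sort(sorted);
    }

    /** Returns the index of the first element >= key (or length if none). */
    private int lowerBound(int key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Counts ages in [min, max], both inclusive, in O(log n). */
    int countInRange(int min, int max) {
        int from = lowerBound(min);
        // Avoid overflow of max + 1 for an open-ended upper bound.
        int to = (max == Integer.MAX_VALUE) ? sorted.length : lowerBound(max + 1);
        return Math.max(0, to - from);
    }

    public static void main(String[] args) {
        SortedAgeCounter counter =
                new SortedAgeCounter(new int[]{1, 2, 3, 10, 12, 76, 56, 89});
        System.out.println(counter.countInRange(7, 18) + " users found");
    }
}
```

This pays off only when many range queries are run against the same data; for a single query, a plain linear scan does less work than sorting.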

Idea two: use an interval tree http://en.wikipedia.org/wiki/Interval_tree