Iteratively subsetting pandas chunks with .duplicated() gives me empty arrays

Time: 2017-03-09 03:23:06

Tags: python pandas bigdata

I am reading a large CSV in chunks with pandas. I subset each chunk to check whether it contains duplicate timestamps:

for c in chunks:
    dups= c.duplicated(subset='Timestamp')
    dups= dups[dups==True]
    print(dups)

When I print dups, I get the following:

255    True
dtype: bool
Series([], dtype: bool)

2295    True
2687    True
dtype: bool
Series([], dtype: bool)

I understand why I get the indices where the condition is true, but why do I also get empty Series objects?

1 answer:

Answer 0: (score: 0)

In your loop, if dups is all False for a chunk, the line dups= dups[dups==True] returns an empty Series, and printing it gives Series([], dtype: bool). If you do not want to print it when it is empty, you can check:

len(dups) > 0
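
To show how that check fits into the loop, here is a minimal sketch. The sample data, the chunk size, and the helper name duplicate_flags_per_chunk are assumptions made up for illustration; they are not from the question.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the large CSV file.
csv_text = "Timestamp,Value\n1,10\n2,20\n2,30\n3,40\n4,50\n5,60\n6,70\n6,80\n"


def duplicate_flags_per_chunk(source, chunksize):
    """Collect the True-only duplicate flags for each chunk that has any."""
    found = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        dups = chunk.duplicated(subset='Timestamp')
        dups = dups[dups]  # keep only the rows flagged as duplicates
        if len(dups) > 0:  # skip chunks with no duplicates instead of printing an empty Series
            found.append(dups)
    return found


results = duplicate_flags_per_chunk(io.StringIO(csv_text), chunksize=3)
for dups in results:
    print(dups)
```

With this sample, the middle chunk has no repeated timestamps, so nothing is printed for it. Note that duplicated() is applied to each chunk separately, so a timestamp repeated across two different chunks would not be flagged by this loop.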