Where do the spaces in this RDD come from?

Date: 2017-01-01 02:12:44

Tags: apache-spark pyspark rdd

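The goal is to convert the integers stored in a file into three arrays, so that mathematical operations can be performed on them.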

Input file

1 2 3
4 5 6
7 8 9

Expected

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Actual

[[u'1', u' ', u'2', u' ', u'3'], [u'4', u' ', u'5', u' ', u'6'], [u'7', u' ', u'8', u' ', u'9']]
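The per-character lists in this output can be reproduced without Spark. A minimal sketch, assuming each RDD element is one line of the file such as "1 2 3" (which is what `sc.textFile` produces): iterating over a string yields every character, spaces included, while `split(' ')` yields only the pieces between the spaces.

```python
line = "1 2 3"  # assumed: one element of the RDD, i.e. one line of the file

# Iterating over a string visits every character, including the spaces.
chars = [s for s in line]
print(chars)   # ['1', ' ', '2', ' ', '3']

# Splitting on ' ' keeps only the text between the spaces.
tokens = line.split(' ')
print(tokens)  # ['1', '2', '3']
```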

2 answers:

Answer 0 (score: 2)

The problem seems to be that the numbers are unicode strings rather than ints. You can fix it by converting them to int (see https://docs.python.org/2/library/functions.html#int):

>>> pairs = txt.map(lambda x: x.split(' '))
>>> print pairs.collect()
[[u'1', u'2', u'3'], [u'4', u'5', u'6'], [u'7', u'8', u'9']]

>>> pairs2 = pairs.map(lambda x: [int(s) for s in x])
>>> print pairs2.collect()
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> 
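The split and the int conversion can also be folded into a single function passed to `map`. This is a sketch rather than the answerer's code: the helper name `parse_line` and the file name `numbers.txt` are hypothetical, and it uses `split()` with no argument, which also tolerates repeated spaces and a trailing carriage return.

```python
def parse_line(line):
    """Turn one line such as '1 2 3' into a list of ints: [1, 2, 3].

    split() with no argument splits on any run of whitespace and drops
    leading/trailing whitespace, so extra spaces or a trailing carriage
    return do not produce empty or malformed tokens.
    """
    return [int(tok) for tok in line.split()]


# Without Spark, the same function works on a plain list of lines.
lines = ["1 2 3", "4 5 6", "7 8 9"]
rows = [parse_line(line) for line in lines]
print(rows)                          # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Once the values are ints, arithmetic works as expected, e.g. row sums.
print([sum(row) for row in rows])    # [6, 15, 24]

# In PySpark (hypothetical file name):
#   rows_rdd = sc.textFile("numbers.txt").map(parse_line)
#   rows_rdd.collect()
```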

Answer 1 (score: -2)

pairs = txt.map(lambda x: x.split(' '))
# split(' ') returns the substrings separated by a space ' ', so it behaves
# roughly like the function below. Each element of txt is already one line of
# the file (assuming txt came from sc.textFile), so newlines do not appear here.
def AFunc(aString):
    returnArray = []
    tempString = ""
    for char in aString:
        if char == ' ':
            # a space ends the current token; empty tokens are skipped here,
            # unlike split(' '), which keeps them for repeated spaces
            if tempString != "":
                returnArray.append(tempString)
                tempString = ""
        else:
            tempString += char
    # keep the last token, which is not followed by a space
    if tempString != "":
        returnArray.append(tempString)
    return returnArray


# ...
pairs = txt.map(lambda x: [s for s in x])
# Iterating over a string yields every single character, spaces included,
# so this behaves like the function below; that is where the u' ' entries come from.
def BFunc(aString):
    returnArray = []
    for char in aString:
        returnArray.append(char)
    return returnArray
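One difference worth knowing between `split(' ')` and `split()`: `split(' ')` keeps an empty string for every extra space and leaves other whitespace such as a carriage return attached to the token, whereas `split()` with no argument treats any run of whitespace as one separator. A small check on a made-up input line:

```python
line = "1  2 3\r"  # made-up line: double space plus a trailing carriage return

print(line.split(' '))  # ['1', '', '2', '3\r']  empty token and stray '\r' survive
print(line.split())     # ['1', '2', '3']        any whitespace run is one separator
```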

http://www.python-course.eu/lambda.php