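I am reading the file USA_Housing.csv, whose columns are (Avg Area Income, Avg Area House Age, Avg Area Number of Rooms, Avg Area Number of Bedrooms, Area Population, Price, Address). All columns except Address are numeric. I read the data like this: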
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().getOrCreate()
val data = spark.read.option("header","true").option("inferSchema","true").format("csv").load("USA_Housing.csv")
data.printSchema()
The output of printSchema is:
|-- Avg Area Income: string (nullable = true)
|-- Avg Area House Age: string (nullable = true)
|-- Avg Area Number of Rooms: double (nullable = true)
|-- Avg Area Number of Bedrooms: double (nullable = true)
|-- Area Population: double (nullable = true)
|-- Price: double (nullable = true)
|-- Address: string (nullable = true)
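Avg Area Income and Avg Area House Age are both inferred as strings, even though they are actually doubles in the CSV file.
When I open the file in Atom, it looks like this: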
Avg Area Income,Avg Area House Age,Avg Area Number of Rooms,Avg Area Number of Bedrooms,Area Population,Price,Address
79545.45857431678,5.682861321615587,7.009188142792237,4.09,23086.800502686456,1059033.5578701235,"208 Michael Ferry Apt. 674
Laurabury, NE 37010-5101"
79248.64245482568,6.0028998082752425,6.730821019094919,3.09,40173.07217364482,1505890.91484695,"188 Johnson Views Suite 079
Lake Kathleen, CA 48958"
Answer 0 (score: 2)
Setting the multiLine option to true should work, because the quoted Address values span two lines.
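A minimal sketch of the multiLine fix, assuming Spark 2.2 or later is on the classpath and a local master is acceptable. The temporary file and the two sample rows (copied from the question) only exist to make the snippet self-contained; with the real data you would just add `.option("multiLine", "true")` to the original reader. With multiLine enabled, a quoted field may contain line breaks, so each Address stays inside one record and the numeric columns are no longer polluted by address fragments.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DoubleType
import java.nio.file.Files

val spark = SparkSession.builder().master("local[*]").appName("multiline-csv").getOrCreate()

// Header plus the two data rows from the question; each quoted Address contains a line break.
val sample =
  """Avg Area Income,Avg Area House Age,Avg Area Number of Rooms,Avg Area Number of Bedrooms,Area Population,Price,Address
    |79545.45857431678,5.682861321615587,7.009188142792237,4.09,23086.800502686456,1059033.5578701235,"208 Michael Ferry Apt. 674
    |Laurabury, NE 37010-5101"
    |79248.64245482568,6.0028998082752425,6.730821019094919,3.09,40173.07217364482,1505890.91484695,"188 Johnson Views Suite 079
    |Lake Kathleen, CA 48958"
    |""".stripMargin

// Write the sample to a temporary file so the reader has something to load.
val path = Files.createTempFile("usa_housing_sample", ".csv")
Files.write(path, sample.getBytes("UTF-8"))

val data = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("multiLine", "true") // allow quoted fields to span line breaks
  .format("csv")
  .load(path.toString)

data.printSchema()

// Collect the values to check before shutting Spark down.
val rowCount = data.count()
val incomeIsDouble = data.schema("Avg Area Income").dataType == DoubleType
val houseAgeIsDouble = data.schema("Avg Area House Age").dataType == DoubleType
println(s"rows = $rowCount, income double = $incomeIsDouble, house age double = $houseAgeIsDouble")

spark.stop()
```

Note that multiLine makes Spark parse each file as a whole rather than splitting it by lines, so a single large CSV is read with less parallelism.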
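Answer 1 (score: 0)
The CSV (from Kaggle) is not formatted the way Spark expects by default: the Address column contains line breaks, so each physical line is treated as a separate record. As a result, the first column is actually parsed as: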
+------------------+
|               _c0|
+------------------+
| 79545.45857431678|
|         Laurabury|
| 79248.64245482568|
|     Lake Kathleen|
|61287.067178656784|
|        Danieltown|
| 63345.24004622798|
|     FPO AP 44820"|
|59982.197225708034|
|     FPO AE 09386"|
+------------------+
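Because values such as Laurabury are not numbers, schema inference falls back to string for this column. The second column (Avg Area House Age) receives address fragments such as NE 37010-5101" in the same way, which is why it is also a string.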
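To illustrate why the column ends up as string, here is a plain-Scala sketch (no Spark involved) that mimics line-by-line parsing: it treats every physical line as a record, takes the first comma-separated field, and checks whether every value parses as a Double. It uses a naive `split(",")` and is only an illustration of the effect, not Spark's actual CSV parser or inference code.

```scala
import scala.util.Try

// The two data rows from the question; each quoted Address contains a line break.
val rows =
  """79545.45857431678,5.682861321615587,7.009188142792237,4.09,23086.800502686456,1059033.5578701235,"208 Michael Ferry Apt. 674
    |Laurabury, NE 37010-5101"
    |79248.64245482568,6.0028998082752425,6.730821019094919,3.09,40173.07217364482,1505890.91484695,"188 Johnson Views Suite 079
    |Lake Kathleen, CA 48958"
    |""".stripMargin

// Naive line-based view: every physical line becomes its own record.
val firstColumn = rows.split("\n").toList.map(line => line.split(",")(0).trim)

// A column can only be typed as double if every value in it parses as one.
val inferredType =
  if (firstColumn.forall(v => Try(v.toDouble).isSuccess)) "double" else "string"

println(firstColumn.mkString(" | "))
println(inferredType)
```

The address continuation lines contribute Laurabury and Lake Kathleen to the first column, which is exactly the pattern in the table above, so the whole column is typed as string.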