Is there a library for converting CSV files to Avro in Java or Scala?
I tried searching Google but could not find any such library.
Answer 0 (score: 0)
By searching Google I found this article: https://dzone.com/articles/convert-csv-data-avro-data
Quoting from it:
To convert CSV data to Avro data using Hive, we need to follow the steps below:
Example: a student.csv file with the columns (student_id, subject_id, marks)
--1. Create a Hive table stored as textfile
USE test;
CREATE TABLE csv_table (
student_id INT,
subject_id INT,
marks INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
--2. Load csv_table with student.csv data
LOAD DATA LOCAL INPATH "/path/to/student.csv" OVERWRITE INTO TABLE test.csv_table;
--3. Create another Hive table using AvroSerDe
CREATE TABLE avro_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.literal'='{
"namespace": "com.rishav.avro",
"name": "student_marks",
"type": "record",
"fields": [ { "name":"student_id","type":"int"}, { "name":"subject_id","type":"int"}, { "name":"marks","type":"int"}]
}');
--4. Load avro_table with data from csv_table
INSERT OVERWRITE TABLE avro_table SELECT student_id, subject_id, marks FROM csv_table;
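If you would rather do the conversion directly from code (the question asks for Java or Scala), a rough, untested sketch using the plain Apache Avro library (org.apache.avro:avro) could look like the one below. It reuses the student_marks schema from the Hive example; the file names student.csv and student.avro are just placeholders.
import java.io.File
import scala.io.Source
import org.apache.avro.Schema
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}

object CsvToAvro extends App {
  // Same record schema as in avro.schema.literal above
  val schema = new Schema.Parser().parse(
    """{"namespace": "com.rishav.avro", "name": "student_marks", "type": "record",
      | "fields": [ {"name": "student_id", "type": "int"},
      |             {"name": "subject_id", "type": "int"},
      |             {"name": "marks", "type": "int"} ]}""".stripMargin)

  // DataFileWriter produces a standard Avro container file
  val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
  writer.create(schema, new File("student.avro"))

  // Turn every CSV line into a GenericRecord and append it to the file
  for (line <- Source.fromFile("student.csv").getLines()) {
    val Array(studentId, subjectId, marks) = line.split(",").map(_.trim)
    val record = new GenericData.Record(schema)
    record.put("student_id", studentId.toInt)
    record.put("subject_id", subjectId.toInt)
    record.put("marks", marks.toInt)
    writer.append(record)
  }
  writer.close()
}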
Answer 1 (score: 0)
You can easily do it in the following way:
Answer 2 (score: 0)
You can try it this way (Spark 1.6):
people.csv
Michael, 29
Andy, 30
Justin, 19
PySpark
file = sc.textFile("people.csv")
df = file.map(lambda line: line.split(',')).toDF(['name','age'])
>>> df.show()
+-------+---+
| name|age|
+-------+---+
|Michael| 29|
| Andy| 30|
| Justin| 19|
+-------+---+
df.write.format("com.databricks.spark.avro").save("peopleavro")
Peopleavro
{u'age': u' 29', u'name': u'Michael'}
{u'age': u' 30', u'name': u'Andy'}
{u'age': u' 19', u'name': u'Justin'}
If you need to preserve the data types, create a schema and pass it in:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
schema = StructType([StructField("name", StringType(), True), StructField("age", IntegerType(), True)])
# cast age to int so each row matches the IntegerType field declared in the schema
df = file.map(lambda line: line.split(',')).map(lambda p: (p[0], int(p[1]))).toDF(schema)
>>> df.printSchema()
root
|-- name: string (nullable = true)
|-- age: integer (nullable = true)
Now your Avro schema is:
{
"type" : "record",
"name" : "topLevelRecord",
"fields" : [ {
"name" : "name",
"type" : [ "string", "null" ]
}, {
"name" : "age",
"type" : [ "int", "null" ]
} ]
}
Answer 3 (score: 0)
If it is for ad-hoc use, you can use spark or spark-shell (with the option: --packages org.apache.spark:spark-avro ...).
Sample code:
val df = spark.read.csv("example.csv")
df.write.format("com.databricks.spark.avro").save("example.avro")
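To check the result, you can read the Avro output back with the same data source (assuming the spark-avro package is still on the classpath); a small follow-up might look like:
// read the Avro files back and inspect them
val avroDf = spark.read.format("com.databricks.spark.avro").load("example.avro")
avroDf.show()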