How to override the default values of existing Dataflow pipeline options

Time: 2016-11-04 16:48:35

Tags: google-cloud-dataflow

I want to override the default values of existing Dataflow pipeline options. For example, I tried something like this:

for $category in distinct-values($fruits/fruit/category)
let $items := $fruits/fruit[category = $category]/name
return 
  <summary>
    <category>{ $category }</category>
    <item list-names="{ string-join($items, ', ') }"/>  
  </summary>

But this doesn't work. Is there a way to override the default value of an existing option?

1 answer:

Answer 0 (score: 1)

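As far as I know, this isn't possible in the form you describe. You can achieve something similar by going through an "intermediate" custom parameter that carries your custom default value: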

from pyspark.sql import SparkSession

# create (or reuse) a Spark session and take its SparkContext
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# define a DataFrame
rdd = sc.parallelize([(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)])
df = spark.createDataFrame(rdd, ["id", "score"])

# define a list of scores
l = [10, 18, 20]

# exclude records whose score is in list l
records = df.filter(~df.score.isin(l))
# expected: (0,1), (0,1), (0,2), (1,2)

# keep only records whose score is in list l
kept = df.where(df.score.isin(l))
# expected: (1,10), (1,20), (3,18), (3,18), (3,18)
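To make the "intermediate parameter" idea concrete, here is a minimal, self-contained sketch. It deliberately does not use the Dataflow/Beam SDK: plain argparse stands in for the options parser, `--project` stands in for an existing option whose built-in default you cannot change, and `--my_project` and `"my-default-project"` are hypothetical names chosen for illustration. The pattern is: declare your own option with your own default, parse, and copy its value into the real option only when the user left the real option unset. (To my knowledge, the Apache Beam Python SDK exposes analogous hooks, a `PipelineOptions` subclass overriding `_add_argparse_args` plus `options.view_as(...)`, but the sketch below does not depend on them.)

```python
import argparse
from typing import List, Optional


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Stand-in for an existing SDK option whose default we cannot override directly.
    parser.add_argument("--project", default=None)
    # Hypothetical "intermediate" option that carries our own default value.
    parser.add_argument("--my_project", default="my-default-project")
    opts = parser.parse_args(argv)
    # Fall back to the intermediate value only if --project was not given explicitly.
    if opts.project is None:
        opts.project = opts.my_project
    return opts


if __name__ == "__main__":
    print(parse_options([]).project)                       # my-default-project
    print(parse_options(["--project", "p1"]).project)      # p1
    print(parse_options(["--my_project", "p2"]).project)   # p2
```

An explicit `--project` always wins; otherwise the value of the intermediate option (or its default) is used.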