Question

I have an array column containing split HTTP requests. I have filtered them down to one of two possible shapes:

|[, courses, 27381...|
|[, courses, 27547...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, courses, 33287...|
|[, courses, 24024...|

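In both array types, everything from 'courses' onward has the same data and structure.

I want to use a case statement to take a slice of the array: if the first element of the array is 'api', take elements 3 -> end of the array. I tried the Python slice syntax [3:] and the normal PostgreSQL syntax [3, n], where n is the length of the array. If it is not 'api', then just take the given value.

My ideal end result would be an array in which every row shares the same structure, with courses in the first index, so that parsing from that point onward is easier.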

Answer 1

Defining a UDF is very simple. You asked a very similar question previously, so I won't post the exact answer, to let you think and learn (for your own benefit).
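
For readers who want a starting point, here is a minimal sketch of what such a UDF might look like in PySpark. The function names `strip_api_prefix` and `make_strip_api_udf` are hypothetical, and the sketch assumes that each array starts with an empty string (from splitting a path that begins with '/') and that the goal is for every row to start at 'courses'.

```python
def strip_api_prefix(parts):
    """Return the path segments starting at 'courses' (assumed layout, see above)."""
    if parts is None:
        return None
    # parts[0] is assumed to be the empty string left over from the leading '/'.
    if len(parts) > 1 and parts[1] == "api":
        # ['', 'api', 'v1', 'courses', ...] -> ['courses', ...]
        return parts[3:]
    # ['', 'courses', ...] -> ['courses', ...]
    return parts[1:]


def make_strip_api_udf():
    """Wrap the plain function as a Spark UDF; pyspark is imported lazily here."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    return udf(strip_api_prefix, ArrayType(StringType()))
```

With a DataFrame `df` whose array column is named `http_col` (an assumed name), this could be applied as `df.select(make_strip_api_udf()("http_col").alias("cleaned_http_col"))`. Keeping the logic in a plain function makes it easy to check without a Spark session.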

Answer 2

Assuming the column in the DataFrame is called http_col and the first item in the array is an empty string, a possible solution is:

df.selectExpr(
  """if(array_contains(http_col, 'api'),
        slice(http_col, 4, size(http_col) - 3),
        http_col) as cleaned_http_col
  """
)
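
To see why the arguments are `4` and `size(http_col) - 3`, it may help to model the expression above in plain Python. Spark SQL's `slice(x, start, length)` uses a 1-based start and a length rather than an end index. The helper names `spark_like_slice` and `clean` are hypothetical, and the model only covers a positive `start`.

```python
def spark_like_slice(xs, start, length):
    """Rough model of Spark SQL slice(x, start, length) for positive start only."""
    # Spark array positions are 1-based, so start=4 maps to Python index 3.
    return xs[start - 1 : start - 1 + length]


def clean(http_col):
    # Mirrors the if(array_contains(...), slice(...), http_col) expression.
    if "api" in http_col:
        return spark_like_slice(http_col, 4, len(http_col) - 3)
    return http_col


api_row = ["", "api", "v1", "courses", "27381"]
plain_row = ["", "courses", "27381"]
print(clean(api_row))    # ['courses', '27381']
print(clean(plain_row))  # ['', 'courses', '27381']
```

Note that under this model the non-api rows keep their leading empty string while the api rows do not. If every row should start at 'courses', one option might be to use `slice(http_col, 2, size(http_col) - 1)` in the else branch.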

If you are using Spark >= 2.4.0, another option could be:

df.selectExpr(
  "array_remove(array_remove(http_col, 'api'), 'v1') as cleaned_http_col"
)
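
The behaviour of the nested `array_remove` calls can be sketched in plain Python as well. `array_remove` drops every element equal to the given value, wherever it occurs, so this is a filter rather than a positional slice. The helper names below are hypothetical.

```python
def array_remove_like(xs, element):
    """Rough model of Spark SQL array_remove: drop every element equal to `element`."""
    return [x for x in xs if x != element]


def clean_by_removal(http_col):
    # Mirrors array_remove(array_remove(http_col, 'api'), 'v1').
    return array_remove_like(array_remove_like(http_col, "api"), "v1")


print(clean_by_removal(["", "api", "v1", "courses", "27381"]))  # ['', 'courses', '27381']
print(clean_by_removal(["", "courses", "27381"]))               # ['', 'courses', '27381']
print(clean_by_removal(["", "courses", "27381", "v1"]))         # ['', 'courses', '27381']
```

Both kinds of row end up with the same shape, including the leading empty string. The last call shows the caveat: a later path segment that happens to be 'api' or 'v1' would also be removed.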

How to pull a slice of an array in Spark SQL (DataFrames)?
