输入:
IDS:
1111,2222,3333,4444
雇员:
{"name":"abc","id":"1111"} {"name":"xyz","id":"10"}
{"name":"z","id":"100"} {"name":"m","id":"99"}
{"name":"pqr","id":"3333"}
我想过滤ID在给定列表中存在的员工。
预期输出:
{"name":"xyz","id":"10"} {"name":"z","id":"100"}
{"name":"m","id":"99"}
现有代码:
idList = LOAD 'pathToFile' USING PigStorage(',') AS (id:chararray);
empl = LOAD 'pathToFile' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (data:map[]);
output = FILTER empl BY data#'id' in (idList);
-- not working, states: A column needs to be projected from a relation for it to be used as a scalar
output = FILTER empl BY data#'id' in (idList#id);
-- not working, states: mismatched input 'id' expecting set null
答案 0 :(得分:0)
JsonLoad()
在pig > 0.10
中是原生的,您可以指定架构:
empl = LOAD 'pathToFile' USING JsonLoader('name:chararray, id:chararray');
DUMP empl;
(abc,1111)
(xyz,10)
(z,100)
(m,99)
(pqr,3333)
您正在加载idList
作为chararray
类型的一列表,但您需要一个列表。
将其作为一个列表加载(意味着修改您的文件,因此每行只有一个记录):
idList = LOAD 'pathToFile' USING PigStorage(',') AS (id:chararray);
DUMP idList;
(1111)
(2222)
(3333)
(4444)
或作为单行文件,我们将更改分隔符,使其不会拆分为列(否则将导致仅加载第一列):
idList = LOAD 'pathToFile' USING PigStorage(' ') AS (id:chararray);
idList = FOREACH idList GENERATE FLATTEN(TOKENIZE(id, '[,]')) AS id;
DUMP idList;
(1111)
(2222)
(3333)
(4444)
现在我们可以LEFT JOIN
查看id
中不存在哪个idList
,然后FILTER
只保留output
。 res = JOIN empl BY id LEFT, idList BY id;
res = FILTER res BY idList::id IS NULL;
DUMP res;
(xyz,10,)
(m,99,)
(z,100,)
是保留关键字,您不应该使用它:
## app.R ##
library(shinydashboard)
ui <- dashboardPage(
dashboardHeader(title = "Basic dashboard"),
dashboardSidebar(
width = 0
),
dashboardBody(
# Boxes need to be put in a row (or column)
fluidRow(
box(title = "Activity Frequency", status = "primary",height = "520",
solidHeader = T,
plotOutput("plot1")),
box(title = "Activity Frequency", status = "primary",height = "520",
solidHeader = T,
sliderInput("slider", "Number of observations:", 1, 100, 50))
)
)
)
server <- function(input, output) {
set.seed(122)
histdata <- rnorm(500)
output$plot1 <- renderPlot({
data <- histdata[seq_len(input$slider)]
hist(data)
})
}
shinyApp(ui, server)