I have a CSV file with 3 columns: tweetid, tweet, and Userid. However, the tweet column itself contains comma-separated values.
For example, one row of data:
396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.",savava143
I want to extract all 3 fields individually, but the following script using REGEX_EXTRACT gives an error:
a = LOAD tweets USING PigStorage(',') AS (f1,f2,f3);
b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\\"(.*)',1);
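Answer 0 (score: 2)
With the data shown, reading it with PigStorage(',') loses savava143 (the last field value). The comma inside the quoted tweet is treated as a field delimiter, so the tweet is split across f2 and f3, and the fourth value is dropped because the schema declares only three fields.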
A = LOAD '/Users/muralirao/learning/pig/a.csv' USING PigStorage(',') AS (f1,f2,f3);
DUMP A;
Output of A (note that the last field value is missing):
(396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.")
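One way to see where the row gets cut is to declare a fourth field and project the pieces separately. This is only a sketch: it assumes the sample row is stored in a file named a.csv, and the alias names and the extra field f4 are made up for illustration.

```pig
-- Declare four fields so that nothing is silently dropped.
A4 = LOAD 'a.csv' USING PigStorage(',') AS (f1,f2,f3,f4);

-- f2 and f3 hold the two halves of the tweet (split at its inner comma);
-- f4 holds the user id that the three-field schema discarded.
pieces = FOREACH A4 GENERATE f2, f3, f4;
DUMP pieces;
```

This only diagnoses the problem. A tweet with two or more commas would need yet another field, which is why a CSV-aware loader is the better fix.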
To extract all values from a CSV file whose field values contain ',', we can use CSVExcelStorage or CSVLoader.
Approach 1: Using CSVExcelStorage
Reference: http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html
Input: a.csv
396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.",savava143
Pig script:
REGISTER piggybank.jar;
A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS (f1,f2,f3);
DUMP A;
Output of A:
(396124437168537600,I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.,savava143)
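Approach 2: Using CSVLoader
Reference: http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/CSVLoader.html
The script below uses CSVLoader() instead; DUMP A produces the same output as in Approach 1. CSVLoader also lives in piggybank, so piggybank.jar must be registered first, as in Approach 1.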
A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (f1,f2,f3);
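Once the row is parsed correctly, the three fields can be named and projected individually. The following is a minimal sketch rather than part of the original answer: the field names follow the column names in the question, the alias names are illustrative, and all three fields are typed as chararray as a conservative assumption.

```pig
REGISTER piggybank.jar;

-- Name and type the columns instead of using f1, f2, f3.
tweets = LOAD 'a.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage()
    AS (tweetid:chararray, tweet:chararray, userid:chararray);

-- Each field can now be used on its own, for example to project
-- only the user and the tweet id.
ids = FOREACH tweets GENERATE userid, tweetid;
DUMP ids;
```

The same FOREACH should work unchanged if the LOAD statement uses CSVLoader() instead.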
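Answer 1 (score: 0)
The mistake is that you do not want to FILTER based on a regex; you want to GENERATE a new field based on a regex. FILTER decides whether each row is kept or discarded, so it needs a boolean condition, and REGEX_EXTRACT returns a string rather than a boolean.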
So you have to use the following, where the third argument is the index of the capture group to return:
b = FOREACH a GENERATE REGEX_EXTRACT(FIELD, REGEX, INDEX_OF_GROUP_TO_RETURN);
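However, as @Murali Rao said, your values are not simply comma-separated; they are CSV. Think about how you would handle a comma inside a tweet: it is not a field separator, just part of the content.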
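For completeness, the difference between generating fields from a regex and filtering on a boolean can be sketched as below. This is an illustration under stated assumptions, not a replacement for a CSV-aware loader: it assumes the sample row is stored in a.csv, that the tweet is always wrapped in double quotes, and that the other two columns never contain commas. The alias names and the pattern used in the FILTER are made up for the example.

```pig
-- Read each row as a single string so the commas inside the tweet
-- are not treated as delimiters.
lines = LOAD 'a.csv' USING TextLoader() AS (line:chararray);

-- GENERATE new fields from the regex: one capture group per column.
-- REGEX_EXTRACT_ALL returns a tuple of all groups, which FLATTEN
-- turns into three separate fields.
parsed = FOREACH lines GENERATE
    FLATTEN(REGEX_EXTRACT_ALL(line, '^([^,]*),"(.*)",([^,]*)$'))
    AS (tweetid:chararray, tweet:chararray, userid:chararray);

-- FILTER needs a boolean expression, which MATCHES provides.
mad = FILTER parsed BY tweet MATCHES '.*so mad.*';
DUMP mad;
```

Rows that do not fit the pattern would not yield usable fields, so this approach is more fragile than CSVExcelStorage or CSVLoader whenever the input format varies.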