I am processing file output in bash and need to group values by key.
For example, I have
13,47099
13,54024
13,1
13,39956
13,0
17,126223
17,52782
17,4
17,62617
17,0
23,1022724
23,79958
23,80590
23,230
23,1
23,118224
23,0
23,1049
42,72470
42,80185
42,2
42,89199
42,0
54,70344
54,72824
54,1
54,62969
54,1
in a file, and I want to collect all the values for each key onto a single line, like
13,47099,54024,1,39956,0
17,126223,52782,4,62617,0
23,1022724,79958,80590,230,1,118224,0,1049
42,72470,80185,2,89199,0
54,70344,72824,1,62969,1
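My input file has about 10,000 entries. How can I transform this data in the shell?
Answer 0 (score: 4)
awk to the rescue!
Assuming the keys are contiguous (all lines that share a key are adjacent)...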
$ awk -F, 'p!=$1 {if(a) print a; a=p=$1}
{a=a FS $2}
END {print a}' file
13,47099,54024,1,39956,0
17,126223,52782,4,62617,0
23,1022724,79958,80590,230,1,118224,0,1049
42,72470,80185,2,89199,0
54,70344,72824,1,62969,1
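The one-liner above depends on lines with the same key being adjacent. If that does not hold for your data, one possible variant (a sketch, not part of the original answer) collects the values in an awk array and remembers the order in which keys first appear. The sample input and the names vals, order, and n are made up for illustration.

```shell
# Hypothetical sample input whose keys are NOT contiguous.
input=$(mktemp)
printf '%s\n' 13,47099 17,126223 13,54024 17,52782 13,1 > "$input"

# Collect values per key in an array and remember the order in which
# keys are first seen, so the output order is deterministic.
result=$(awk -F, '
  !($1 in vals) { order[++n] = $1 }      # first time this key appears
  { vals[$1] = vals[$1] FS $2 }          # append ",value" to this key
  END { for (i = 1; i <= n; i++) print order[i] vals[order[i]] }
' "$input")

echo "$result"
rm -f "$input"
```

This keeps every group in memory until the end of the input, which should be fine for roughly 10,000 entries. Sorting the file by key first (for example with sort -t, -k1,1n -s) and then using the original one-liner is another option if the original line order between keys does not matter.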
Answer 1 (score: 0)
For awk beginners, here is a breakdown of @karakfa's code. I wrote it against a toy dataset, file:
1,X
1,Y
3,Z
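First line of file (1,X):
- p!=$1: check whether the pattern p!=$1 is true
  - that is, check whether p differs from the first field of the current (first) line of file (1 in this case)
  - p is unset at this point, so it cannot be equal to 1; p!=$1 is therefore true and we carry on with this line of code
- if(a) print a: print a if it holds a non-empty, non-zero value
  - a is unset at this point, so the print a command is not executed
- a=p=$1: set both a and p to the value of the first field of the current (first) line (1 in this case)
- a=a FS $2: set a to its current value joined with the second field of the current (first) line, separated by the field separator FS (1,X in this case)
- END: we have not reached the end of file yet, so this block is skipped

Move on to the next (second) line of file and restart the awk code on that line.

Second line of file (1,Y):
- p!=$1: check whether the pattern p!=$1 is true
  - p is 1 and the first field of the current (second) line is 1, so p!=$1 is false and we skip the rest of this line of code
- a=a FS $2: set a to its current value joined with the second field of the current (second) line, separated by the field separator (1,X,Y in this case)
- END: we have not reached the end of file yet, so this block is skipped

Move on to the next (third) line of file and restart the awk code.

Third line of file (3,Z):
- p!=$1: check whether the pattern p!=$1 is true
  - p is 1 and $1 of the third line is 3, so p!=$1 is true and we carry on with this line of code
- if(a) print a: print a if it holds a non-empty, non-zero value
  - a is 1,X,Y, so 1,X,Y is printed to the output
- a=p=$1: set both a and p to the value of the first field of the current (third) line (3 in this case)
- a=a FS $2: set a to its current value joined with the second field of the current (third) line, separated by the field separator (3,Z in this case)
- END {print a}: we have now reached the end of file, so this block is executed
  - print a: print the last group held in a (3,Z in this case)

The resulting output is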
1,X,Y
3,Z
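To check the walkthrough against a real run, the same program can be instrumented to print its state after every input line. This is only an illustrative sketch: the extra print statements and the "line N:" and "print:" labels are additions for tracing and are not in @karakfa's code.

```shell
# Recreate the toy dataset from the walkthrough.
toy=$(mktemp)
printf '1,X\n1,Y\n3,Z\n' > "$toy"

# Same logic as the original one-liner, plus a trace of p and a after
# each line; lines starting with "print:" mark what the original prints.
trace=$(awk -F, '
  p != $1 { if (a) print "print: " a; a = p = $1 }
  { a = a FS $2; print "line " NR ": p=" p " a=" a }
  END { print "print: " a }
' "$toy")

echo "$trace"
rm -f "$toy"
```

Each "print:" line corresponds to one line of the real output, and the "line N:" entries show p and a after the append step for that line.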
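Please let me know if there are any errors in this explanation.
Answer 2 (score: 0)
A small tweak to @karakfa's answer. If you want the separator between the key and the values to be different from the separator between the values themselves, you can use the following code: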
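A minimal sketch of one way to do this, assuming a colon between the key and its values and a comma between the values. It is an illustration of the idea rather than the answer author's exact code; the variable names ksep and sep and the sample input are hypothetical.

```shell
# Hypothetical sample input with contiguous keys.
input=$(mktemp)
printf '%s\n' 13,47099 13,54024 17,4 > "$input"

# ksep is the key/value separator; values are still joined with FS.
result=$(awk -F, -v ksep=':' '
  p != $1 { if (a != "") print a; a = p = $1; sep = ksep }  # new key starts a new group
  { a = a sep $2; sep = FS }              # first value uses ksep, later ones use FS
  END { if (a != "") print a }            # flush the last group, if any
' "$input")

echo "$result"
rm -f "$input"
```

Passing the separator with -v keeps it out of the program text, so it can be changed (for example to a tab) without editing the awk code.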