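I have a data file that I am trying to import into Redshift (a Postgres-based MPP database), using '|' as the delimiter. However, some of the data contains '|' inside the string values themselves, for example: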
73779087|"UCGr4c0_zShyHbctxJJrJ03w"|"ItsMattSpeaking | SuPra"
So I tried this sed command:
sed -i -E "s/(.+|)(.+|)|/\1\2\\|/g" inputfile.txt >outputfile.txt
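Any ideas what is wrong with this sed command? It is meant to replace the | inside the last string with \| (an escaped pipe) so that Redshift does not treat it as a delimiter. Any help is appreciated.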
Answer 0 (score: 1)
This might work for you (GNU sed):
sed -r ':a;s/^([^"]*("[^"|]*"[^"]*)*"[^"|]*)\|/\1/g;ta' file
This removes every | that appears inside double quotes. It does not cater for quoted quotes, though, so beware!
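Answer 1 (score: 0)
Some jobs are not a good fit for sed, and I would say this is one of them. Try a Python script using the re library, or just plain string manipulation.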
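As a rough illustration of the Python route, here is a minimal sketch using the re library. The function name escape_quoted_pipes and the QUOTED pattern are made up for this example. It assumes that quoted fields never contain embedded or escaped double quotes, so it has the same limitation as the sed answer above.

```python
import re

# Matches one double-quoted field. Assumption: fields never contain
# embedded or escaped double quotes.
QUOTED = re.compile(r'"[^"]*"')


def escape_quoted_pipes(line: str) -> str:
    """Prefix every | inside a double-quoted field with a backslash."""
    return QUOTED.sub(lambda m: m.group(0).replace("|", "\\|"), line)


sample = '73779087|"UCGr4c0_zShyHbctxJJrJ03w"|"ItsMattSpeaking | SuPra"'
print(escape_quoted_pipes(sample))
# prints: 73779087|"UCGr4c0_zShyHbctxJJrJ03w"|"ItsMattSpeaking \| SuPra"
```

To process a whole file, the same function could be applied to each line read from the input and the result written to a separate output file.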
Answer 2 (score: 0)
I think this C++ code does what you need.
// $ g++ -Wall -Wextra -std=c++11 main.cpp
#include <iostream>

int main(int, char*[]) {
    bool str = false;  // true while inside a double-quoted string
    char c;
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(nullptr);
    while (std::cin.get(c)) {
        if (c == '|') {
            // Escape pipes that occur inside a quoted string.
            if (str) {
                std::cout << '\\';
            }
        } else if (c == '"') {
            // Toggle string parsing.
            str = !str;
        } else if (c == '\\') {
            // Skip escaped chars: copy the backslash, then the next char as-is.
            std::cout << c;
            if (!std::cin.get(c)) {
                break;  // trailing backslash at end of input
            }
        }
        std::cout << c;
    }
    return 0;
}
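The problem with sed in this case is that you need to know more than the basics in order to keep track of which state you are in (inside a string or not).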
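Answer 3 (score: 0)
Here is a script that converts a file of pipe-separated values (as described above) into one that follows the simpler conventions of TSV files. It assumes that a PHP interpreter is available. If the script is saved as psv2tsv and made executable in a Mac or Linux environment, then psv2tsv -h should provide further details.
Example usage (with <TAB> indicating a TAB in the output):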
$ psv2tsv <<< $'73779087|"UCGr4c0_zShyHbctxJJrJ03w"|"ItsMattSpeaking | SuPra"'
73779087<TAB>UCGr4c0_zShyHbctxJJrJ03w<TAB>ItsMattSpeaking | SuPra<TAB>
$ psv2tsv <<< $'a|"b|c\t\d"|"e\n"'
a<TAB>b|c\t\d<TAB>e\n<TAB>
#!/usr/bin/env php
<?php
# Author: pkoppstein at gmail.com 12/2015
# Use at your own risk.
# Input - pipe-separated values along the lines of CSV.
# Translate embedded newline and tab characters.
function help() {
    global $argv;
    echo <<<EOT
Syntax: {$argv[0]} [filepath]
Convert a file or stream of records with pipe-separated values to the
TSV (tab-separated value) format. If no argument is specified, or if
filepath is specified as -, then input is taken from stdin.
The input is assumed to be like a CSV file but with pipe characters
(|) used instead of commas. The output follows the simpler
conventions of TSV files.
Note that each tab in the input is translated to "\\t", and each
embedded newline is translated to "\\n". Each translated record is
then written to stdout. See PHP's fgetcsv for further details.
EOT;
}
$file = ($argc > 1) ? $argv[1] : 'php://stdin';
if ($file == "-h" or $file == "--help") {
    help();
    exit;
}
if ($file == "-") $file = 'php://stdin';
$handle = @fopen($file, "r");
if ($handle) {
    # Read one record at a time; a length of 0 means no line-length limit.
    while (($data = fgetcsv($handle, 0, "|")) !== FALSE) {
        $num = count($data);
        for ($c = 0; $c < $num; $c++) {
            # Translate embedded newlines and tabs, then emit the field followed by a TAB.
            echo str_replace("\t", "\\t", str_replace("\n", "\\n", $data[$c])) . "\t";
        }
        echo "\n";
    }
    fclose($handle);
}
else {
    # Report $file rather than $argv[1], which is unset when no argument is given.
    fwrite(STDERR, "{$argv[0]}: unable to fopen $file\n");
    exit(1);
}
?>
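For readers without a PHP interpreter, the same idea can be sketched in Python with the standard csv module. This is an illustrative approximation rather than a port of the script above. The function name psv_to_tsv is hypothetical, and unlike the PHP version it does not emit a trailing TAB after the last field of each record.

```python
import csv
import io


def psv_to_tsv(text: str) -> str:
    """Convert pipe-separated, CSV-style quoted records to TSV lines."""
    out = []
    # newline="" keeps embedded newlines inside quoted fields intact.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="|")
    for row in reader:
        # Translate embedded tabs and newlines so each record stays on one line.
        fields = [f.replace("\t", "\\t").replace("\n", "\\n") for f in row]
        out.append("\t".join(fields))
    return "\n".join(out) + "\n" if out else ""


record = '73779087|"UCGr4c0_zShyHbctxJJrJ03w"|"ItsMattSpeaking | SuPra"\n'
print(psv_to_tsv(record), end="")
```

With the sample record, the three fields come out separated by tabs, and the pipe inside the last field is preserved as ordinary data.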