Extract only the domain name from input URL strings

Date: 2017-07-13 15:13:14

Tags: bash sed grep cut

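I've done some searching but can't seem to find an elegant way to do this. I'd like to process the list below and end up with a plain-text output file containing only the domain names, without the http:// or anything after the following /.

A list like this: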

http://7wind.ru/file/Behind+the+dune/
http://aldersgatencsc.org/open.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=mz34ligqc4&utm_content=bgi71kl5oy
http://amunow.org/test.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=dhxg1r4l76&utm_content=tr71txtklp

I'd like to end up with a plain-text output file like this:

7wind.ru
aldersgatencsc.org
amunow.org

6 answers:

Answer 0: (score: 3)

Assuming:

$ echo "$txt"
http://7wind.ru/file/Behind+the+dune/
http://aldersgatencsc.org/open.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=mz34ligqc4&utm_content=bgi71kl5oy
http://amunow.org/test.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=dhxg1r4l76&utm_content=tr71txtklp

You can use cut:

$ echo "$txt" | cut -d'/' -f3
7wind.ru
aldersgatencsc.org
amunow.org

Or, if your content is in a file:

$ cut -d'/' -f3 file
7wind.ru
aldersgatencsc.org
amunow.org

Then redirect that output to the file you want.
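The redirect step might look like the sketch below. The file names urls.txt and domains.txt are placeholders chosen for illustration, and the query strings on the sample URLs are shortened.

```shell
# Hypothetical file names: urls.txt (input) and domains.txt (output).
# Sample data based on the question's list, query strings shortened.
printf '%s\n' \
  'http://7wind.ru/file/Behind+the+dune/' \
  'http://aldersgatencsc.org/open.php?utm_source=5r2ke0ow6k' \
  'http://amunow.org/test.php?utm_source=5r2ke0ow6k' > urls.txt

# Splitting on "/" gives: field 1 = "http:", field 2 = "" (between the
# two slashes), field 3 = the host. Write field 3 of each line to a file.
cut -d'/' -f3 urls.txt > domains.txt
```

This relies on every line starting with a scheme followed by //; a line without one would yield a different field.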

Answer 1: (score: 1)

awk -F \/ '{ print $3 }' outputfile > newfile

This prints the third field, using / as the field separator.

Answer 2: (score: 1)

$ sed -r 's#.*//([^/]*)/.*#\1#' Input_file
7wind.ru
aldersgatencsc.org
amunow.org

Answer 3: (score: 0)

Try the following as well.

Solution 1:

awk '{sub(/.*\/\//,"");sub(/\/.*/,"");print}'   Input_file

Solution 2:

awk '{match($0,/\/.[^/]*/);print substr($0,RSTART+2,RLENGTH-2)}'   Input_file

Answer 4: (score: 0)

First strip the protocol and the ://, then remove everything from the next slash onward.

sed "s|.*://||; s|/.*||" url-list.txt

Add -i to edit the file in place (with GNU sed; BSD/macOS sed needs -i '').
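As a rough sketch, the same two substitutions can be wrapped in a small shell function that reads URLs on standard input. The name strip_to_host is made up for this example, and the second sample URL is an assumption added to show an https URL with no path.

```shell
# strip_to_host: hypothetical helper wrapping the two sed substitutions.
strip_to_host() {
  # 1) delete everything up to and including "://"
  # 2) delete the first remaining "/" and everything after it
  sed 's|.*://||; s|/.*||'
}

# One URL from the question, plus an assumed https URL without a path.
printf '%s\n' \
  'http://7wind.ru/file/Behind+the+dune/' \
  'https://amunow.org' | strip_to_host
```

Because .* is greedy, a URL whose query string itself contains :// would be cut at the last occurrence rather than the first, so this form suits lists like the one in the question, where :// appears once per line.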

Answer 5: (score: 0)

Try this regex:


Take the first match, group 3. Be careful, though: it also accepts invalid URLs!