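I've done some searching but can't seem to find an elegant way to do this. I'd like to process the list below and end up with a plain-text output file containing only the domain names, without the http:// or anything after the next /.
The list looks like this: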
http://7wind.ru/file/Behind+the+dune/
http://aldersgatencsc.org/open.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=mz34ligqc4&utm_content=bgi71kl5oy
http://amunow.org/test.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=dhxg1r4l76&utm_content=tr71txtklp
I'd like to end up with a plain-text output file like this:
7wind.ru
aldersgatencsc.org
amunow.org
Answer 0 (score: 3)
Assuming:
$ echo "$txt"
http://7wind.ru/file/Behind+the+dune/
http://aldersgatencsc.org/open.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=mz34ligqc4&utm_content=bgi71kl5oy
http://amunow.org/test.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=dhxg1r4l76&utm_content=tr71txtklp
You can use cut to print the third /-delimited field:
$ echo "$txt" | cut -d'/' -f3
7wind.ru
aldersgatencsc.org
amunow.org
Or, if your content is in a file:
$ cut -d'/' -f3 file
7wind.ru
aldersgatencsc.org
amunow.org
Then redirect the output to the file you want.
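The redirection step could look like the sketch below. The function name extract_domains and the file names urls.txt and domains.txt are placeholders, not part of the answer, and the sort -u stage is an optional extra that drops repeated domains.

```shell
#!/bin/sh
# extract_domains (hypothetical name): print the third '/'-delimited field,
# i.e. the host, of every input line.
# Reads the files given as arguments, or standard input if there are none.
extract_domains() {
    cut -d'/' -f3 "$@"
}

# Assumed usage: write the unique domains from urls.txt to domains.txt
# extract_domains urls.txt | sort -u > domains.txt
```

Because cut splits on every /, the empty field between the two slashes of http:// counts as field 2, which is why the host ends up in field 3.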
Answer 1 (score: 1)
awk -F \/ '{ print $3 }' outputfile > newfile
This prints the third field, using / as the field separator.
Answer 2 (score: 1)
$ sed -r 's#.*//([^/]*)/.*#\1#' Input_file
7wind.ru
aldersgatencsc.org
amunow.org
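Note that the pattern above needs a / after the host, so a line such as http://example.com with no path would be printed unchanged. A possible variant that makes the trailing part optional is sketched here; the function name strip_to_host is made up for illustration, and -E is assumed to be available (it is in both GNU and BSD sed).

```shell
#!/bin/sh
# strip_to_host (hypothetical name): keep only the text between "//" and the
# next "/", or between "//" and the end of the line if there is no path.
strip_to_host() {
    sed -E 's#^[^/]*//([^/]*).*#\1#' "$@"
}

# Assumed usage:
# strip_to_host Input_file > domains.txt
```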
Answer 3 (score: 0)
Try the following as well.
Solution 1:
awk '{sub(/.*\/\//,"");sub(/\/.*/,"");print}' Input_file
Solution 2:
awk '{match($0,/\/.[^/]*/);print substr($0,RSTART+2,RLENGTH-2)}' Input_file
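If the list contains the same domain more than once, the first solution can be extended to print each host only once. This is only a sketch: the function name unique_hosts and the seen array are illustrative additions, not part of the answer.

```shell
#!/bin/sh
# unique_hosts (hypothetical name): strip everything up to "//" and everything
# from the next "/" onward, then print each non-empty host the first time it
# appears.
unique_hosts() {
    awk '{
        sub(/.*\/\//, "")        # drop the scheme and the two slashes
        sub(/\/.*/, "")          # drop the path and query string
        if ($0 != "" && !seen[$0]++) print
    }' "$@"
}

# Assumed usage:
# unique_hosts Input_file > domains.txt
```

Hosts come out in the order they are first seen, so no separate sort step is needed.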
Answer 4 (score: 0)
First strip the protocol and the ://, then remove everything from the next slash onward.
sed "s|.*://||; s|/.*||" url-list.txt
Add -i to modify the file in place.
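Since an in-place edit overwrites the original list, it may be worth keeping a backup. The sketch below uses the attached-suffix form -i.bak, which both GNU and BSD sed accept; the function name strip_in_place is an assumption for illustration.

```shell
#!/bin/sh
# strip_in_place (hypothetical name): rewrite the file named in $1 so that
# each line holds only the host. The original is kept as "$1.bak".
strip_in_place() {
    sed -i.bak 's|.*://||; s|/.*||' "$1"
}

# Assumed usage:
# strip_in_place url-list.txt    # original saved as url-list.txt.bak
```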
Answer 5 (score: 0)
Try this regex:
The domain is in group 3 of the first match. Be careful, though: it also matches invalid URLs.