Suppose I have a text file, test.txt, on my C: drive.
On the face of things, we seem to be merely talking about text-based files, containing only
the letters of the English Alphabet (and the occasional punctuation mark).
On deeper inspection, of course, this isn't quite the case. What this site
offers is a glimpse into the history of writers and artists bound by the 128
characters that the American Standard Code for Information Interchange (ASCII)
allowed them. The focus is on mid-1980's textfiles and the world as it was then,
but even these files are sometime retooled 1960s and 1970s works, and offshoots
of this culture exist to this day.
I want to split all the lines into words and save them as a new file. In the new file, each line contains only one word.
So the new file would be:
On
the
face
of
things
we
seem
to
....
The delimiter is a single space, and all punctuation marks should be skipped.
Answer 0 (score: 2)
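You haven't tried anything yourself. Next time I'll vote to close the question. PowerShell uses 99% of C# syntax and "all" of the .NET classes, so if you know C#, five minutes on Google and trying out a few commands will get you a long way in PowerShell.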
# create array
$words = @()
# read file
$lines = [System.IO.File]::ReadAllLines("C:\Users\Frode\Desktop\in.txt")
# split words; the [char[]] cast makes each character a separator
# (PowerShell 6+ would otherwise treat " ,." as one separator string)
foreach ($line in $lines) {
    $words += $line.Split([char[]]" ,.", [System.StringSplitOptions]::RemoveEmptyEntries)
}
# save words, one per line
[System.IO.File]::WriteAllLines("C:\Users\Frode\Desktop\out.txt", $words)
In PowerShell you can also do it like this:
Get-Content .\in.txt | ForEach-Object {
    # [char[]] cast: split on space, comma or period individually
    $_.Split([char[]]" ,.", [System.StringSplitOptions]::RemoveEmptyEntries)
} | Set-Content out.txt
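
As a variation on the above, here is a minimal sketch using the `-split` operator with a regex character class, which should behave the same in Windows PowerShell and PowerShell 7. The sample string and the `$sample` / `$words` names are only for illustration and are not part of the original answer.

```powershell
# Sample input (illustrative, adapted from the question's text)
$sample = 'On the face of things, we seem to be merely talking about text-based files.'

# Split on runs of spaces, commas and periods, then drop empty strings
# (the trailing period leaves one empty element at the end)
$words = $sample -split '[ ,.]+' | Where-Object { $_ -ne '' }

# Emit one word per line
$words
```

Piping `$words` to `Set-Content` instead of printing it would give the one-word-per-line file the question asks for.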
Answer 1 (score: 1)
$Text = @'
On the face of things, we seem to be merely talking about
text-based files, containing only the letters of the English Alphabet
(and the occasional punctuation mark). On deeper inspection, of
course, this isn't quite the case. What this site offers is a glimpse
into the history of writers and artists bound by the 128 characters
that the American Standard Code for Information Interchange (ASCII)
allowed them. The focus is on mid-1980's textfiles and the world as it
was then, but even these files are sometime retooled 1960s and 1970s
works, and offshoots of this culture exist to this day.
'@
[regex]::Split($Text, '\W+')
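
Two things are worth knowing about splitting on `\W+`: the apostrophe counts as a non-word character, and text that ends in punctuation leaves an empty string at the end of the result. A small sketch on a shortened sample (the `$sample`, `$parts` and `$words` names are illustrative):

```powershell
# Shortened sample sentence (illustrative)
$sample = "Of course, this isn't quite the case."

# \W+ matches any run of non-word characters, including the apostrophe,
# so "isn't" comes out as two pieces: "isn" and "t"
$parts = [regex]::Split($sample, '\W+')

# Drop the empty string left behind by the trailing period
$words = $parts | Where-Object { $_ -ne '' }
$words
```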
Answer 2 (score: 0)
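Here is a solution using regular expressions. It removes the special characters and then matches words using word boundaries (\b).
Code: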
$Text = @'
On the face of things, we seem to be merely talking about text-based files, containing only
the letters of the English Alphabet (and the occasional punctuation mark).
On deeper inspection, of course, this isn't quite the case. What this site
offers is a glimpse into the history of writers and artists bound by the 128
characters that the American Standard Code for Information Interchange (ASCII)
allowed them. The focus is on mid-1980's textfiles and the world as it was then,
but even these files are sometime retooled 1960s and 1970s works, and offshoots
of this culture exist to this day.
'@;
# Remove special characters
$Text = $Text -replace '\(|\)|''|\.|,','';
# Match words
$MatchList = ([Regex]'(?<word>\b\w+\b)').Matches($Text);
# Get just the text values of the matches
$WordList = $MatchList | % { $PSItem.Groups['word'].Value; };
# Examine the 'Count' of words
$WordList.Count
The result looks like this:
$WordList[0..9];
On
the
face
of
things
we
seem
to
be
merely
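
To finish what the question asks for (one word per line in a new file), the list can be written out with `Set-Content`. This is a minimal sketch: the `$WordList` below is a short stand-in for the one built above, and `$OutFile` is a placeholder path in the temp directory, not something from the original answer.

```powershell
# Stand-in for the $WordList built above (illustrative values)
$WordList = 'On', 'the', 'face', 'of', 'things'

# Placeholder output path; substitute your own
$OutFile = Join-Path ([System.IO.Path]::GetTempPath()) 'words.txt'

# Set-Content writes each array element on its own line
$WordList | Set-Content -Path $OutFile

# Read the file back to check the result
Get-Content -Path $OutFile
```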
Answer 3 (score: 0)
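I wouldn't bother splitting the string, since you're writing the result back to a file anyway. Just replace all punctuation (and perhaps parentheses as well) with spaces, replace every run of consecutive whitespace with a newline, and write everything back to a file: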
$in = 'C:\test.txt'
$out = 'C:\test2.txt'
(Get-Content $in | Out-String) -replace '[.,;:?!()]',' ' -replace '\s+',"`r`n" |
Set-Content $out
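
To see what the two replacements do, here is a small sketch that runs them on an in-memory string instead of a file. The sample text and variable names are illustrative, and the `.Trim()` call is my own addition: it drops the extra line break that appears when the text ends in punctuation.

```powershell
# Illustrative sample with commas, parentheses and a trailing period
$sample = "On the face of things, we seem (occasionally) to be talking."

# Step 1: punctuation becomes spaces. Step 2: each whitespace run becomes CRLF.
# Trim() removes the line break produced by the trailing period.
$result = ($sample -replace '[.,;:?!()]', ' ' -replace '\s+', "`r`n").Trim()

# Split back into lines only to inspect the outcome
$lines = $result -split "`r`n"
$lines
```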