Question

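I have a list of 20 million names (6 characters on average), one per line, saved in a .csv or .txt file.

Some names appear more than twice; others appear only once. I want only the names that appear more than x times to be saved to a new list, with the total number of times each one appears in the file next to it. Names that appear fewer than x times should be dropped.

(I know this won't be fun to read, but this is what I have been trying):
- Using .bat and PowerShell, because that is all I know. Basically, my code checked each line and tried to save it separately to a %variable% or $variable.
- If the variable exists, then set %(name)(number)%+1 and save it to the variable %name%%namenumber%, then go to the next line (or use get-content to check everything).
- When it finishes, print %namenumber% after each %name% to show the amount.

I realized this was slow and tried using arrays, but I could not get them to work.

I don't code, but I thought this puzzle would be fun. It was, for the first hour or so, but I have work to do that depends on this, and I know very little about these languages.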

Answer 1

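OK, here we go. First of all, the time this process takes to complete depends on the number of unique names, so there is no way to predict which method will be faster until you run the program with the real data file.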

First, the simplest method:

@echo off
setlocal

set "amount=4"

rem Accumulate the names: one $name variable per unique name holds its count
for /F %%a in (theFile.txt) do set /A "$%%a+=1"

rem Show names with more than the given amount of times
for /F "tokens=1,2 delims=$=" %%a in ('set $') do (
   if %%b gtr %amount% echo %%a: %%b
)

If the previous program takes too long to complete, you can add a set /P "=." < NUL command inside the "Accumulate the names" for loop to give a visual indication that the program is still running (the < NUL keeps set /P from waiting for keyboard input)...

The next method is more efficient when many unique names appear the given number of times or more. This method does not show the exact number of times each name appears in the file.

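@echo off
setlocal EnableDelayedExpansion

set "amount=4"

rem Accumulate just the names, not the count.
rem Once a name passes the amount, a name.tmp file marks it and its counter is dropped.
for /F %%a in (theFile.txt) do (
   if not exist "%%a.tmp" (
      set /A "$%%a+=1"
      if !$%%a! gtr %amount% (
         > "%%a.tmp" echo !$%%a!
         set "$%%a="
      )
   )
)

rem Show names with more than the given amount of times
dir /B *.tmp

del *.tmp

The next modification shows the number of times each name appears in the file, but it runs slower than the previous version.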
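
For the count-preserving modification, a rough sketch (an assumption about how it could work, not the answerer's own listing) is to keep each name's running count inside its .tmp file once the name passes the threshold. The `count` variable and the final display loop are hypothetical additions for this sketch; `theFile.txt`, `amount`, and the `$name` counters follow the listing above.

```batch
@echo off
setlocal EnableDelayedExpansion

set "amount=4"

rem Count in $name variables until a name passes the amount,
rem then keep its running total as the first line of name.tmp
for /F %%a in (theFile.txt) do (
   if exist "%%a.tmp" (
      set /P "count=" < "%%a.tmp"
      set /A "count+=1"
      > "%%a.tmp" echo !count!
   ) else (
      set /A "$%%a+=1"
      if !$%%a! gtr %amount% (
         > "%%a.tmp" echo !$%%a!
         set "$%%a="
      )
   )
)

rem Show each remaining name with the total stored in its file
for %%f in (*.tmp) do (
   set /P "count=" < "%%f"
   echo %%~nf: !count!
)

del *.tmp
```

Like the version above, this assumes every name is a valid file name and that the working directory holds no other .tmp files. The extra file read and write for each repeated line is what would make it slower than the names-only version.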
