Question

I have a text file like this:

Country1
city1
city2

Country2
city3
city4

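I want to separate the countries from the cities. Is there a quick way to do this? I am thinking of doing some file processing and then extracting to different files. Is that the best approach, or can it be done quickly with a regular expression?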

Answer 1

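countries=[]
cities=[]
with open("countries.txt") as f:
    gap=True  # True when the next non-blank line is a country
    for line in f:
        line=line.strip()
        if line=="":
            # Checked first so that consecutive blank lines are not stored as countries.
            gap=True
        elif gap:
            countries.append(line)
            gap=False
        else:
            cities.append(line)
print(countries)
print(cities)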

Output:

['Country1', 'Country2']
['city1', 'city2', 'city3', 'city4']

If you want to write these to files:

with open("countries.txt","w") as country_file, open("cities.txt","w") as city_file:
    country_file.write("\n".join(countries))
    city_file.write("\n".join(cities))

Answer 2

status = True  # True when the next non-blank line is a country
country = []
city = []
with open('b.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line:
            if status:
                country.append(line)
                status = False
            else:
                city.append(line)
        else:
            # A blank line means the next non-blank line is a country.
            status = True

print(country)
print(city)


Output:

>>['Country1', 'Country2']
>>['city1', 'city2', 'city3', 'city4']

Answer 3

Depending on how regular the file is, this can be quite simple in Python:

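with open('inputfile.txt') as fh:
  # Iterate over the entire file; each pass starts on a country line.
  for country in fh:
    # Assumes exactly two cities per country.
    cityLines = [next(fh) for _i in range(2)]

    # Read the blank separator line to advance to the next country.
    # The default avoids StopIteration if the file has no trailing blank line.
    next(fh, None)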

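This is probably not quite right, since I imagine many countries have differing numbers of cities. You can modify it like this to handle that: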

with open('inputfile.txt') as fh:
  # Iterate over the entire file; each pass starts on a country line.
  for country in fh:
    # We assume here that each country has at least 1 city.
    cities = [next(fh, '').strip()]

    # Continue until we encounter a blank line (or the end of the file).
    while cities[-1]:
      cities.append(next(fh, '').strip())

    cities.pop()  # drop the empty string that ended the loop

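This does nothing to put the data into an output file, or to store it anywhere outside of the file handle itself, but it is a start. You really should pick one language for your question. A lot of the time until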

Answer 4

$countries = array();
$cities = array();
$gap = true; // the first line of the file is a country
// FILE_IGNORE_NEW_LINES strips the trailing newline from each line.
$file = file('path/to/file', FILE_IGNORE_NEW_LINES);
foreach($file as $line)
{
  $line = trim($line);
  if($line == '') $gap = true;
  elseif ($gap)
  {
    $countries[] = $line;
    $gap = false;
  }
  else $cities[] = $line;
}

Answer 5

Not sure this will help, but you can try the following code to build a dictionary and then work with it (write it to a file, compare, etc.):

res = {}
with open('c:\\tst.txt') as f:
    lines = f.readlines()
    for i,line in enumerate(lines):
        line = line.strip()
        if (i == 0 and line):
            key = line
            res[key] = []
        elif not line and i+1 < len(lines):
            key = lines[i+1].strip()
            res[key] = []
        elif line and line != key:
            res[key].append(line)
print(res)

Answer 6

Another PHP example, one that does not read the entire file into an array.

<?php

$fh = fopen('countries.txt', 'r');

$countries = array();
$cities = array();

while ( ($data = fgets($fh)) !== false )
{
  // If $country is not defined, then this line is a country.
  if ( ! isset($country) )
  {
    $country = trim($data);
    $countries[] = $country;
  }
  // If an empty line is found, unset $country.
  elseif ( ! trim($data) )
    unset($country);
  // City
  else
    $cities[$country][] = trim($data);
}

fclose($fh);

The $countries array will contain the list of countries, while the $cities array will contain the lists of cities keyed by country.

Answer 7

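Is there some pattern that distinguishes countries from cities? Or is the first line after a blank line a country, with all subsequent lines being city names until the next blank line? Or are you identifying countries from a lookup table (a "dictionary" in Python; an associative array in PHP; a hash in Perl), one that includes all officially recognized countries?

Is it safe to assume that no city has a name that collides with a country name? Is there a France, Iowa, USA, or an Olde America, Japan?

What do you want to do with these once you have separated them? You mention "some file processing and then extracting to different files". Are you thinking of one file per country, containing a list of all its cities? Or one directory per country and one file per city?

The obvious approach is to iterate over the file line by line and maintain a small state machine. It starts in an empty state (the beginning of the file, or the blank lines between countries), from which you move into a "country" state whenever you find a line matching whatever pattern means you have encountered the name of a country. Once you have found a country name, you are in a city-loading state. I would create a dictionary, using the country names as keys and sets of cities as values (though where a country has several cities with the same name you may really need (county/province, city name) tuples: Portland, Maine versus Portland, Oregon, for example). You may also want an "error" state for file contents that lead to some ambiguity (city names before you have determined a country, two country names in a row, and so on).

Given how vague your specification is, it is hard to suggest a good code snippet.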
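As a rough illustration of the state machine described above, here is a minimal Python sketch. It assumes the simplest reading of the question: the first non-blank line after a blank line (or at the start of the file) is a country, and every following non-blank line is one of its cities. The function name `parse_countries`, the state names, and the duplicate-country check are illustrative choices rather than anything from the question, and lists are used instead of sets so the file order is preserved.

```python
def parse_countries(lines):
    """Group city names under the country line that precedes them."""
    result = {}
    state = "EXPECT_COUNTRY"  # hypothetical state names
    country = None
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line ends the current block.
            state = "EXPECT_COUNTRY"
        elif state == "EXPECT_COUNTRY":
            if line in result:
                # One possible "error" condition: the same country twice.
                raise ValueError("country listed twice: %r" % line)
            country = line
            result[country] = []
            state = "IN_CITIES"
        else:
            result[country].append(line)
    return result


sample = "Country1\ncity1\ncity2\n\nCountry2\ncity3\ncity4\n"
grouped = parse_countries(sample.splitlines())
print(grouped)
```

To read from a real file, pass the open file object instead of `sample.splitlines()`, since the function only needs an iterable of lines.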

Answer 8

This regex works for your example:

/(?:^|\r\r)(.+?)\r(.+?)(?=\r\r|$)/s

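It captures the country in group 1 and the cities in group 2. You may need to adjust the line breaks depending on your system; they can be \n, \r, or \r\n. Edit: added a $ sign so that two line breaks are not required at the end. You need the dotall flag for the regex to work as intended.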
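For readers who want to try this from Python, the following is a sketch of the same pattern using the `re` module. It assumes the line breaks are normalized to `\n` first, so the pattern uses `\n` where the original uses `\r`; `re.DOTALL` plays the role of the `/s` flag. The names `BLOCK_RE` and `split_with_regex` are made up for this example.

```python
import re

# Same structure as the regex above, with \n as the line break.
BLOCK_RE = re.compile(r"(?:^|\n\n)(.+?)\n(.+?)(?=\n\n|$)", re.DOTALL)


def split_with_regex(text):
    # Normalize \r\n and bare \r so a single pattern covers all three styles.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    countries, cities = [], []
    for country, city_block in BLOCK_RE.findall(text):
        countries.append(country)
        # Group 2 holds all cities of one country, separated by line breaks.
        cities.extend(city_block.split("\n"))
    return countries, cities


sample = "Country1\ncity1\ncity2\n\nCountry2\ncity3\ncity4\n"
countries, cities = split_with_regex(sample)
print(countries)
print(cities)
```

Like the original pattern, this expects exactly one blank line between blocks and at least one city per country.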

Answer 9

Using awk: field 1 of each record is the country, and the remaining fields are the cities.

awk 'BEGIN {RS="";FS="\n"} {print $1 > "countries"} {for (i=2;i<=NF;i++) print $i > "cities"}' source.txt
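Setting `RS=""` puts awk in paragraph mode, where each blank-line-separated block is one record, and `FS="\n"` makes each line of the block a field. For comparison, the same idea can be sketched in Python by splitting the text on blank lines. The function name `split_blocks` is illustrative, and the sketch assumes the whole file fits in memory.

```python
def split_blocks(text):
    """Treat each blank-line-separated block as a record, as awk does with RS=""."""
    countries, cities = [], []
    for block in text.split("\n\n"):
        # Each non-empty line of the block is one field.
        fields = [f.strip() for f in block.splitlines() if f.strip()]
        if not fields:
            continue  # extra blank lines produce empty blocks
        countries.append(fields[0])   # like $1
        cities.extend(fields[1:])     # like $2 .. $NF
    return countries, cities


sample = "Country1\ncity1\ncity2\n\nCountry2\ncity3\ncity4\n"
countries, cities = split_blocks(sample)
print(countries)
print(cities)
```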
