Question

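I'm trying to replace values in certain columns of a pandas DataFrame. Because there are many changes to make, I'm doing it with a for loop (though I'm open to other approaches). I'm just starting out with Python, so apologies if this is obvious; I couldn't find anything that solved it.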

Say I have a DataFrame like this:

import pandas as pd

weather_data = [["unknown", "rainy"], ["unknown", "sun"], ["rainy", "not sunny at all"], ["stormy", "a lot of rain"]]
weather = pd.DataFrame(weather_data, columns = ["weather", "weather_note"])

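Where the weather value is unknown, I want to fill it in using the text from the note. For example, if the note says "rain", I want the weather value to become "rainy", provided it was previously unknown.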

I've tried:

weather_text = ["rain", "sun"]
weather_label = ["rainy", "sunny"]

for i in range(len(weather_text)):
    weather.loc[weather['weather_note'].str.contains(weather_text[i], na = False) & 
               weather['weather'].str.contains("unknown")] = weather_label[i]

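This changes every value in the matching rows (in all columns) to the corresponding value from weather_label. I can see why it does that, but I'm not sure how to change only the relevant column. I've also tried: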

for i in range(len(weather_text)):
    weather.loc[weather['weather_note'].str.contains(weather_text[i], na = False) & 
               weather['weather'].str.contains("unknown")]
    weather['weather'] = weather_label[i]

But then the values are changed to the last value in the weather_label list, rather than the one at the same index position.
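For comparison, here is a minimal sketch of the first loop with the assignment restricted to a single column. `.loc` accepts a row selector and a column label, so passing `"weather"` as the second indexer leaves `weather_note` untouched. Two details are illustrative choices rather than part of the question: an exact `== "unknown"` check replaces `str.contains("unknown")`, and `regex=False` makes each pattern match literally.

```python
import pandas as pd

weather_data = [["unknown", "rainy"], ["unknown", "sun"],
                ["rainy", "not sunny at all"], ["stormy", "a lot of rain"]]
weather = pd.DataFrame(weather_data, columns=["weather", "weather_note"])

weather_text = ["rain", "sun"]
weather_label = ["rainy", "sunny"]

# zip pairs each search text with the label at the same index position
for text, label in zip(weather_text, weather_label):
    # Assumption: exact comparison with "unknown" and literal (non-regex) matching
    mask = (weather["weather_note"].str.contains(text, na=False, regex=False)
            & (weather["weather"] == "unknown"))
    # The second .loc indexer limits the assignment to the 'weather' column
    weather.loc[mask, "weather"] = label

print(weather)
```

Because a row that matched an earlier pattern is no longer "unknown", the first matching pattern in the lists wins for that row.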

In my real data there are many more pattern/value combinations, so I'd rather not run each combination separately.

Can anyone help?

Answer 1

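Here's what I would do. I've used numpy in this code... hopefully that's OK. I'm a big fan of numpy's vectorize method. Pandas has an equivalent, but I don't tend to use it. The vectorize method (see the last line of the code) is meant for cases like this one, where you want to apply some operation to a whole column without writing the loop yourself: it runs the loop for you behind the scenes. It is a convenience rather than a speed-up, since under the hood it is still essentially a Python for loop.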
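A minimal illustration of what `np.vectorize` does: it wraps a function written for single values so that it is called element by element over arrays of the same shape. The function `pick` below is hypothetical and only demonstrates the mechanism.

```python
import numpy as np

def pick(weather, note):
    # Hypothetical scalar function: written for one pair of plain values
    return note if weather == "unknown" else weather

# np.vectorize calls pick once per element pair and collects the results in an array
vectorized_pick = np.vectorize(pick)
result = vectorized_pick(np.array(["unknown", "stormy"]), np.array(["sun", "rain"]))
print(result)
```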

import pandas as pd
import numpy as np

weather_data = [["unknown", "rainy"], ["unknown", "sun"], ["rainy", "not sunny at all"], ["stormy", "a lot of rain"]]
weather = pd.DataFrame(weather_data, columns = ["weather", "weather_note"])

weather_indicators = {'rain': 'rainy',
                      'drizzle': 'rainy',
                      'sun': 'sunny',
                      'bright': 'sunny',
                      # add each pattern to this dictionary
                      }

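def determine_weather(weather, weather_note):
    # Keep the existing value unless it is 'unknown'
    output = weather
    if weather == 'unknown':
        # Check every pattern; if several match, the last one in the dict wins
        for indicator in weather_indicators:
            if indicator in weather_note:
                output = weather_indicators[indicator]
    return output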


weather['weather'] = np.vectorize(determine_weather)(weather['weather'], weather['weather_note'])

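I use a dictionary called weather_indicators to store the patterns, and you can add more patterns to it. If the number of patterns is very large (say, hundreds), you might consider storing them somewhere else, such as a database table or a CSV file, and reading them into your code. You would obviously have to rework the code above at that point, but that goes beyond the scope of your question.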
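A minimal sketch of loading the patterns from a CSV instead of hard-coding them, assuming a hypothetical file with `pattern` and `label` columns. To keep the example self-contained, the CSV text is read through `io.StringIO`; with a real file you would pass its path to `pd.read_csv`. The result has the same `{pattern: label}` shape as `weather_indicators`, so `determine_weather` could use it unchanged.

```python
import io
import pandas as pd

# Hypothetical CSV contents; a real file might be read with pd.read_csv("weather_patterns.csv")
csv_text = """pattern,label
rain,rainy
drizzle,rainy
sun,sunny
bright,sunny
"""

patterns = pd.read_csv(io.StringIO(csv_text))
# Build the same {pattern: label} dictionary, preserving the row order of the file
weather_indicators = dict(zip(patterns["pattern"], patterns["label"]))
print(weather_indicators)
```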

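Basically, I created a function that looks for a given indicator word (e.g. "rain") and, if that word appears in the weather_note value, sets the weather column to the corresponding value from the weather_indicators dictionary. numpy's vectorize function then applies that function to the weather column of the DataFrame.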
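The answer does not name the pandas equivalent it mentions; one common pandas-native way to apply a two-argument function row by row is `DataFrame.apply` with `axis=1`, which is assumed here. This sketch reuses the same matching logic (with parameters renamed to avoid shadowing the DataFrame) and should produce the same result as the `np.vectorize` version.

```python
import pandas as pd

weather_data = [["unknown", "rainy"], ["unknown", "sun"],
                ["rainy", "not sunny at all"], ["stormy", "a lot of rain"]]
weather = pd.DataFrame(weather_data, columns=["weather", "weather_note"])

weather_indicators = {"rain": "rainy", "drizzle": "rainy", "sun": "sunny", "bright": "sunny"}

def determine_weather(current, note):
    output = current
    if current == "unknown":
        for indicator, label in weather_indicators.items():
            if indicator in note:
                output = label  # later matches overwrite earlier ones
    return output

# axis=1 passes each row to the lambda as a Series indexed by column name
weather["weather"] = weather.apply(
    lambda row: determine_weather(row["weather"], row["weather_note"]), axis=1)
print(weather)
```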

Answer 2

Where the weather value is "unknown", assign the value from weather_note. Then use Series.replace to replace the word "sun" with "sunny".

weather.loc[weather['weather'] == 'unknown', 'weather'] = weather['weather_note']
weather['weather'] = weather['weather'].replace('sun', 'sunny')

    weather weather_note
0   rainy   rainy
1   sunny   sun
2   rainy   not sunny at all
3   stormy  a lot of rain
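If more whole-word substitutions are needed, `Series.replace` also accepts a dict, so several mappings can be applied in one call. This sketch assumes the notes of unknown rows consist of a single word such as "sun" or "rain"; `replace` only changes values that match a key exactly, so a note like "a lot of rain" would be copied through unchanged.

```python
import pandas as pd

weather_data = [["unknown", "rainy"], ["unknown", "sun"],
                ["rainy", "not sunny at all"], ["stormy", "a lot of rain"]]
weather = pd.DataFrame(weather_data, columns=["weather", "weather_note"])

# Copy the note into 'weather' only where the weather is unknown
unknown = weather["weather"] == "unknown"
weather.loc[unknown, "weather"] = weather.loc[unknown, "weather_note"]

# A dict maps several whole values at once; only exact matches are replaced
weather["weather"] = weather["weather"].replace({"sun": "sunny", "rain": "rainy"})
print(weather)
```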

Conditionally change column values and repeat several times
