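Joining unique values into a new DataFrame (Python, pandas)

Time: 2016-04-20 19:49:48

Tags: python numpy pandas dataframe unique

I have two DataFrames, and I have extracted the unique values of one column from each into a and b: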

a = df1.col1.unique()
b = df2.col2.unique()

Now a and b look like this:

['a','b','c','d'] #a
[1,2,3] #b

Both are of type numpy.ndarray.

I want to combine them to get a DataFrame like this:

   col1  col2
0    a     1
1    a     2
2    a     3
3    b     1
4    b     2
5    b     3
6    c     1
   . . .

Is there a way to do this without using a loop?

4 answers:

Answer 0 (score: 1)

Using numpy tools:

pd.DataFrame({'col1':np.repeat(a,b.size),'col2':np.tile(b,a.size)})
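To see why the one-liner above produces every pairing, here is a minimal sketch. The two arrays are hypothetical stand-ins for the asker's a and b. np.repeat repeats each element of a in place, while np.tile repeats the whole of b end to end, so the two columns line up row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1.col1.unique() and df2.col2.unique()
a = np.array(['a', 'b', 'c', 'd'])
b = np.array([1, 2, 3])

col1 = np.repeat(a, b.size)  # each element of a repeated b.size times in a row
col2 = np.tile(b, a.size)    # the whole of b repeated a.size times

result = pd.DataFrame({'col1': col1, 'col2': col2})
print(result.head(4))
```

Both calls return arrays of length a.size * b.size, so the DataFrame constructor receives two equally long columns.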

Answer 1 (score: 0)

Update

B. M.'s solution using numpy is much faster; I recommend using his approach.

Try itertools.product.

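Answer 2 (score: 0)

This task cannot be done without at least one for loop somewhere. The best you can do is hide the for loop, or iterate lazily so that the combinations are produced in a memory-efficient way.

itertools provides efficient functions for this task. They return lazy iterators, much like a generator built with yield: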

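from itertools import product

import pandas

# Cartesian product of the two value lists, as a lazy iterator of tuples
products = product(['a','b','c','d'], [1,2,3])

# Transpose the pairs into one tuple per column
col1_items, col2_items = zip(*products)

result = pandas.DataFrame({'col1':col1_items, 'col2': col2_items})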

itertools.product creates the Cartesian product of the two iterables. zip(*products) simply transposes the resulting tuples into two separate tuples, one per column.
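The transposition step can be checked in isolation. The sketch below uses a shortened, made-up input so the intermediate values are easy to read. It materializes the product into a list first, which is only done here for inspection.

```python
from itertools import product

# Shortened illustrative input: 2 letters x 3 numbers = 6 pairs
pairs = list(product(['a', 'b'], [1, 2, 3]))

# zip(*pairs) groups the first items together and the second items together
col1_items, col2_items = zip(*pairs)

print(pairs)
print(col1_items)
print(col2_items)
```

Note that zip(*...) has to consume the whole iterator, so the full set of pairs is held in memory at this point regardless of how lazily it was generated.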

Answer 3 (score: 0)

You can do this with pandas merge, which will be faster than itertools or a loop:

df_a = pd.DataFrame({'a': a, 'key': 1})
df_b = pd.DataFrame({'b': b, 'key': 1})
result = pd.merge(df_a, df_b, how='outer')

Result:

    a  key  b
0   a    1  1
1   a    1  2
2   a    1  3
3   b    1  1
4   b    1  2
5   b    1  3
6   c    1  1
7   c    1  2
8   c    1  3
9   d    1  1
10  d    1  2
11  d    1  3

Then, if needed, you can always do:

del result['key']
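For readers on a newer pandas (1.2 or later, which postdates this answer), merge also accepts how='cross', which avoids the temporary key column altogether. The sketch below is not part of the original answer. It assumes such a pandas version, uses made-up arrays in place of a and b, and names the columns col1 and col2 to match the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the unique values from the question
a = np.array(['a', 'b', 'c', 'd'])
b = np.array([1, 2, 3])

df_a = pd.DataFrame({'col1': a})
df_b = pd.DataFrame({'col2': b})

# how='cross' requires pandas >= 1.2; it pairs every row of df_a with every row of df_b
result = pd.merge(df_a, df_b, how='cross')
print(result)
```

On older pandas versions the dummy-key merge shown above remains the way to get the same result.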