I have two DataFrames, from which I extract the unique values of one column each into a and b:
a = df1.col1.unique()
b = df2.col2.unique()
Now a and b look like this:
['a','b','c','d'] #a
[1,2,3] #b
Both are now of type numpy.ndarray.
I want to combine them to get a DataFrame like this:
col1 col2
0 a 1
1 a 2
2 a 3
3 b 1
4 b 2
5 b 3
6 c 1
. . .
Is there a way to do this without using a loop?
Answer 0 (score: 1)
Use the numpy tools:
pd.DataFrame({'col1':np.repeat(a,b.size),'col2':np.tile(b,a.size)})
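To see why this works, here is a minimal sketch using the sample values from the question. The arrays are typed in by hand here rather than taken from df1 and df2, which is an assumption made only for illustration. np.repeat repeats each element of a consecutively, while np.tile repeats the whole of b end to end, so the two columns line up as every (a, b) pair.

```python
import numpy as np
import pandas as pd

# Sample values from the question, entered by hand for illustration.
a = np.array(['a', 'b', 'c', 'd'])
b = np.array([1, 2, 3])

# repeat: a, a, a, b, b, b, ...  (each element b.size times)
col1 = np.repeat(a, b.size)
# tile: 1, 2, 3, 1, 2, 3, ...    (the whole array a.size times)
col2 = np.tile(b, a.size)

result = pd.DataFrame({'col1': col1, 'col2': col2})
print(result.head(4))
```

Both columns end up with a.size * b.size entries, so the DataFrame constructor receives arrays of equal length.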
Answer 1 (score: 0)
UPDATE:
B. M.'s solution using numpy is much faster; I recommend using his approach.
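Answer 2 (score: 0)
This task cannot be done without at least one loop running somewhere. The best you can do is hide the for loop, or use a lazy iterator (the kind of object a yield-based generator gives you) so that it stays memory-efficient.
itertools provides efficient functions for this task that return such iterators: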
import pandas
from itertools import product
products = product(['a','b','c','d'], [1,2,3])
col1_items, col2_items = zip(*products)
result = pandas.DataFrame({'col1':col1_items, 'col2': col2_items})
itertools.product creates the Cartesian product of the two iterables. zip(*products) then simply unpacks the resulting sequence of pairs into two separate tuples, one per column.
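As a small illustration of the two steps, here is a sketch on deliberately tiny inputs (two values per side instead of the question's data), so the intermediate values are easy to follow:

```python
from itertools import product

# Every pairing of the first iterable with the second, left-major order.
pairs = list(product(['a', 'b'], [1, 2]))
# pairs is [('a', 1), ('a', 2), ('b', 1), ('b', 2)]

# zip(*pairs) transposes the list of pairs into one tuple per position.
col1_items, col2_items = zip(*pairs)
# col1_items is ('a', 'a', 'b', 'b'); col2_items is (1, 2, 1, 2)
print(col1_items, col2_items)
```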
Answer 3 (score: 0)
You can do this with pandas merge on a constant helper column, which will be faster than itertools or a loop:
df_a = pd.DataFrame({'a': a, 'key': 1})
df_b = pd.DataFrame({'b': b, 'key': 1})
result = pd.merge(df_a, df_b, how='outer')
Result:
a key b
0 a 1 1
1 a 1 2
2 a 1 3
3 b 1 1
4 b 1 2
5 b 1 3
6 c 1 1
7 c 1 2
8 c 1 3
9 d 1 1
10 d 1 2
11 d 1 3
Then, if needed, you can always drop the helper column:
del result['key']
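In pandas 1.2 and later, merge also accepts how='cross', which produces the Cartesian product directly and avoids the helper key column altogether. A minimal sketch, assuming a recent enough pandas version and the sample values from the question entered by hand:

```python
import numpy as np
import pandas as pd

# Sample values from the question, entered by hand for illustration.
a = np.array(['a', 'b', 'c', 'd'])
b = np.array([1, 2, 3])

df_a = pd.DataFrame({'col1': a})
df_b = pd.DataFrame({'col2': b})

# how='cross' pairs every row of df_a with every row of df_b (pandas >= 1.2).
result = df_a.merge(df_b, how='cross')
print(result.head(4))
```

No key column is created, so nothing needs to be deleted afterwards.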