[Project] Travelling salesman problem
Post
Cancel

# [Project] Travelling salesman problem

## Description

This is a final report for analysis of solving the travelling salesman problem using multiple algorithms and the analysis of those algorithms.

## Algorithms that Solve TSP

### Brute Force Search (Naive Algorithm)

#### Run-time Analysis

$O(n!)$

#### Description

This algorithm checks every vertices and every edges. Therefore, it is clear it will find the solution. The running time is $O(n!)$ because the starting vertex has $n-1$ edges to choose, next one has $n-2$ edges to choose, $… n-1$ vertex has $1$ edge to choose. Therefore, by multiply every edges, Icould get the running time which is $(n-1)!$. Since it is clear $(n-1)! < n!$, I can say the running time is $O(n!)$.I choose this algorithm because it is the basic way to think about how to solve this problem. This algorithm is easy to make, but it is really slow. Therefore, this algorithm makes us to realize how important the algorithm is. Better algorithm makes way more faster program.

### Held-Karp Algorithm (Dynamic Algorithm)

#### Run-time Analysis

$O(n^{2} \times 2^{n})$

#### Description

This algorithm use dynamic algorithm to solve the problem. It first gets the distance between two vertices and save it to array. It is $O(n)$ steps. The next part is to find the minimum distance with small set of vertices. The for loop, for $s$ from $2$ to $n-1$ and for all $S$ in ${2, …, n}$, $|S| = s$, is defining small set of vertices. The next for loop is the part that getting the minimum distance from the set of vertices. Each set has s elements and it has to compare with s different minimum candidates, it has $O(s^{2})$ running time. The first two loop, which define the set S, has running time with $SUM_{i}=2^{n-1} (C(n-1,i))$ because there are $C(n-1, i)$ number of combination for defining set with $i$ elements, and since it is from $2$ to $n-1$, it makes $SUM_{i}=2^{n-1} (C(n-1,i)) \times n^{2})$. Hence, it could be rewrite as $n^{2} \times SUM_{i}=2^{n-1} (C(n-1,i))$ $<= n^{2} \times 2^{n-1} = \frac{1}{2} \times n^{2} \times 2^{n}$ $= O(2^{n} \times n^{2}$). I choose this algorithm because I learned dynamic programming and this algorithm use that concepts. This algorithm saves data into $C(S, n)$ and use it to find minimum distance. By this skill, this algorithm can find real answer with way more faster than the brute force method. However, it is still slow.

### K-Nearest Neighbor Algorithm (Approximation Algorithm)

#### Run-time Analysis

$O(n^{2})$

#### Code

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 def Distance(a,b): return int(round(math.sqrt((math.pow(a['i'] - b['i'],2))+(math.pow(a['j'] - b['j'],2))))) def KNN(cities, inFile): matrix = [[-1 for x in range(len(cities))] for y in range(len(cities))] minLength = sys.maxsize order = [] for x in range(len(cities)): allCities = [z for z in cities] route = [] route.append(allCities[x]['city']) allCities.remove(allCities[x]) length = 0 while len(allCities) > 0: current = cities[route[len(route)-1]] minDistance = sys.maxsize minCity = -1 for y in range(len(allCities)): currentDistance = matrix[current['city']][allCities[y]['city']] if currentDistance == -1: currentDistance = Distance(current, allCities[y]) matrix[current['city']][allCities[y]['city']] = currentDistance matrix[allCities[y]['city']][current['city']] = currentDistance if currentDistance < minDistance: minDistance = currentDistance minCity = allCities[y] route.append(minCity['city']) allCities.remove(minCity) length += minDistance currentDistance = matrix[cities[route[0]]['city']][cities[route[len(route)-1]]['city']] if currentDistance == -1: currentDistance = Distance(cities[route[0]], cities[route[len(route)-1]]) matrix[cities[route[0]]['city']][cities[route[len(route)-1]]['city']] = currentDistance matrix[cities[route[len(route)-1]]['city']][cities[route[0]]['city']] = currentDistance length += currentDistance if length < minLength: minLength = length order = [x for x in route] WriteFileKNN(inFile, order, minLength) 

#### Description

This algorithm is greedy algorithm that finds the vertex with minimum weight and choose it. Therefore, the first vertex has to check $n-1$ edges and choose the minimum. Second one has to check n-2 edges and choose the minimum. The $n-1$ vertex choose $1$ edges. Hence, the running time is $SUM\ 1\ TO\ N-1\ = (N-1)(N-2)/2 = O(N^2)$. I choose this algorithm because I found that this is the fastest. This algorithm is greedy algorithm that choose only the nearest vertex from the start vertex, and keep finding the nearest vertex from the chosen one. Since it is just checking the nearest, there is possibility to find the wrong answer. However, I found that this algorithm finds reasonable path, which means not huge different than true answer, and the power of this algorithm is this is super fast. It is only $O(n^2)$. Therefore, this algorithm will be good choice when I need the result really quickly, and it is ok to have small error. I used a nearest neighbor algorithm written in Java to get inspiration for my Python program, found here: http://www.sanfoundry.com/java-program-implement-traveling-salesman-problem-using-nearest-neighbour-algorithm/. The program searches for the most optimal nearest neighbor route by using a brute-force wrapper for-loop to select each city as the starting point. The program is able to analyze each city for the smaller list sizes, but for the larger ones it cannot process each city within 5 minutes, so that is why I have a section of code in the main section of the program to end the algorithm after 5 minutes. Each time the algorithm finds a new shorter route it is written to the output file, so this ensures that I am still able to have a valid result at the very beginning, well before the 5 minute time constraint is over.

### Genetic Algorithm (Approximation Algorithm)

#### Run-time Analysis

$O(T \times n_{0} \times n^{2})$

where $T$ is the number of outer iteration, $n_{0}$ is the initial size of the population, and $n$ is the number of cities.

#### Code

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 class City: def __init__(self, id, x, y): self.id = id self.x = x self.y = y def distance(self, city): # xDis = self.x - city.x # yDis = self.y - city.y # distance = np.sqrt((xDis ** 2) + (yDis ** 2)) return int(round(math.sqrt((math.pow(self.x - city.x,2))+(math.pow(self.y - city.y,2))))) def __repr__(self): return "(" + str(self.x) + "," + str(self.y) + ")" class Fitness: def __init__(self, route): self.route = route self.distance = 0 self.fitness= 0.0 def routeDistance(self): if self.distance ==0: pathDistance = 0 for i in range(0, len(self.route)): fromCity = self.route[i] toCity = None if i + 1 < len(self.route): toCity = self.route[i + 1] else: toCity = self.route[0] pathDistance += fromCity.distance(toCity) self.distance = pathDistance return self.distance def routeFitness(self): if self.fitness == 0: self.fitness = 1 / float(self.routeDistance()) return self.fitness def createRoute(cityList): route = random.sample(cityList, len(cityList)) return route def initialPopulation(popSize, cityList): population = [] for i in range(0, popSize): population.append(createRoute(cityList)) return population def rankRoutes(population): fitnessResults = {} for i in range(0,len(population)): fitnessResults[i] = Fitness(population[i]).routeFitness() return sorted(fitnessResults.items(), key = operator.itemgetter(1), reverse = True) def selection(popRanked, eliteSize): selectionResults = [] df = pd.DataFrame(np.array(popRanked), columns=["Index","Fitness"]) df['cum_sum'] = df.Fitness.cumsum() df['cum_perc'] = 100*df.cum_sum/df.Fitness.sum() for i in range(0, eliteSize): selectionResults.append(popRanked[i][0]) for i in range(0, len(popRanked) - eliteSize): pick = 100*random.random() for i in range(0, len(popRanked)): if pick <= df.iat[i,3]: selectionResults.append(popRanked[i][0]) break return selectionResults def matingPool(population, selectionResults): matingpool = [] for i in range(0, len(selectionResults)): index = selectionResults[i] matingpool.append(population[index]) return matingpool def breed(parent1, parent2): child = [] childP1 = [] childP2 = [] geneA = int(random.random() * len(parent1)) geneB = int(random.random() * len(parent1)) startGene = min(geneA, geneB) endGene = max(geneA, geneB) for i in range(startGene, endGene): childP1.append(parent1[i]) childP2 = [item for item in parent2 if item not in childP1] child = childP1 + childP2 return child def breedPopulation(matingpool, eliteSize): children = [] length = len(matingpool) - eliteSize pool = random.sample(matingpool, len(matingpool)) for i in range(0,eliteSize): children.append(matingpool[i]) for i in range(0, length): child = breed(pool[i], pool[len(matingpool)-i-1]) children.append(child) return children def mutate(individual, mutationRate): for swapped in range(len(individual)): if(random.random() < mutationRate): swapWith = int(random.random() * len(individual)) city1 = individual[swapped] city2 = individual[swapWith] individual[swapped] = city2 individual[swapWith] = city1 return individual def mutatePopulation(population, mutationRate): mutatedPop = [] for ind in range(0, len(population)): mutatedInd = mutate(population[ind], mutationRate) mutatedPop.append(mutatedInd) return mutatedPop def nextGeneration(currentGen, eliteSize, mutationRate): popRanked = rankRoutes(currentGen) selectionResults = selection(popRanked, eliteSize) matingpool = matingPool(currentGen, selectionResults) children = breedPopulation(matingpool, eliteSize) nextGeneration = mutatePopulation(children, mutationRate) return nextGeneration def GA(inFile, population, popSize, eliteSize, mutationRate, generations): pop = initialPopulation(popSize, population) # print("Initial distance: " + str(1 / rankRoutes(pop)[0][1])) for i in range(0, generations): pop = nextGeneration(pop, eliteSize, mutationRate) # print("Final distance: " + str(1 / rankRoutes(pop)[0][1])) bestRouteIndex = rankRoutes(pop)[0][0] bestRoute = pop[bestRouteIndex] WriteFileGA(inFile, bestRoute, 1 / rankRoutes(pop)[0][1]) 

#### Description

The genetic algorithm seems to solve different kind of problems pretty well, such as error detection, so I have thought to give it a try to solve the TSP. The implementation of the algorithm is extremely hard compared to KNN, so I need to refer to some online resource to build the algorithm on Python, the resource is found: https://towardsdatascience.com/evolution-of-a-salesman-a-complete-genetic-algorithm-tutorial-for-python-6fe5d2b3ca35. The algorithm starts with a population, a set of solutions, and every population another new population is generated to give better solutions, so as the depth of the population increases the algorithm should give better solutions in theory.

## Testing

In order to test that my program was actually producing valid results we used the supplied tsp-verifier.py program on each of my output routes. Each result was found to be valid.

It seems that the K-Nearest Neighbor (KNN) algorithm give better results in matter of approximation of path took and speed compared to Genetic Algorithm (GA) in all cases when time is the limitation. However, as the complexity of the graph increases and there isn’t no time limits, the Genetic Algorithm can finish faster with higher approximation to the optimal path when the threshold to stop is set, compared to the KNN which takes almost 2 times longer on average to reach the same optimal path results.

My best results when testing where Test 1, 2, 4, and 5 when using KNN, and GA failed in most cases besides Test 6 and 7, which gave the same results as KNN at the same time.

Please view the next page to look into all of the results.

Optimal path lengthPredicted path lengthRatio
KNN1081591309211.21
GA1081593375413.12
Optimal path lengthPredicted path lengthRatio
KNN257929751.15
GA25792763910.71
Optimal path lengthPredicted path lengthRatio
KNN157308419369411.23
GA157308419342001.22
Predicted path lengthTime took
KNN59110.014127969741821289
GA1048814.512681007385254}
Predicted path lengthTime took
KNN80110.11811113357543945
GA3283718.30189299583435
Predicted path lengthTime took
KNN148261.3867180347442627
GA10338734.44625997543335
Predicted path lengthTime took
KNN1971112.897594213485718
GA22072773.32604813575745
Predicted path lengthTime took
KNN27128123.83275985717773
GA465770200.42861986160278
Predicted path lengthTime took
KNN39834299.0716700553894
GA39834299.00313663482666
Predicted path lengthTime took
KNN62110299.6045079231262
GA62110299.00823521614075