Home [Project] Travelling salesman problem
Post
Cancel

[Project] Travelling salesman problem

Description

This is a final report for analysis of solving the travelling salesman problem using multiple algorithms and the analysis of those algorithms.

Algorithms that Solve TSP

Brute Force Search (Naive Algorithm)

Run-time Analysis

\[O(n!)\]

Pseudo-code

Description

This algorithm checks every vertices and every edges. Therefore, it is clear it will find the solution. The running time is $O(n!)$ because the starting vertex has $n-1$ edges to choose, next one has $n-2$ edges to choose, $… n-1$ vertex has $1$ edge to choose. Therefore, by multiply every edges, Icould get the running time which is $(n-1)!$. Since it is clear $(n-1)! < n!$, I can say the running time is $O(n!)$.I choose this algorithm because it is the basic way to think about how to solve this problem. This algorithm is easy to make, but it is really slow. Therefore, this algorithm makes us to realize how important the algorithm is. Better algorithm makes way more faster program.

Held-Karp Algorithm (Dynamic Algorithm)

Run-time Analysis

\[O(n^{2} \times 2^{n})\]

Pseudo-code

Description

This algorithm use dynamic algorithm to solve the problem. It first gets the distance between two vertices and save it to array. It is $O(n)$ steps. The next part is to find the minimum distance with small set of vertices. The for loop, for $s$ from $2$ to $n-1$ and for all $S$ in ${2, …, n}$, $|S| = s$, is defining small set of vertices. The next for loop is the part that getting the minimum distance from the set of vertices. Each set has s elements and it has to compare with s different minimum candidates, it has $O(s^{2})$ running time. The first two loop, which define the set S, has running time with $SUM_{i}=2^{n-1} (C(n-1,i))$ because there are $C(n-1, i)$ number of combination for defining set with $i$ elements, and since it is from $2$ to $n-1$, it makes $SUM_{i}=2^{n-1} (C(n-1,i)) \times n^{2})$. Hence, it could be rewrite as $n^{2} \times SUM_{i}=2^{n-1} (C(n-1,i))$ $<= n^{2} \times 2^{n-1} = \frac{1}{2} \times n^{2} \times 2^{n}$ $= O(2^{n} \times n^{2}$). I choose this algorithm because I learned dynamic programming and this algorithm use that concepts. This algorithm saves data into $C(S, n)$ and use it to find minimum distance. By this skill, this algorithm can find real answer with way more faster than the brute force method. However, it is still slow.

K-Nearest Neighbor Algorithm (Approximation Algorithm)

Run-time Analysis

\[O(n^{2})\]

Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
def Distance(a,b):
    return int(round(math.sqrt((math.pow(a['i'] - b['i'],2))+(math.pow(a['j'] - b['j'],2)))))

def KNN(cities, inFile):
    matrix = [[-1 for x in range(len(cities))] for y in range(len(cities))]
    minLength = sys.maxsize
    order = []
    for x in range(len(cities)):
        allCities = [z for z in cities]
        route = []
        route.append(allCities[x]['city'])
        allCities.remove(allCities[x])
        length = 0
        while len(allCities) > 0:
            current = cities[route[len(route)-1]]
            minDistance = sys.maxsize
            minCity = -1
            for y in range(len(allCities)):
                currentDistance = matrix[current['city']][allCities[y]['city']]
                if currentDistance == -1:
                    currentDistance = Distance(current, allCities[y])
                    matrix[current['city']][allCities[y]['city']] = currentDistance
                    matrix[allCities[y]['city']][current['city']] = currentDistance
                if currentDistance < minDistance:
                    minDistance = currentDistance
                    minCity = allCities[y]
            route.append(minCity['city'])
            allCities.remove(minCity)
            length += minDistance
        currentDistance = matrix[cities[route[0]]['city']][cities[route[len(route)-1]]['city']]
        if currentDistance == -1:
            currentDistance = Distance(cities[route[0]], cities[route[len(route)-1]])
            matrix[cities[route[0]]['city']][cities[route[len(route)-1]]['city']] = currentDistance
            matrix[cities[route[len(route)-1]]['city']][cities[route[0]]['city']] = currentDistance
        length += currentDistance
        if length < minLength:
            minLength = length
            order = [x for x in route]
            WriteFileKNN(inFile, order, minLength)

Description

This algorithm is greedy algorithm that finds the vertex with minimum weight and choose it. Therefore, the first vertex has to check $n-1$ edges and choose the minimum. Second one has to check n-2 edges and choose the minimum. The $n-1$ vertex choose $1$ edges. Hence, the running time is $SUM\ 1\ TO\ N-1\ = (N-1)(N-2)/2 = O(N^2)$. I choose this algorithm because I found that this is the fastest. This algorithm is greedy algorithm that choose only the nearest vertex from the start vertex, and keep finding the nearest vertex from the chosen one. Since it is just checking the nearest, there is possibility to find the wrong answer. However, I found that this algorithm finds reasonable path, which means not huge different than true answer, and the power of this algorithm is this is super fast. It is only $O(n^2)$. Therefore, this algorithm will be good choice when I need the result really quickly, and it is ok to have small error. I used a nearest neighbor algorithm written in Java to get inspiration for my Python program, found here: http://www.sanfoundry.com/java-program-implement-traveling-salesman-problem-using-nearest-neighbour-algorithm/. The program searches for the most optimal nearest neighbor route by using a brute-force wrapper for-loop to select each city as the starting point. The program is able to analyze each city for the smaller list sizes, but for the larger ones it cannot process each city within 5 minutes, so that is why I have a section of code in the main section of the program to end the algorithm after 5 minutes. Each time the algorithm finds a new shorter route it is written to the output file, so this ensures that I am still able to have a valid result at the very beginning, well before the 5 minute time constraint is over.

Genetic Algorithm (Approximation Algorithm)

Run-time Analysis

\[O(T \times n_{0} \times n^{2})\]

where $T$ is the number of outer iteration, $n_{0}$ is the initial size of the population, and $n$ is the number of cities.

Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
class City:
    def __init__(self, id, x, y):
        self.id = id
        self.x = x
        self.y = y
    
    def distance(self, city):
        # xDis = self.x - city.x
        # yDis = self.y - city.y
        # distance = np.sqrt((xDis ** 2) + (yDis ** 2))
        return int(round(math.sqrt((math.pow(self.x - city.x,2))+(math.pow(self.y - city.y,2)))))
    
    def __repr__(self):
        return "(" + str(self.x) + "," + str(self.y) + ")"

class Fitness:
    def __init__(self, route):
        self.route = route
        self.distance = 0
        self.fitness= 0.0
    
    def routeDistance(self):
        if self.distance ==0:
            pathDistance = 0
            for i in range(0, len(self.route)):
                fromCity = self.route[i]
                toCity = None
                if i + 1 < len(self.route):
                    toCity = self.route[i + 1]
                else:
                    toCity = self.route[0]
                pathDistance += fromCity.distance(toCity)
            self.distance = pathDistance
        return self.distance
    
    def routeFitness(self):
        if self.fitness == 0:
            self.fitness = 1 / float(self.routeDistance())
        return self.fitness
def createRoute(cityList):
    route = random.sample(cityList, len(cityList))
    return route
def initialPopulation(popSize, cityList):
    population = []

    for i in range(0, popSize):
        population.append(createRoute(cityList))
    return population
def rankRoutes(population):
    fitnessResults = {}
    for i in range(0,len(population)):
        fitnessResults[i] = Fitness(population[i]).routeFitness()
    return sorted(fitnessResults.items(), key = operator.itemgetter(1), reverse = True)
def selection(popRanked, eliteSize):
    selectionResults = []
    df = pd.DataFrame(np.array(popRanked), columns=["Index","Fitness"])
    df['cum_sum'] = df.Fitness.cumsum()
    df['cum_perc'] = 100*df.cum_sum/df.Fitness.sum()
    
    for i in range(0, eliteSize):
        selectionResults.append(popRanked[i][0])
    for i in range(0, len(popRanked) - eliteSize):
        pick = 100*random.random()
        for i in range(0, len(popRanked)):
            if pick <= df.iat[i,3]:
                selectionResults.append(popRanked[i][0])
                break
    return selectionResults
def matingPool(population, selectionResults):
    matingpool = []
    for i in range(0, len(selectionResults)):
        index = selectionResults[i]
        matingpool.append(population[index])
    return matingpool
def breed(parent1, parent2):
    child = []
    childP1 = []
    childP2 = []
    
    geneA = int(random.random() * len(parent1))
    geneB = int(random.random() * len(parent1))
    
    startGene = min(geneA, geneB)
    endGene = max(geneA, geneB)

    for i in range(startGene, endGene):
        childP1.append(parent1[i])
        
    childP2 = [item for item in parent2 if item not in childP1]

    child = childP1 + childP2
    return child
def breedPopulation(matingpool, eliteSize):
    children = []
    length = len(matingpool) - eliteSize
    pool = random.sample(matingpool, len(matingpool))

    for i in range(0,eliteSize):
        children.append(matingpool[i])
    
    for i in range(0, length):
        child = breed(pool[i], pool[len(matingpool)-i-1])
        children.append(child)
    return children
def mutate(individual, mutationRate):
    for swapped in range(len(individual)):
        if(random.random() < mutationRate):
            swapWith = int(random.random() * len(individual))
            
            city1 = individual[swapped]
            city2 = individual[swapWith]
            
            individual[swapped] = city2
            individual[swapWith] = city1
    return individual
def mutatePopulation(population, mutationRate):
    mutatedPop = []
    
    for ind in range(0, len(population)):
        mutatedInd = mutate(population[ind], mutationRate)
        mutatedPop.append(mutatedInd)
    return mutatedPop
def nextGeneration(currentGen, eliteSize, mutationRate):
    popRanked = rankRoutes(currentGen)
    selectionResults = selection(popRanked, eliteSize)
    matingpool = matingPool(currentGen, selectionResults)
    children = breedPopulation(matingpool, eliteSize)
    nextGeneration = mutatePopulation(children, mutationRate)
    return nextGeneration

def GA(inFile, population, popSize, eliteSize, mutationRate, generations):
    pop = initialPopulation(popSize, population)
    # print("Initial distance: " + str(1 / rankRoutes(pop)[0][1]))
    
    for i in range(0, generations):
        pop = nextGeneration(pop, eliteSize, mutationRate)
    
    # print("Final distance: " + str(1 / rankRoutes(pop)[0][1]))
    bestRouteIndex = rankRoutes(pop)[0][0]
    bestRoute = pop[bestRouteIndex]
    WriteFileGA(inFile, bestRoute, 1 / rankRoutes(pop)[0][1])

Description

The genetic algorithm seems to solve different kind of problems pretty well, such as error detection, so I have thought to give it a try to solve the TSP. The implementation of the algorithm is extremely hard compared to KNN, so I need to refer to some online resource to build the algorithm on Python, the resource is found: https://towardsdatascience.com/evolution-of-a-salesman-a-complete-genetic-algorithm-tutorial-for-python-6fe5d2b3ca35. The algorithm starts with a population, a set of solutions, and every population another new population is generated to give better solutions, so as the depth of the population increases the algorithm should give better solutions in theory.

Testing

In order to test that my program was actually producing valid results we used the supplied tsp-verifier.py program on each of my output routes. Each result was found to be valid.

It seems that the K-Nearest Neighbor (KNN) algorithm give better results in matter of approximation of path took and speed compared to Genetic Algorithm (GA) in all cases when time is the limitation. However, as the complexity of the graph increases and there isn’t no time limits, the Genetic Algorithm can finish faster with higher approximation to the optimal path when the threshold to stop is set, compared to the KNN which takes almost 2 times longer on average to reach the same optimal path results.

My best results when testing where Test 1, 2, 4, and 5 when using KNN, and GA failed in most cases besides Test 6 and 7, which gave the same results as KNN at the same time.

Please view the next page to look into all of the results.

 Optimal path lengthPredicted path lengthRatio
KNN1081591309211.21
GA1081593375413.12
 Optimal path lengthPredicted path lengthRatio
KNN257929751.15
GA25792763910.71
 Optimal path lengthPredicted path lengthRatio
KNN157308419369411.23
GA157308419342001.22
 Predicted path lengthTime took
KNN59110.014127969741821289
GA1048814.512681007385254}
 Predicted path lengthTime took
KNN80110.11811113357543945
GA3283718.30189299583435
 Predicted path lengthTime took
KNN148261.3867180347442627
GA10338734.44625997543335
 Predicted path lengthTime took
KNN1971112.897594213485718
GA22072773.32604813575745
 Predicted path lengthTime took
KNN27128123.83275985717773
GA465770200.42861986160278
 Predicted path lengthTime took
KNN39834299.0716700553894
GA39834299.00313663482666
 Predicted path lengthTime took
KNN62110299.6045079231262
GA62110299.00823521614075
This post is licensed under CC BY 4.0 by the author.

Linux vs. FreeBSD vs. Windows

How I Learn

Comments powered by Disqus.