Searching and Big O Notation#

In the last chapter, we discussed that several versions of code might exist to achieve the same task, yet only some of them may be efficient. One way to measure the efficiency of code is to count the number of steps required to achieve the desired result. We also explored how, within a given data structure, efficiency varies across operations such as reading, searching, insertion, and deletion. In this chapter, we dive a little deeper and discuss what other factors can affect the performance of code.

Organization of data#

Let’s extend our discussion of the array data structure. Assume an array holds integers chosen at random, with no repeated elements. As a real-world example, consider an array filled with the last four digits of Social Security numbers (SSNs). For simplicity, assume that an array is the best choice for storing these numbers. The figure below shows the array with its elements stored in no particular order. When a user searches for a particular SSN, the processor starts at index 0 and continues until the desired element is found. If the target is the last element (the worst-case scenario), the search traverses the entire array. Hence, the number of steps to search for an element in the worst case is N, where N is the length of the array. In contrast, inserting data into an unsorted array requires just one step: the new value is appended at the end of the array and no elements need to be shifted, see Fig. 3.

_images/unsorted_array.png

Fig. 3 Insertion on an unsorted array#
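
To make the step counting concrete, here is a minimal sketch that counts the comparisons a linear scan makes over a small unsorted array; the SSN digits below are made up for illustration.

""" Code written in Python """
# An unsorted array of last-four SSN digits (hypothetical values).
ssn_digits = [5892, 1034, 7741, 2215, 9360]

def count_search_steps(values, target):
    """Return the number of comparisons a linear scan makes."""
    steps = 0
    for value in values:
        steps += 1
        if value == target:
            break
    return steps

# Worst case: the target is the last element, so all N elements are checked.
print(count_search_steps(ssn_digits, 9360))  # 5 steps for N = 5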

Let’s extend the above example, but this time with the elements of the array sorted. Now, for example, if the value 5245 is searched for and is not present, the search can stop at the first value greater than 5245; in this example, the search stops at index 4. This kind of search is called a linear search.

Linear search iterates over the elements of the array one by one, looking for the search value. On a sorted array, the search can stop as soon as the current element is greater than the search value, since the search value cannot appear any later in the array.
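
The following is a small sketch of this early-exit variant on a sorted array; the values and the search target are hypothetical.

""" Code written in Python """
# Early-exit linear search: on a sorted array, the scan can stop as soon
# as it passes a value greater than the target.
sorted_digits = [1034, 2215, 5892, 7741, 9360]

def linear_search_sorted(values, target):
    for index, value in enumerate(values):
        if value == target:
            return index    # found the target
        if value > target:
            break           # the target cannot appear later in a sorted array
    return -1               # not found

print(linear_search_sorted(sorted_digits, 5245))  # -1, stops at index 2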

The following is the pseudocode of the basic linear search algorithm; a Python translation appears after the list.

Given a list \(L\) of \(n\) elements with values or records \(L_0, \ldots, L_{n-1}\), and target value \(T\), the following subroutine uses linear search to find the index of the target \(T\) in \(L\).

1. Set \(i\) to 0.
2. If \(L_i = T\), the search terminates successfully; return \(i\).
3. Increase \(i\) by 1.
4. If \(i < n\), go to step 2. Otherwise, the search terminates unsuccessfully.
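
Here is a direct Python translation of the subroutine above; returning -1 on an unsuccessful search is a common convention, not part of the pseudocode itself.

""" Code written in Python """
def linear_search(L, T):
    i = 0                   # step 1: set i to 0
    while i < len(L):       # step 4: continue while i < n
        if L[i] == T:       # step 2: successful termination
            return i
        i += 1              # step 3: increase i by 1
    return -1               # unsuccessful termination

print(linear_search([1034, 2215, 5892, 7741, 9360], 7741))  # 3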

Does linear search perform better on sorted arrays than on unsorted ones?

Unfortunately, the answer is no; in the worst-case scenario, linear search takes N steps, where N is the number of elements in the array, regardless of whether the array is sorted.

The worst-case scenario is the situation or input that forces an algorithm or data structure to take the most time or resources.

Furthermore, keeping the array sorted before applying any search operation requires some processing of its own. For example, inserting an integer value into a sorted array requires comparing and shifting elements so that the new value lands in the correct place. The new number is compared with the elements of the array until the correct location is found, see Fig. 4.

_images/sorted_array_insertion.png

Fig. 4 Insertion on a sorted array#
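
The following sketch shows one way this comparison-and-shifting insertion can be implemented, using a Python list to stand in for the array; the values are hypothetical.

""" Code written in Python """
# Insert into a sorted array: compare to find the position, then shift
# larger elements one slot to the right.
def insert_sorted(values, new_value):
    values.append(new_value)           # grow the array by one slot
    i = len(values) - 1
    while i > 0 and values[i - 1] > new_value:
        values[i] = values[i - 1]      # shift the larger element right
        i -= 1
    values[i] = new_value              # place the new value in its slot
    return values

print(insert_sorted([1034, 2215, 5892, 9360], 5245))
# [1034, 2215, 5245, 5892, 9360]

Note that the loop may shift up to N elements, so insertion into a sorted array can take up to N steps, in contrast with the single step needed for an unsorted array.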

Based on the above discussion, sorting an array does not improve the performance of linear search. Moreover, there is the overhead of keeping the array sorted, so why do we even bother sorting it?

Indeed, the linear search algorithm is inefficient regardless of whether the array is sorted. However, sorted arrays unlock other search algorithms, such as binary search, which is blazing fast compared to linear search; we will see an example later in this chapter.

Big O Notation#

As discussed previously, an algorithm’s efficiency is primarily measured by the number of steps it takes. That number depends on the scenario (worst case or best case), on the number of elements in the array, and so on.

To ease communication regarding time complexity, computer scientists have borrowed a concept from mathematics that provides a concise and consistent language for describing the efficiency of data structures and algorithms, known as Big O notation.

In computer science, big O notation is used to classify algorithms according to how their run time or space requirements grow as the input size grows.

In other words, Big O notation is used to compare the efficiency of algorithms that achieve the same task. It’s the letter O, not the number 0. So what does the O stand for? ‘Order’, or ‘order of complexity’. For example, as discussed before, linear search takes N steps; in the language of Big O notation, the linear search algorithm is therefore expressed as O(N). The efficiency of an algorithm depends on the scenario: worst case, average case, or best case. For example, with linear search, if the search element is the first element in the array, the complexity is O(1). Conversely, the complexity is O(N) if the search element is the last element in the array. Generally speaking, an algorithm’s efficiency is gauged under the worst-case scenario.

This is a conservative approach, but it pays off: knowing how inefficient an algorithm can get in the worst case prepares us for the worst and can strongly influence our choices.

An O(1) algorithm’s complexity is independent of the number of elements in the array. For example, the complexity of inserting an element at the end of an unsorted array is O(1) because it takes only one step, with no shifting of elements.

Big O tells you how the number of steps grows as the amount of data grows.

Fig. 6 compares the time complexities \(O(n)\), \(O(\log_2 n)\), and \(O(1)\). The comparison shows that an algorithm with a time complexity of \(O(\log_2 n)\) performs better than \(O(n)\) as \(n\) grows, and that \(O(1)\) always performs best regardless of \(n\).

_images/bigocomparisons.svg

Fig. 6 Big O comparisons between O(n), O(log n), and O(1)#

\(O(1)\) example#

""" Code written in Python """
data = ["apples","oranges","grapes","pineapple"]
data.append("kiwi")
print(data[0])

In the above example, a new element is appended at the end of the list; the insertion does not require any shifting of elements. Furthermore, the element at index 0 is printed. Both operations take a constant number of steps, no matter how long the list is.

\(O(n)\) example#

""" Code written in Python """
data = ["apples","oranges","grapes","pineapple"]
for i in data:print(i)

In the above example, a list is defined, and the for loop iterates over it, printing an element on each iteration. The loop therefore runs as many times as the length of the list. This is an example of \(O(n)\) complexity.

\(O(\log_2 n)\) example#
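
Binary search is the classic example of \(O(\log_2 n)\) complexity: each comparison halves the portion of the sorted list that can still contain the target. The following is a minimal sketch; the values are hypothetical.

""" Code written in Python """
# Binary search on a sorted list: each comparison halves the search range.
def binary_search(values, target):
    low, high = 0, len(values) - 1
    while low <= high:
        mid = (low + high) // 2
        if values[mid] == target:
            return mid          # found the target
        if values[mid] < target:
            low = mid + 1       # discard the lower half
        else:
            high = mid - 1      # discard the upper half
    return -1                   # not found

data = [1034, 2215, 5245, 5892, 7741, 9360]
print(binary_search(data, 5892))  # 3

A sorted list of one million elements needs at most about twenty comparisons (\(\log_2 1{,}000{,}000 \approx 20\)), whereas a linear search could need a million.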

Conclusion#

No single algorithm is perfect for every situation. For example, sorted arrays improve searching for an element via binary search, but at the cost of extra work when inserting elements into the sorted array. Conversely, if adding data is the main task, unsorted arrays may be a better choice due to their faster insertions. Big O notation describes how long an algorithm takes to do its job, typically under the worst-case scenario.

