Getting Started with NumPy in Python

A guide for beginners to get started with NumPy.

Python lists are one of the most used data structures, but they have limitations that make them a poor fit for math operations. For example, if we try to add two lists element-wise, the + operator simply concatenates list1 and list2.

list1 = [1,2,3,4,5,6,7,8,9]
list2 = [1,2,3,4,5,6,7,8,9]
print(list1+list2)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Other operations between two lists, such as division or multiplication, raise an error.

list1 = [1,2,3,4,5,6,7,8,9]
list2 = [1,2,3,4,5,6,7,8,9]
print(list1/list2)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8528/2651367894.py in <module>
     1 list1 = [1,2,3,4,5,6,7,8,9]
     2 list2 = [1,2,3,4,5,6,7,8,9]
----> 3 print(list1/list2)

TypeError: unsupported operand type(s) for /: 'list' and 'list'

Element-wise math operations between a list and an integer are not supported either.

list1 = [1,2,3,4,5,6,7,8,9]
print(list1/2)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17396/2557175266.py in <module>
     1 list1 = [1,2,3,4,5,6,7,8,9]
----> 2 print(list1/2)

TypeError: unsupported operand type(s) for /: 'list' and 'int'

Of course, there are ways to do math operations between lists, or between lists and integers, in plain Python, but they take more code. The following example adds list1 and list2 element-wise. It works, but it is verbose for such a simple task.

list1 = [1,2,3,4,5]
list2 = [1,2,3,4,5]
sum_list = []
for (item1, item2) in zip(list1, list2):
    sum_list.append(item1+item2)
print(sum_list)

[2, 4, 6, 8, 10]
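
The same element-wise work can also be written with list comprehensions, which is shorter but still plain Python. This is only a minimal sketch; the names half_list and the sample values are illustrative and not part of the original example.

```python
list1 = [1, 2, 3, 4, 5]
list2 = [1, 2, 3, 4, 5]

# Pair up the elements with zip and add each pair
sum_list = [item1 + item2 for item1, item2 in zip(list1, list2)]

# Dividing every element by a number needs a similar loop
half_list = [item / 2 for item in list1]

print(sum_list)
print(half_list)
```

Every new operation still needs its own loop or comprehension, which is the repetition NumPy removes.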

There is a simpler solution: NumPy. NumPy is a scientific computing library for Python that simplifies problems like this one.

import numpy as np
list1 = np.array([1,2,3,4,5])
list2 = np.array([1,2,3,4,5])
print(type(list1))

The first line imports the NumPy library into our script under the alias np.

np.array accepts a list as an argument and returns a NumPy array object, which we assign to list1. The print statement confirms the type:

<class 'numpy.ndarray'>

Doing math operations between NumPy arrays, or between NumPy arrays and numbers, is very simple.

print(list1+list2)
print(list1/list2)
print(list1*2)

[ 2  4  6  8 10]
[1. 1. 1. 1. 1.]
[ 2  4  6  8 10]

Note that the result of an operation between two NumPy arrays, or between a NumPy array and a number or a list, is a NumPy array.
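
To check that claim, the short sketch below combines an array with a plain list and with a number and inspects the type of each result. The variable names (arr, plain_list, mixed, shifted) are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
plain_list = [10, 20, 30, 40, 50]

# array + list: the list is converted and the result is a NumPy array
mixed = arr + plain_list

# array + number: the number is applied to every element
shifted = arr + 0.5

print(type(mixed), mixed)
print(type(shifted), shifted)
```

Both results are numpy.ndarray objects, so they can be used directly in further array operations.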

The elements of a NumPy array can be integers, floats, bools, strings, or objects, but even one element whose type is not suitable for math operations makes the whole array unusable for math. In the example below, NumPy stores every element as a string, so adding 2 fails.

import numpy as np
list1 = np.array([1,2,"cat"])
print(list1+2)

UFuncTypeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8600/2751950444.py in <module>
     1 import numpy as np
     2 list1 = np.array([1,2,"cat"])
----> 3 print(list1+2)

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U11'), dtype('<U11')) -> dtype('<U11')

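We can get the data type of an array using its dtype attribute.

list1 = np.array([1,2])
print(list1.dtype)

int32

The exact integer type depends on the platform and the NumPy version, so you may see int64 instead.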

Note how the data type of the array changes if a single element is not a number.

import numpy as np
list1 = np.array([1,2,"cat"])
print(list1.dtype)

<U11

You might wonder what the U data type is. U stands for a Unicode string, and the number after it is the maximum string length; the NumPy documentation linked below covers this topic in detail. You can also define the data type of the elements yourself. This is very useful when you deal with a specific kind of data, such as dates, or with very large arrays, where tuning the data type can mean lower memory consumption and faster math operations.

Data type objects (dtype) — NumPy v1.21 Manual
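
As a rough illustration of defining the data type yourself, the sketch below builds the same array twice, once with the default integer type and once with 8-bit integers, and compares their memory use through the nbytes attribute. The sample values and variable names are assumptions made for this example.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Let NumPy pick the default integer type (platform dependent)
default_arr = np.array(values)

# Ask for 8-bit integers explicitly; each element then takes 1 byte
small_arr = np.array(values, dtype=np.int8)

print(default_arr.dtype, default_arr.nbytes)
print(small_arr.dtype, small_arr.nbytes)
```

A smaller type only makes sense when the values fit in it: int8 holds integers from -128 to 127.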

But the important things to remember for simple NumPy usage are:

  • A NumPy array has a single data type for all of its elements.
  • Mixing numeric and non-numeric elements changes the data type to a non-numeric one, which means you cannot do math operations on that array.
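
One way to test up front whether an array is safe for math is to check its dtype. The sketch below uses np.issubdtype for that; the variable names are illustrative, and this is just one possible check rather than a required step.

```python
import numpy as np

numbers = np.array([1, 2, 3])
mixed = np.array([1, 2, "cat"])

# np.issubdtype tells us whether a dtype belongs to a broader category
numbers_ok = np.issubdtype(numbers.dtype, np.number)
mixed_ok = np.issubdtype(mixed.dtype, np.number)

print(numbers.dtype, numbers_ok)
print(mixed.dtype, mixed_ok)
```

Here numbers_ok is True and mixed_ok is False, because the mixed array was stored as strings.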

NumPy array subsetting

NumPy array subsetting works much like list subsetting. Providing an index in square brackets selects a single element of the array. To select a range of elements, pass the start and end indexes separated by a colon (:); the element at the end index is not included.

import numpy as np
list1 = np.array([1,2,3,4,5])
print(list1[1])
print(list1[1:3])

2
[2 3]
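
A few more slicing forms carry over from lists as well. The sketch below shows negative indexes, an open-ended range, and a step; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

last = arr[-1]          # negative indexes count from the end
first_three = arr[:3]   # from the start up to, but not including, index 3
every_other = arr[::2]  # every second element

print(last)
print(first_three)
print(every_other)
```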

Subsetting can also be done with comparison operators.

Comparing an array with a number generates a new array of data type bool. Each element corresponds to an element of list1 and is True if that element is greater than 3 and False otherwise.

list1 = np.array([1,2,3,4,5])
print(list1>3)

[False False False  True  True]

Using the same comparison inside the square brackets of the array returns a new array that contains only the elements greater than 3.

import numpy as np
list1 = np.array([1,2,3,4,5])
print(list1[list1>3])

[4 5]
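
Conditions can also be combined. In the hedged sketch below, & means "and" and | means "or"; each condition needs its own parentheses because of operator precedence. The variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Elements greater than 1 AND less than 5
middle = arr[(arr > 1) & (arr < 5)]

# Elements equal to 1 OR equal to 5
edges = arr[(arr == 1) | (arr == 5)]

print(middle)
print(edges)
```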

2D NumPy arrays

Arrays have dimensions. We can get the shape of an array using the .shape attribute. In this example, the array has a single dimension with 5 elements.

import numpy as np
list1 = np.array([1,2,3,4,5])
print(list1.shape)

(5,)

Creating a 2D array in NumPy is quite simple: pass a list of lists, where each inner list is one row. It takes the following form:

[ [elements of row 0], [elements of row n] ]

import numpy as np
list1 = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(list1.shape)

(2, 5)

The output of shape indicates that we have two rows and five columns.
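
Besides shape, an array also reports its number of dimensions and its total element count. The sketch below reads these attributes for the same 2D array; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

rows, columns = arr.shape  # unpack the shape tuple
dimensions = arr.ndim      # number of dimensions
total = arr.size           # total number of elements

print(rows, columns)
print(dimensions)
print(total)
```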

Subsetting a 2D array is similar to subsetting a 1D array. In the example below, the first print statement prints the first element of the first row, and the second prints the first element of the second row.

import numpy as np
list1 = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(list1[0][0])
print(list1[1][0])

1
6
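
NumPy also accepts the row and column indexes together inside one pair of brackets, separated by a comma. The sketch below compares both spellings on the same array; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# Chained indexing: select row 1 first, then element 0 of that row
chained = arr[1][0]

# Single-step indexing: row 1, column 0
direct = arr[1, 0]

print(chained, direct)
```

Both forms return the same element here. The comma form is the one that extends naturally to selecting columns and ranges.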

Getting a row is very simple. The following gets the second row.

import numpy as np
list1 = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(list1[1])

[ 6  7  8  9 10]

Getting a column is also very simple. The following gets the third column.

import numpy as np
list1 = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(list1[:,2])

[3 8]

A subset of the array can be extracted by slicing rows and columns together. The following selects the first two columns of every row.

import numpy as np
list1 = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(list1[:,0:2])

[[1 2]
 [6 7]]
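
Row and column ranges can be mixed freely. The sketch below shows two more combinations on the same array; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# Rows 0 up to (not including) 1, columns 1 up to (not including) 4
block = arr[0:1, 1:4]

# Row 1 only, from column 2 to the end
tail = arr[1, 2:]

print(block)
print(tail)
```

Note that block is still a 2D array with one row, because a range was used for the rows, while tail is 1D, because a single row index was used.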

Basic statistics with NumPy

Let's talk a bit about statistics.

Calculating basic statistics with plain Python can take many lines of code. NumPy has built-in functions for these calculations.

  • mean: the sum of all the numbers divided by the count of numbers. Computed with np.mean(<array>).
  • median: the number in the middle of a sorted set of numbers. To find it, sort the numbers; if the count is odd, the median is the middle number, and if the count is even, the median is the average of the two middle numbers. Computed with np.median(<array>).
  • standard deviation: a measure of how dispersed the data is in relation to the mean. A low standard deviation, close to zero, means the numbers are clustered near the mean; a high standard deviation means they are spread far from it. Computed with np.std(<array>).
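
The three functions above can be tried on a small data set. The sketch below uses made-up sample values and also recomputes the mean and standard deviation by hand, to show what the functions do. By default np.std divides by the count of numbers (the population formula).

```python
import numpy as np

# Illustrative sample data, not taken from any real measurement
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean_value = np.mean(data)
median_value = np.median(data)
std_value = np.std(data)

# The same mean and standard deviation computed step by step
manual_mean = data.sum() / data.size
manual_std = np.sqrt(np.mean((data - manual_mean) ** 2))

print(mean_value, median_value, std_value)
print(manual_mean, manual_std)
```

The data set has an even count, so the median is the average of the two middle values, 4 and 5.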

Additional useful NumPy functions

  • np.sort(<array>) : returns a sorted copy of an array
  • np.sum(<array>) : sums all the elements of an array
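
A short sketch of both functions follows. The axis argument of np.sum, which sums along rows or columns of a 2D array, is an extra shown here for illustration; the sample values and names are assumptions.

```python
import numpy as np

arr = np.array([3, 1, 2])

# np.sort returns a new sorted array; the original is left unchanged
sorted_arr = np.sort(arr)

table = np.array([[1, 2, 3], [4, 5, 6]])

total = np.sum(table)                # sum of every element
column_sums = np.sum(table, axis=0)  # one sum per column
row_sums = np.sum(table, axis=1)     # one sum per row

print(sorted_arr, arr)
print(total, column_sums, row_sums)
```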

NumPy has excellent documentation. You can find more at Routines — NumPy v1.21 Manual.

I hope you found this short tutorial easy to follow and that it helps you move on with NumPy! :)
