In the previous chapter, we learned about the basics of AI and Python programming. Now, let's explore some powerful Python libraries that are essential for data manipulation and analysis in AI projects.
NumPy is a fundamental library for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a wide range of mathematical functions to operate on these arrays efficiently. NumPy forms the foundation for many other Python libraries used in data science and machine learning.
Installing NumPy: To install NumPy, open a terminal or command prompt and run the following command:
pip install numpy
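To check that the installation worked, you can import NumPy and print its version string. The exact version shown will depend on your environment.

```python
import numpy as np

# Print the installed NumPy version to confirm that the import succeeds
print(np.__version__)
```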
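Creating NumPy Arrays: NumPy arrays are the core data structure in NumPy. They are similar to Python lists, but all elements share a single data type and are typically stored in one contiguous block of memory. As a result, arrays use less storage and support fast operations implemented in compiled code, an advantage that matters most for large datasets.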
import numpy as np
# Create a 1-dimensional array from a list
arr1 = np.array([1, 2, 3, 4, 5])
print(arr1)
# Create a 2-dimensional array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2)
# Create an array with a specific data type
arr3 = np.array([1, 2, 3], dtype=np.float64)
print(arr3)
# Create an array filled with zeros
zeros_arr = np.zeros((3, 4))
print(zeros_arr)
# Create an array filled with ones
ones_arr = np.ones((2, 3))
print(ones_arr)
# Create an array with a range of values
range_arr = np.arange(0, 10, 2)
print(range_arr)
# Create an array with evenly spaced values
linspace_arr = np.linspace(0, 1, 5)
print(linspace_arr)
Output:
[1 2 3 4 5]
[[1 2 3]
 [4 5 6]]
[1. 2. 3.]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1. 1.]
 [1. 1. 1.]]
[0 2 4 6 8]
[0.   0.25 0.5  0.75 1.  ]
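Two practical differences from Python lists are worth seeing directly: arithmetic on an array applies to each element, and every array records its shape, data type, and memory footprint. The sketch below sets `dtype=np.int64` explicitly (an assumption for portability, since the default integer type can differ between platforms) so that the byte counts are predictable.

```python
import numpy as np

values = [1, 2, 3]
# dtype set explicitly; the default integer type varies by platform
arr = np.array(values, dtype=np.int64)

# With a list, * 2 repeats the list; with an array, it multiplies each element
print(values * 2)   # [1, 2, 3, 1, 2, 3]
print(arr * 2)      # [2 4 6]

# Arrays carry metadata describing their structure and storage
matrix = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)
print(matrix.shape)    # (2, 3)
print(matrix.ndim)     # 2
print(matrix.dtype)    # int64
print(matrix.size)     # 6
print(matrix.nbytes)   # 48, i.e. 6 elements * 8 bytes each

# Rearrange the same six values into 3 rows and 2 columns
reshaped = matrix.reshape(3, 2)
print(reshaped)
```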
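Array Operations and Broadcasting: NumPy provides a wide range of mathematical, statistical, and logical operations on arrays. Arithmetic operators such as +, -, *, and / work element-wise. NumPy also supports broadcasting, which allows arrays with different but compatible shapes, such as an array and a scalar, to be combined in arithmetic operations. Shapes are compared starting from the last dimension, and two dimensions are compatible when they are equal or one of them is 1; the smaller operand is then stretched across the larger one without copying its data.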
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
# Element-wise addition
print(arr1 + arr2)
# Element-wise subtraction
print(arr2 - arr1)
# Element-wise multiplication
print(arr1 * arr2)
# Element-wise division
print(arr2 / arr1)
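# Matrix multiplication: (2 x 3) times (3 x 2) gives a 2 x 2 result
print(np.dot(arr1, arr2.T))  # arr1 @ arr2.T is equivalent
# Broadcasting: the scalar 5 is added to every element
print(arr1 + 5)
# Statistical operations (computed over all six elements)
print(np.mean(arr1))
print(np.median(arr1))
print(np.std(arr1))  # population standard deviation (ddof=0 by default)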
Output:
[[ 8 10 12]
 [14 16 18]]
[[6 6 6]
 [6 6 6]]
[[ 7 16 27]
 [40 55 72]]
[[7.  4.  3. ]
 [2.5 2.2 2. ]]
[[ 50  68]
 [122 167]]
[[ 6  7  8]
 [ 9 10 11]]
3.5
3.5
1.707825127659933
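The example above only broadcasts a scalar. The following sketch shows broadcasting between arrays of different shapes, together with per-axis statistics via the `axis` argument, and combines the two to subtract each column's mean from every row (a common preprocessing step). It also shows what happens when the shapes are not compatible.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])

# A length-3 row (shape (3,)) is broadcast across both rows of the 2 x 3 array
row = np.array([10, 20, 30])
print(arr1 + row)

# A 2 x 1 column is broadcast across all three columns
col = np.array([[100], [200]])
print(arr1 + col)

# axis=0 reduces over rows (one value per column),
# axis=1 reduces over columns (one value per row)
col_means = arr1.mean(axis=0)
row_means = arr1.mean(axis=1)
print(col_means)
print(row_means)

# Subtract each column's mean (shape (3,)) from every row of arr1
centered = arr1 - col_means
print(centered)

# Shapes (2, 3) and (2,) are incompatible: trailing dimensions 3 and 2 differ
try:
    arr1 + np.array([1, 2])
except ValueError as err:
    print("Broadcasting failed:", err)
```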
Array Indexing and Slicing: NumPy arrays support indexing and slicing operations similar to Python lists. You can access individual elements, rows, columns, or subsets of an array using indices and slices.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Accessing elements
print(arr[0, 0])  # Row 0, column 0: the first element
print(arr[1, 2])  # Row 1, column 2 (0-based): second row, third column
# Slicing arrays
print(arr[:2, :2])  # First two rows and first two columns
print(arr[1:, :])  # Every row from the second onward, all columns
print(arr[:, 1])  # Second column, returned as a 1-D array
# Boolean (conditional) indexing
print(arr[arr > 5])  # Elements greater than 5, as a flat 1-D array
Output:
1
6
[[1 2]
 [4 5]]
[[4 5 6]
 [7 8 9]]
[2 5 8]
[6 7 8 9]
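One important difference from Python lists: slicing a list builds a new list, but basic slicing of a NumPy array returns a view that shares memory with the original, so writing through the slice changes the original array. Boolean indexing, by contrast, returns a copy, although assigning through a boolean mask writes into the original. The sketch below reuses the same 3 x 3 array to show these cases and the `.copy()` method.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Basic slicing returns a view that shares memory with arr
view = arr[:2, :2]
view[0, 0] = 100
print(arr[0, 0])   # 100: writing through the view changed arr

# .copy() produces an independent array
independent = arr[:2, :2].copy()
independent[0, 0] = -1
print(arr[0, 0])   # still 100: the copy does not share memory

# Boolean indexing returns a copy...
mask = arr > 5
selected = arr[mask]
selected[:] = 0    # changes only the copy
# ...but assigning through a mask writes into the original array
arr[mask] = 0
print(arr)
```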
Pandas is a powerful library for data manipulation and analysis in Python. It provides two main data structures: the Series, a one-dimensional labeled array, and the DataFrame, a two-dimensional table with labeled rows and columns. Pandas is built on top of NumPy and integrates well with other libraries in the data science ecosystem.
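As a first taste, the sketch below builds a small DataFrame, selects one column as a Series, filters rows with a condition, and converts a column back to a NumPy array. It assumes pandas is installed (for example with `pip install pandas`); the column names and values are hypothetical sample data chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
data = {
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 47],
    "score": [88.5, 92.0, 79.5],
}

# A DataFrame is a 2-D table with labeled columns (and a default 0..n-1 row index)
df = pd.DataFrame(data)
print(df)

# Selecting a single column returns a Series: a 1-D labeled array
ages = df["age"]
print(type(ages))
print(ages.mean())

# Boolean filtering works much like NumPy conditional indexing
high_scores = df[df["score"] > 80]
print(high_scores)

# The underlying values are available as a NumPy array
scores = df["score"].to_numpy()
print(scores * 2)
```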