Credit card fraud affects millions of cardholders worldwide and erodes trust in digital payments. Data science offers practical tools for detecting and preventing fraudulent activity. In this post, we work through the Credit Card Fraud Detection dataset with Python and look at how fraudulent transactions differ from legitimate ones.

The Dataset: A Glimpse into the World of Transactions

The Credit Card Fraud Detection dataset, publicly available on Kaggle, is the basis for our exploration. It contains transactions made by European cardholders in September 2013. To preserve anonymity, most features (V1 through V28) are the output of a PCA transformation; only Time and Amount are kept in their original form. The Class column defines a binary classification problem: 1 for fraud, 0 for a legitimate transaction.

Step 1: Data Exploration

We start by loading the dataset and examining its structure: the first rows, summary statistics, and the count of each class.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Assuming you have the dataset as 'creditcard.csv'
df = pd.read_csv('creditcard.csv')

# Basic data exploration
print(df.head())
print(df.describe())

# Check for imbalance in the dataset
fraud_counts = df['Class'].value_counts()
print(fraud_counts)

  Time        V1        V2        V3        V4        V5        V6        V7  \
0   0.0 -1.359807 -0.072781  2.536347  1.378155 -0.338321  0.462388  0.239599   
1   0.0  1.191857  0.266151  0.166480  0.448154  0.060018 -0.082361 -0.078803   
2   1.0 -1.358354 -1.340163  1.773209  0.379780 -0.503198  1.800499  0.791461   
3   1.0 -0.966272 -0.185226  1.792993 -0.863291 -0.010309  1.247203  0.237609   
4   2.0 -1.158233  0.877737  1.548718  0.403034 -0.407193  0.095921  0.592941   

         V8        V9  ...       V21       V22       V23       V24       V25  \
0  0.098698  0.363787  ... -0.018307  0.277838 -0.110474  0.066928  0.128539   
1  0.085102 -0.255425  ... -0.225775 -0.638672  0.101288 -0.339846  0.167170   
2  0.247676 -1.514654  ...  0.247998  0.771679  0.909412 -0.689281 -0.327642   
3  0.377436 -1.387024  ... -0.108300  0.005274 -0.190321 -1.175575  0.647376   
4 -0.270533  0.817739  ... -0.009431  0.798278 -0.137458  0.141267 -0.206010   

        V26       V27       V28  Amount  Class  
0 -0.189115  0.133558 -0.021053  149.62      0  
1  0.125895 -0.008983  0.014724    2.69      0  
2 -0.139097 -0.055353 -0.059752  378.66      0  
3 -0.221929  0.062723  0.061458  123.50      0  
4  0.502292  0.219422  0.215153   69.99      0  

[5 rows x 31 columns]
                Time            V1            V2            V3            V4  \
count  284807.000000  2.848070e+05  2.848070e+05  2.848070e+05  2.848070e+05   
mean    94813.859575  1.168375e-15  3.416908e-16 -1.379537e-15  2.074095e-15   
std     47488.145955  1.958696e+00  1.651309e+00  1.516255e+00  1.415869e+00   
min         0.000000 -5.640751e+01 -7.271573e+01 -4.832559e+01 -5.683171e+00   
25%     54201.500000 -9.203734e-01 -5.985499e-01 -8.903648e-01 -8.486401e-01   
50%     84692.000000  1.810880e-02  6.548556e-02  1.798463e-01 -1.984653e-02   
75%    139320.500000  1.315642e+00  8.037239e-01  1.027196e+00  7.433413e-01   
max    172792.000000  2.454930e+00  2.205773e+01  9.382558e+00  1.687534e+01   

                 V5            V6            V7            V8            V9  \
count  2.848070e+05  2.848070e+05  2.848070e+05  2.848070e+05  2.848070e+05   
mean   9.604066e-16  1.487313e-15 -5.556467e-16  1.213481e-16 -2.406331e-15   
std    1.380247e+00  1.332271e+00  1.237094e+00  1.194353e+00  1.098632e+00   
min   -1.137433e+02 -2.616051e+01 -4.355724e+01 -7.321672e+01 -1.343407e+01   
25%   -6.915971e-01 -7.682956e-01 -5.540759e-01 -2.086297e-01 -6.430976e-01   
50%   -5.433583e-02 -2.741871e-01  4.010308e-02  2.235804e-02 -5.142873e-02   
75%    6.119264e-01  3.985649e-01  5.704361e-01  3.273459e-01  5.971390e-01   
max    3.480167e+01  7.330163e+01  1.205895e+02  2.000721e+01  1.559499e+01   

       ...           V21           V22           V23           V24  \
count  ...  2.848070e+05  2.848070e+05  2.848070e+05  2.848070e+05   
mean   ...  1.654067e-16 -3.568593e-16  2.578648e-16  4.473266e-15   
std    ...  7.345240e-01  7.257016e-01  6.244603e-01  6.056471e-01   
min    ... -3.483038e+01 -1.093314e+01 -4.480774e+01 -2.836627e+00   
25%    ... -2.283949e-01 -5.423504e-01 -1.618463e-01 -3.545861e-01   
50%    ... -2.945017e-02  6.781943e-03 -1.119293e-02  4.097606e-02   
75%    ...  1.863772e-01  5.285536e-01  1.476421e-01  4.395266e-01   
max    ...  2.720284e+01  1.050309e+01  2.252841e+01  4.584549e+00   

                V25           V26           V27           V28         Amount  \
count  2.848070e+05  2.848070e+05  2.848070e+05  2.848070e+05  284807.000000   
mean   5.340915e-16  1.683437e-15 -3.660091e-16 -1.227390e-16      88.349619   
std    5.212781e-01  4.822270e-01  4.036325e-01  3.300833e-01     250.120109   
min   -1.029540e+01 -2.604551e+00 -2.256568e+01 -1.543008e+01       0.000000   
25%   -3.171451e-01 -3.269839e-01 -7.083953e-02 -5.295979e-02       5.600000   
50%    1.659350e-02 -5.213911e-02  1.342146e-03  1.124383e-02      22.000000   
75%    3.507156e-01  2.409522e-01  9.104512e-02  7.827995e-02      77.165000   
max    7.519589e+00  3.517346e+00  3.161220e+01  3.384781e+01   25691.160000   

               Class  
count  284807.000000  
mean        0.001727  
std         0.041527  
min         0.000000  
25%         0.000000  
50%         0.000000  
75%         0.000000  
max         1.000000  

[8 rows x 31 columns]
Class
0    284315
1       492
Name: count, dtype: int64
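
The output above covers the first rows, the summary statistics, and the class counts. A quick structural check for missing values and duplicated rows can complement it. The sketch below is a minimal example; the helper name structure_report and the toy frame are made up for illustration, and in practice the function would be called on the df loaded earlier.

```python
import pandas as pd

def structure_report(df: pd.DataFrame) -> dict:
    """Collect a few basic structural facts about a DataFrame."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # Total number of missing cells across all columns
        "missing_values": int(df.isna().sum().sum()),
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Columns that are not numeric (names only)
        "non_numeric_columns": df.select_dtypes(exclude="number").columns.tolist(),
    }

# Tiny made-up frame for illustration only (not real data from the dataset)
toy = pd.DataFrame({
    "Time": [0.0, 0.0, 1.0, 0.0],
    "Amount": [149.62, 2.69, None, 2.69],
    "Class": [0, 0, 0, 0],
})
report = structure_report(toy)
print(report)
```

Calling structure_report(df) on the full dataset gives the same fields for all 31 columns.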

Class Distribution Insight (Balanced or unbalanced? - that is the question!)

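The class counts reveal a stark imbalance: only 492 of the 284,807 transactions (about 0.17%) are fraudulent, a common scenario in fraud detection datasets. Because the gap is so large, the plot below uses a log-scaled y-axis to keep the minority class visible.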

sns.countplot(x='Class', data=df)
plt.title('Transaction Class Distribution')
plt.yscale('log') 
plt.show()
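
To put a number on the imbalance, the class counts can be turned into percentage shares. The sketch below is one way to do it; class_balance is a hypothetical helper, and the labels series is rebuilt from the counts printed earlier (284315 and 492) so the example runs without the CSV file.

```python
import pandas as pd

def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Return the count and percentage share of each class label."""
    counts = labels.value_counts().sort_index()
    percent = (counts / counts.sum() * 100).round(3)
    return pd.DataFrame({"count": counts, "percent": percent})

# Rebuild the label column from the counts shown in the output above
labels = pd.Series([0] * 284315 + [1] * 492, name="Class")
summary = class_balance(labels)
print(summary)
```

With the real data, class_balance(df['Class']) would produce the same table.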


Delving into Transaction Amounts

The distribution of transaction amounts shows overall spending patterns and may reveal characteristics specific to fraudulent transactions.

plt.figure(figsize=(10, 6))
sns.histplot(df[df['Class'] == 0]['Amount'], bins=50, color='green', label='Legitimate')
sns.histplot(df[df['Class'] == 1]['Amount'], bins=50, color='red', label='Fraudulent')
plt.xlabel('Transaction Amount')
plt.ylabel('Frequency')
plt.title('Transaction Amount Distribution')
plt.legend()
plt.yscale('log')  # Set the y-axis to log scale
plt.show()
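
Alongside the histogram, per-class summary statistics make the comparison of amounts concrete. The following is a minimal sketch, assuming the Amount and Class column names from the dataset; the helper name amount_summary_by_class and the toy values are invented for illustration.

```python
import pandas as pd

def amount_summary_by_class(df: pd.DataFrame) -> pd.DataFrame:
    """Describe the Amount column separately for each Class value."""
    return df.groupby("Class")["Amount"].describe()

# Made-up amounts for illustration only (not taken from the dataset)
toy = pd.DataFrame({
    "Amount": [10.0, 20.0, 30.0, 1.0, 99.0],
    "Class": [0, 0, 0, 1, 1],
})
amount_summary = amount_summary_by_class(toy)
print(amount_summary[["count", "mean", "50%", "max"]])
```

Running amount_summary_by_class(df) on the loaded data gives one row of statistics for legitimate transactions and one for fraudulent ones.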


Temporal Patterns: Time vs. Fraud

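The Time column records the seconds elapsed since the first transaction in the dataset, which spans roughly two days (the maximum value is 172,792 seconds). Plotting it by class may uncover temporal patterns indicative of fraudulent activity.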

plt.figure(figsize=(12, 8))
sns.histplot(df[df['Class'] == 0]['Time'], bins=50, color='green', label='Legitimate', kde=True)
sns.histplot(df[df['Class'] == 1]['Time'], bins=50, color='red', label='Fraudulent', kde=True)
plt.xlabel('Time (in Seconds)')
plt.ylabel('Frequency')
plt.title('Transaction Time Distribution')
plt.legend()
plt.yscale('log') 
plt.show()
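
Since Time is measured in seconds, one way to look for a daily pattern is to bucket transactions by hour and compute the fraud rate per bucket. The sketch below is illustrative only. It assumes the first transaction marks hour 0, so the buckets are hours elapsed modulo 24 and not true clock hours; fraud_rate_by_hour and the toy data are hypothetical.

```python
import pandas as pd

def fraud_rate_by_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Fraud rate per hour bucket, counted from the first transaction (mod 24)."""
    # Whole hours elapsed since the first transaction, wrapped to a 24-hour cycle
    hours = (df["Time"] // 3600 % 24).astype(int)
    grouped = df.groupby(hours)["Class"].agg(transactions="count", frauds="sum")
    grouped["fraud_rate"] = grouped["frauds"] / grouped["transactions"]
    grouped.index.name = "hour_mod_24"
    return grouped

# Made-up timestamps and labels for illustration only
toy = pd.DataFrame({
    "Time": [0.0, 1800.0, 3600.0, 90000.0, 7200.0],
    "Class": [0, 1, 0, 0, 1],
})
hourly = fraud_rate_by_hour(toy)
print(hourly)
```

Because the dataset covers about two days, each bucket pools transactions from both days when the function is applied to the full df.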
