Python: Create Synthetic Data Using Faker


Faker is a Python library that makes it easy to generate fake data. In this article we will write a script that uses the Faker and csv libraries to generate a CSV file. The file will contain the following columns:

name, last_name, date_of_birth, gender, city, country

Let's see how we can do this!

Install the Faker library

We can install Faker using the pip3 command!

pip3 install Faker

Creating the code

Save the following as fake_data.py

#!/usr/bin/env python3 
from faker import Faker 
import random 
import csv 
 
if __name__ == '__main__': 
 
    fake = Faker() 
    num_records = 1000 
    data = [] 
 
    for _ in range(num_records): 
         
        name      = fake.first_name() 
        last_name = fake.last_name() 
        date_of_birth = fake.date_of_birth(minimum_age = 18, 
                                           maximum_age = 80) 
        gender  = random.choice(['Male','Female']) 
        city    = fake.city() 
        country = fake.country() 
 
        data.append([name,last_name,date_of_birth,gender,city,country]) 
     
    csv_filename = "fake_data.csv" 
 
    with open(csv_filename,'w',newline='') as csvfile: 
        csv_writer = csv.writer(csvfile) 
        csv_writer.writerow(["Name","Last Name","Date Of Birth","Gender","City","Country"]) 
        csv_writer.writerows(data)

Explaining the code

To run the script, enter the following in the terminal:

python3 ./fake_data.py

This will generate a CSV file with 1000 records. Let's examine the most important parts of the code.

This line imports the Faker class from the faker package:

from faker import Faker

The lines inside the for loop generate the synthetic data using simple rules:

name      = fake.first_name() 
last_name = fake.last_name() 
date_of_birth = fake.date_of_birth(minimum_age = 18, 
                                   maximum_age = 80) 
gender  = random.choice(['Male','Female']) 
city    = fake.city() 
country = fake.country()

Faker ships with many built-in and community providers for common cases such as first names and last names. For a value drawn from a fixed set of options, we can pass a list of options to the random.choice() function.
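
random.choice() picks every option with equal probability. When the options should not be equally likely, the standard library's random.choices() accepts a weights argument. The sketch below is an illustration only: the 60/40 split is a made-up assumption, and the separate random.Random(0) instance is used just to keep the example repeatable.

```python
import random
from collections import Counter

options = ["Male", "Female"]
weights = [60, 40]  # hypothetical proportions, not taken from any real data

# A dedicated generator with a fixed seed keeps this example repeatable.
rng = random.Random(0)

# Draw 1000 values at once; each draw follows the relative weights.
genders = rng.choices(options, weights=weights, k=1000)

# Count how often each option was drawn.
counts = Counter(genders)
print(counts)
```

The counts will land near the requested proportions rather than matching them exactly, since each draw is still random.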

Next, with the help of the csv library, we create the CSV file:

with open(csv_filename,'w',newline='') as csvfile: 
    csv_writer = csv.writer(csvfile) 
    csv_writer.writerow(["Name","Last Name","Date Of Birth","Gender","City","Country"]) 
    csv_writer.writerows(data)

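First, we write the CSV header with the .writerow() function, and then we write all the data rows at once with the .writerows() function. Passing newline='' to open() lets the csv module control the line endings itself, which avoids blank lines between rows on Windows.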
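
To see the header-then-rows pattern in isolation, here is a small sketch that writes to an in-memory buffer instead of a file and reads the result back with csv.reader. The two data rows are hand-written placeholders standing in for Faker output, and the io.StringIO buffer is an assumption made so the example leaves no file behind.

```python
import csv
import io

header = ["Name", "Last Name", "Date Of Birth", "Gender", "City", "Country"]

# Placeholder rows standing in for the values Faker would generate.
rows = [
    ["Alice", "Smith", "1990-01-01", "Female", "Springfield", "Exampleland"],
    ["Bob", "Jones", "1985-06-23", "Male", "Rivertown", "Exampleland"],
]

# newline="" mirrors the open() call in the script: the csv module
# decides the line endings on its own.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(header)   # one call writes one row (the header)
writer.writerows(rows)    # one call writes every data row

# Rewind and parse the text back into lists of strings.
buffer.seek(0)
read_back = list(csv.reader(buffer))

print(read_back[0])        # the header row comes back first
print(len(read_back) - 1)  # number of data rows
```

Reading the file back like this is also a quick way to check that fake_data.csv has the expected number of records.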

Conclusion

Faker is a simple yet powerful library that allows us to create synthetic data with ease! I hope you enjoyed reading this article as much as I enjoyed writing it!
