Random Data Distribution

M.Ramya

What is Data Distribution?

  • Data distribution refers to the way data values are spread out across a dataset. It shows all the possible values in the dataset and how frequently each value appears.
  • Understanding data distribution is crucial when working with statistics and data science, as it helps analyze patterns and make predictions.
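
As a minimal sketch of this idea, the snippet below tabulates how often each value appears in a small dataset. The sample values are made up purely for illustration, and np.unique with return_counts=True is just one convenient way to count occurrences.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
data = np.array([3, 5, 7, 7, 5, 7, 3, 7, 7, 5])

# Distinct values (sorted) and how many times each one occurs
values, counts = np.unique(data, return_counts=True)

# Relative frequency of each value (counts divided by the dataset size)
frequencies = counts / data.size

for v, c, f in zip(values, counts, frequencies):
    print(f"value {v}: appears {c} times (relative frequency {f:.1f})")
```

The values together with their relative frequencies are the distribution of this small dataset.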

Random Data Distribution

  • A random distribution is a collection of random numbers that follow a particular probability density function (PDF).

Probability Density Function (PDF)

  • A PDF describes how likely the different outcomes of a continuous random variable are. The probability that the variable falls within a given range is the area under the PDF over that range.
  • When the possible outcomes are a finite list of values, as in the examples here, each value is simply assigned its own probability.

Generating Random Distributions with Python

  • NumPy’s random module provides methods for generating random data distributions. One such method is random.choice(), which takes a list of possible values and, through its p parameter, a probability for each value.

Probability Settings

  • Each probability value must be between 0 and 1.
  • The sum of all probability values must equal 1.
  • A probability of 0 means the value will never appear, and a probability of 1 means it will always appear.
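
The probability rules above can be checked in code before sampling. The sketch below assumes you start from raw weights (a hypothetical `weights` array, not part of the original example) and normalizes them so they sum to 1. It then shows that random.choice() raises a ValueError when the probabilities do not sum to 1.

```python
import numpy as np
from numpy import random

values = [3, 5, 7, 9]

# Hypothetical raw weights that do not sum to 1
weights = np.array([1, 3, 6, 0], dtype=float)

# Normalize so that the probabilities sum to 1
p = weights / weights.sum()

# Sanity checks mirroring the rules listed above
assert np.all((p >= 0) & (p <= 1))
assert np.isclose(p.sum(), 1.0)

x = random.choice(values, p=p, size=10)
print(x)

# Probabilities that do not sum to 1 are rejected by NumPy
try:
    random.choice(values, p=[0.1, 0.3, 0.6, 0.5], size=10)
    rejected = False
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```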

Program:

Generating a 1-D Random Distribution

Let’s generate an array of 100 random values, where each value is 3, 5, 7, or 9, drawn with the following probabilities:

  • Probability for 3: 0.1
  • Probability for 5: 0.3
  • Probability for 7: 0.6
  • Probability for 9: 0.0

from numpy import random

# Draw 100 values from [3, 5, 7, 9]; p gives the probability of each value, in order
x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=100)

print(x)

Sample output (the values differ on each run):

[7 7 7 5 7 5 7 5 7 7 7 7 5 5 7 7 5 7 5 7 7 5 7 7 7 7 5 7 3 7 7 7 7 7 5 7 7 5 7 7 7 7 5 5 7 5 7 5 7 7 7 7 7 7 7 5 7 7 7 5 7 7 7 7 7 7 7 7 7 7 5 7 5 7 7 7 7 5 5 5 5 5 7 7 7 7 5 7 7 7 7 5 7 7 7 7 7 7 7 5 5 7 7 7]

Note: In this case, 9 will never appear since its probability is 0.
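
To see how closely the generated values follow the requested probabilities, one option is to draw a larger sample and compare observed frequencies with the targets. This is only a sketch: it uses NumPy's seeded Generator (np.random.default_rng) rather than the module-level function shown above, and the seed 0 and the sample size of 10,000 are arbitrary choices made here so the run is repeatable.

```python
import numpy as np

# A seeded generator makes the run repeatable; the seed 0 is an arbitrary choice
rng = np.random.default_rng(0)

values = [3, 5, 7, 9]
probs = [0.1, 0.3, 0.6, 0.0]

# Draw a large sample so observed frequencies settle near the probabilities
sample = rng.choice(values, p=probs, size=10_000)

for v, p in zip(values, probs):
    observed = np.mean(sample == v)  # fraction of draws equal to v
    print(f"value {v}: requested {p:.2f}, observed {observed:.3f}")
```

The observed frequencies should land close to 0.1, 0.3, and 0.6, while the frequency for 9 stays at exactly 0.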

Program: 

Generating a 2-D Random Distribution

We can also generate multi-dimensional arrays by specifying the desired shape using the size parameter. Here’s an example of a 2-D array with 3 rows and 5 columns using the same probabilities:

from numpy import random

# size=(3, 5) requests a 2-D array with 3 rows and 5 columns
x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(3, 5))

print(x)

Sample output (the values differ on each run):

[[7 7 5 7 7]
 [7 7 7 5 7]
 [7 5 7 7 7]]
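
The short sketch below inspects the shape of the result and shows that a longer size tuple gives more dimensions. The variable `y` and the shape (2, 3, 5) are illustrative choices, not part of the original example.

```python
from numpy import random

x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(3, 5))

print(x.shape)   # (3, 5): 3 rows, 5 columns
print(x.ndim)    # 2
print(x[0])      # first row (5 values)
print(x[:, 0])   # first column (3 values)

# A longer tuple adds dimensions: here, 2 blocks of 3 rows and 5 columns
y = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(2, 3, 5))
print(y.shape)   # (2, 3, 5)
```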