# 3 Essential Pandas Methods to Handle Missing Values

Pandas provide numerous functions and methods to clean and preprocess the dataset to make it production-ready.

In this article, we’ll see the methods provided by pandas to **handle missing values** in a dataset.

# df.fillna()

The `DataFrame.fillna()`

is used to fill in the missing values with the desired value. Let's see how we can use it.

`data = {"A": [2, np.nan, 19, 34, np.nan],`

"B": [np.nan, 23, 12, 34, np.nan]}

df = pd.DataFrame(data)

df

--------------------

A B

0 2.0 NaN

1 NaN 23.0

2 19.0 12.0

3 34.0 34.0

4 NaN NaN

# Filling Arbitrary Value

`filled_df = df.fillna(0)`

print(filled_df)

--------------------

A B

0 2.0 0.0

1 0.0 23.0

2 19.0 12.0

3 34.0 34.0

4 0.0 0.0

We passed an arbitrary value (`0`

) to fill those `NaN`

values in the dataset `df`

.

# Fill Using a Dataset

We can also use a dataset to fill in the missing values.

`df2 = pd.DataFrame({"A": [1,2,3,4,5], "B": [6,7,8,9,10]})`

fill_using_df = df.fillna(df2)

print(fill_using_df)

--------------------

A B

0 2.0 6.0

1 2.0 23.0

2 19.0 12.0

3 34.0 34.0

4 5.0 10.0

When using the `fillna()`

method with `df2`

, the `NaN`

values in the original DataFrame `df`

are replaced by the corresponding values in `df2`

. If a cell in `df`

is `NaN`

, the method will look for the corresponding value in `df2`

(at the same position) and use that value to fill in the `NaN`

.

# Filling Different Values in Each Column

If we want to fill in different values in each column, we can use the following approach.

`values = {"A": 100, "B": 200}`

diff_val = df.fillna(value=values)

print(diff_val)

--------------------

A B

0 2.0 200.0

1 100.0 23.0

2 19.0 12.0

3 34.0 34.0

4 100.0 200.0

The value dictionary holds values to fill `NaN`

in columns `A`

and `B`

in the dataset. By using `df.fillna(value=values)`

, the `NaN`

value in column `A`

is filled with the value `100`

and `NaN`

value in column `B`

is filled with the value `200`

.

# df.interpolate()

The `DataFrame.interpolate()`

method provides various **interpolation techniques** to fill in the missing values.

Instead of filling in hard-coded values, we can use an interpolation method to fill missing values that make the dataset even more expressive and real.

# Filling Computed Value

`df2 = df.interpolate() # default: linear method and axis=0`

print(df2)

--------------------

A B

0 2.0 NaN

1 10.5 23.0

2 19.0 12.0

3 34.0 34.0

4 34.0 34.0

When we use `df.interpolate()`

, the default `linear`

interpolation method is used that fills the `NaN`

values equally spaced ignoring the index.

For example, in column `A`

, `10.5`

is filled which is equally spaced between the values `2.0`

and `19.0`

with the difference of `8.5`

.

But if we see the fourth row in both columns, they are filled with the same value (`34.0`

) as above them because there were no values to compute in the fifth row.

# Filling Nearest Values

`data = {"A": [3, np.nan, 2, np.nan, 4],`

"B": [1, 4, np.nan, 2, 5]}

df = pd.DataFrame(data)

df3= df.interpolate(method='nearest')

print(df3)

--------------------

A B

0 3.0 1.0

1 3.0 4.0

2 2.0 4.0

3 2.0 2.0

4 4.0 5.0

When we use `method='nearest'`

, the `NaN`

values are filled with the nearest valid values.

In this case, the second row in column `A`

is filled with the value of `3.0`

. **Why so?** The nearest value is decided based on the index close to the `NaN`

value index. The index `0`

(`3.0`

) is closest to the index `1`

. The same is applied to all the `NaN`

values.

# Filling Values Considering Index Values

`data = {"A": [3, np.nan, 9, np.nan, 4],`

"B": [1, 10, np.nan, 20, 5]}

df = pd.DataFrame(data)

df4= df.interpolate(method='values') # or method='index'

print(df4)

--------------------

A B

0 3.0 1.0

1 6.0 10.0

2 9.0 15.0

3 6.5 20.0

4 4.0 5.0

The `NaN`

values are filled equally spaced considering the values of the index surrounding the `NaN`

value index.

# df.ffill() and df.bfill()

The `DataFrame.ffill()`

method is used to fill the **last** valid value in the missing place whereas the `DataFrame.bfill()`

method is used to fill the **next** valid value.

# Forward Filling With ffill()

`data = {"A": [2, np.nan, 19, 34, np.nan],`

"B": [np.nan, 23, 12, 34, np.nan]}

df = pd.DataFrame(data)

forward_fill = df.ffill()

print(forward_fill)

--------------------

A B

0 2.0 NaN

1 2.0 23.0

2 19.0 12.0

3 34.0 34.0

4 34.0 34.0

We can see that `NaN`

values are filled with the preceding valid values, for instance, the second row of column `A`

is filled with `2.0`

which is the same value above it.

# Backward Filling With bfill()

`data = {"A": [2, np.nan, 19, 34, np.nan],`

"B": [np.nan, 23, 12, 34, np.nan]}

df = pd.DataFrame(data)

backward_fill = df.bfill()

print(backward_fill)

--------------------

A B

0 2.0 23.0

1 19.0 23.0

2 19.0 12.0

3 34.0 34.0

4 NaN NaN

In this case, the `NaN`

is filled with the next valid values, for instance, the first row of column `B`

is filled with `23.0`

which is the next value in the column.

We can also see that the fifth row of columns `A`

and `B`

remains unfilled (`NaN`

) due to the absence of the next valid value in the dataset.

🏆**Other articles you might be interested in if you liked this one**

✅Pandas df.ffill() and df.bfill() to handle missing values.

✅Merge, combine, and concatenate multiple datasets using pandas.

✅Find and delete duplicate rows from the dataset using pandas.

✅How to efficiently manage memory use when working with large datasets in pandas?

✅How to find and delete mismatched columns from datasets in pandas?

**That’s all for now**

**Keep Coding✌✌**