How to Delete or Replace a Column/Row from a DataFrame Using Pandas?


Trying to figure out how to delete or replace a column or row in a DataFrame using Pandas with Python? We’ve got you covered!

Changing large chunks of data into something meaningful might sound easy at first glance. However, even with the support of the mighty Pandas library, it can be a huge hassle. Pair that with row and column manipulation, and everything becomes a massive headache. 😵

You try to delete or replace a column or row using the usual conventions, but the data just doesn’t seem to budge.

Since programming is about problem-solving, your current methods are likely to blame. 😕


 

Pandas provides the drop function for deleting rows and columns, but it has no single dedicated function for replacing the values of an entire row or column. However, the library itself is jam-packed with accessors that can be used to target such rows and columns by label or index.

Hence, it’s recommended you memorize those accessors instead of trying to stick to a stock-standard method of row/column manipulation. These accessors generally work as follows:

  • They take a label/index or a list of labels/indexes as arguments for locating the address within the DataFrame.
  • They apply the operation specified by the programmer (such as an assignment) to that address.
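
The two steps above can be sketched on a tiny frame. This is a minimal illustration, assuming a hypothetical two-row DataFrame named df that is not part of the article's examples:

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration.
df = pd.DataFrame({"Name": ["Duran", "Rajesh"], "Grades": [95, 73]})

# Step 1: the accessor locates an address from a row label and a column label.
print(df.at[1, "Grades"])  # prints 73

# Step 2: an operation (here, an assignment) is applied to that address.
df.at[1, "Grades"] = 75

# loc accepts lists of labels, so several cells can be targeted at once.
subset = df.loc[[0, 1], ["Name"]]
```

Here at targets exactly one cell, while loc can target a whole block of rows and columns.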

Having said that, the main goal of this article is to show a variety of ways to delete and replace rows and columns in a Pandas DataFrame. 😏

 

Note: This guide will primarily focus on the basic accessor techniques that allow you to locate the bits of information needed to be changed or deleted. Therefore, you don’t have to go out of your way to search for a specific be-all-end-all function in the Pandas library. 

 

 

How to Delete a Column/Row from a Pandas DataFrame?

1. Understanding the Format

A row or a column can be deleted from a DataFrame with the aid of the following function:

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

The drop function takes seven parameters that control which labels are removed and how. Here’s a look at what each parameter does on an individual level:

  1. labels: Takes in a single label or a list-like of labels that indicate which columns or rows are supposed to be dropped. A tuple is treated as a single label, not as a list of labels.
  2. axis: A 0 or 1 value that determines whether rows or columns are dropped. 0 (or 'index') indicates rows, while 1 (or 'columns') indicates columns.
  3. index: Takes in a single label or a list-like. Passing index=labels is an alternative to passing labels with axis=0.
  4. columns: Takes in a single label or a list-like. Passing columns=labels is an alternative to passing labels with axis=1.
  5. level: Takes in an integer or a level name. For a MultiIndex, it identifies the level from which the labels are supposed to be removed.
  6. inplace: A boolean value. If False (the default), a modified copy of the DataFrame is returned. If True, the original DataFrame is adjusted and None is returned.
  7. errors: Either 'raise' (the default) or 'ignore'. If 'ignore' is passed, labels that don’t exist are skipped instead of raising an error.
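
To make the parameters concrete, here is a short sketch of the most common combinations. The three-column frame df is a made-up example, not taken from the article:

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the drop parameters.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# labels + axis: drop column "B".
no_b = df.drop(labels="B", axis=1)

# columns= is the keyword alternative to labels with axis=1.
no_b_kw = df.drop(columns="B")

# index= is the keyword alternative to labels with axis=0;
# a list drops several rows at once.
middle_row = df.drop(index=[0, 2])

# errors="ignore" skips labels that do not exist instead of raising KeyError.
unchanged = df.drop(columns="Z", errors="ignore")
```

The columns= and index= keywords tend to read more clearly than axis numbers, since the call states outright whether rows or columns are being dropped.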

 

By default (inplace=False), this function returns a modified copy and leaves the original DataFrame untouched. With inplace=True, it edits the original DataFrame and returns None. So be sure to store the result in a variable in case you want to use it later 😎:

Variable_name = DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
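
The difference between the two modes is easy to check. The following is a small sketch on a made-up two-column frame:

```python
import pandas as pd

# Hypothetical frame, used only to compare the two modes.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Default (inplace=False): a new DataFrame comes back and df is untouched.
trimmed = df.drop("B", axis=1)
columns_after_copy = list(df.columns)

# inplace=True: df itself is modified and the call returns None.
result = df.drop("B", axis=1, inplace=True)
```

Assigning the result of an inplace=True call back to the original name is a common slip, because it replaces the DataFrame with None.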

 

 

2. Making It Work

With that said, here’s a practical example of how the drop function is expected to operate on a sample Pandas DataFrame:

  1. To start, let’s create a sample DataFrame of a class to play around with:
import pandas as pd

# Sample class data used throughout this guide
DataFrame = pd.DataFrame(
   {
       'Name': ['Duran', 'Rajesh', 'Jinsoo', 'Mary', 'Joseph', 'Daniel'],
       'Class': ['A', 'A', 'B', 'B', 'C', 'C'],
       'Subjects': ['Mathematics', 'Chemistry', 'History', 'Geography', 'English', 'Gym'],
       'Grades': [95, 73, 60, 93, 80, 65],
       'Fail': ['No', 'No', 'No', 'No', 'No', 'No']
   }
)

  2. Next, call the drop function with the following parameters to drop the Fail column, and store the result in a variable to display later:
variable = DataFrame.drop("Fail", axis=1, inplace=False)

  3. Upon execution, the variable will have the following DataFrame stored inside:
     Name Class     Subjects  Grades
0   Duran     A  Mathematics      95
1  Rajesh     A    Chemistry      73
2  Jinsoo     B      History      60
3    Mary     B    Geography      93
4  Joseph     C      English      80
5  Daniel     C          Gym      65
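
The example above drops a column. Dropping rows works the same way with axis=0 or the index keyword. Here is a minimal sketch using a trimmed-down version of the class data (only two of the columns, to keep it short):

```python
import pandas as pd

# Trimmed-down version of the class data, for illustration only.
df = pd.DataFrame(
    {
        "Name": ["Duran", "Rajesh", "Jinsoo", "Mary", "Joseph", "Daniel"],
        "Grades": [95, 73, 60, 93, 80, 65],
    }
)

# Drop a single row by its index label.
without_mary = df.drop(3, axis=0)

# Drop several rows at once with the index keyword.
without_ends = df.drop(index=[0, 5])

# The remaining rows keep their original labels;
# reset_index(drop=True) renumbers them from 0.
renumbered = without_ends.reset_index(drop=True)
```

Keep in mind that drop works on labels, not positions, so the numbers passed here are index labels that happen to match the row positions.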

 

 

How to Replace a Row/Column in a Pandas DataFrame?

Method 1: Using the at Accessor

The following accessor lets you read or set a single value in a DataFrame:

DataFrame.at[index, column]

 

This method only works if you already know the index (row) label and the column label of the value you’re about to replace. 🧐
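
A quick sketch of what that means in practice: reading through at with a row label that doesn't exist raises a KeyError. The two-row frame below is a made-up example:

```python
import pandas as pd

# Hypothetical frame with the default index labels 0 and 1.
df = pd.DataFrame({"Name": ["Duran", "Rajesh"], "Class": ["A", "A"]})

# Reading a value needs both the row label and the column label.
current = df.at[1, "Class"]

# Row label 7 does not exist, so the lookup raises KeyError.
try:
    df.at[7, "Class"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```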

 

Basic Working

Here’s how you can replace a specific value on a sample DataFrame:

  1. To start, import the dependencies and create a sample class DataFrame:
import pandas as pd

DataFrame = pd.DataFrame(
   {
       'Name': ['Duran', 'Rajesh', 'Jinsoo', 'Mary', 'Joseph', 'Daniel'],
       'Class': ['A', 'A', 'B', 'B', 'C', 'C'],
       'Subjects': ['Mathematics', 'Chemistry', 'History', 'Geography', 'English', 'Gym'],
       'Grades': [95, 73, 60, 93, 80, 65],
       'Fail': ['No', 'No', 'No', 'No', 'No', 'No']
   }
)

  2. Next, use the at accessor with the row index and column label of your choice 🤓, and assign a new value to it:
DataFrame.at[5, 'Class'] = 'D'

  3. Upon executing, the new DataFrame values will be as follows:
     Name Class     Subjects  Grades Fail
0   Duran     A  Mathematics      95   No
1  Rajesh     A    Chemistry      73   No
2  Jinsoo     B      History      60   No
3    Mary     B    Geography      93   No
4  Joseph     C      English      80   No
5  Daniel     D          Gym      65   No

 

Column Customization

Keeping the previous code intact, here’s what you need to do to build a column-replacement function around at:

  1. Let’s start by defining a custom function that takes in the DataFrame object, the column label, and a list of new column values as parameters:
def column_replace(DataFrame, label, new_col):

  2. Now, inside the function, grab the total length of the column, and store it in a variable for later use:
    cycles = len(DataFrame[label])

  3. From here, create a loop that runs once per row and uses at to assign the corresponding value from new_col. Note that range(cycles) assumes the default integer index (0, 1, 2, and so on):
    for x in range(cycles):
        DataFrame.at[x, label] = new_col[x]

  4. Lastly, add a return statement to get the updated object after the replacements are over:
    return DataFrame
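
Put together, the steps above give the following function. The three-row frame and the replacement list are made-up values for a quick check:

```python
import pandas as pd


def column_replace(DataFrame, label, new_col):
    # Number of rows in the target column.
    cycles = len(DataFrame[label])
    # Assumes the default integer index (0, 1, 2, ...), since x is
    # used as the row label passed to at.
    for x in range(cycles):
        DataFrame.at[x, label] = new_col[x]
    return DataFrame


# Hypothetical data for a quick check.
df = pd.DataFrame({"Name": ["Duran", "Rajesh", "Jinsoo"], "Class": ["A", "A", "B"]})
df = column_replace(df, "Class", ["B", "C", "F"])
```

Note that the function edits the DataFrame it receives and also returns it, so the returned object and the original are the same DataFrame.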

 

Row Customization

Keeping the previous code intact 🤖, here’s what you need to do to build a row-replacement function around at:

  1. Let’s start by defining a custom function that takes in the DataFrame object, the row index, and a list of new row values as parameters:
def row_replace(DataFrame, index, new_row):

  2. Now, inside the function, grab all the column names, and store them in a variable for later use:
    column_names = DataFrame.columns.values

  3. From here, create a loop that runs across all these column names and uses at to replace one value of the row with each iteration:
    for x in range(len(column_names)):
        DataFrame.at[index, column_names[x]] = new_row[x]

  4. Lastly, add a return statement to get the updated object after the replacements are over:
    return DataFrame
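
Assembled, the row version looks like this. The two-row frame and the new row values are made up for a quick check, and new_row is assumed to list its values in the same order as the columns:

```python
import pandas as pd


def row_replace(DataFrame, index, new_row):
    # Column labels, in the same order as the values in new_row.
    column_names = DataFrame.columns.values
    for x in range(len(column_names)):
        DataFrame.at[index, column_names[x]] = new_row[x]
    return DataFrame


# Hypothetical data for a quick check.
df = pd.DataFrame(
    {"Name": ["Duran", "Rajesh"], "Class": ["A", "A"], "Grades": [95, 73]}
)
df = row_replace(df, 1, ["Himesh", "Z", 89])
```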

 

Testing

Since you’ve created two functions, the output will vary depending on your call statement. Plus, there’s no telling what kind of logical errors creep their way in while you’re trying to organize all the components.

Here is how you can test the column_replace function:

  1. For starters, create a sample DataFrame and a list of new column values:
import pandas as pd

DataFrame = pd.DataFrame(
   {
       'Name': ['Duran', 'Rajesh', 'Jinsoo', 'Mary', 'Joseph', 'Daniel'],
       'Class': ['A', 'A', 'B', 'B', 'C', 'C'],
       'Subjects': ['Mathematics', 'Chemistry', 'History', 'Geography', 'English', 'Gym'],
       'Grades': [95, 73, 60, 93, 80, 65],
       'Fail': ['No', 'No', 'No', 'No', 'No', 'No']
   }
)

# New values for the Class column, one per row
new_classes = ['B', 'C', 'F', 'G', 'I', 'H']

  2. Secondly, initiate the function call, and print the output:
DataFrame = column_replace(DataFrame, 'Class', new_classes)

print(DataFrame)

  3. Lastly, compare your output with the following to see if they match 😌:
     Name Class     Subjects  Grades Fail
0   Duran     B  Mathematics      95   No
1  Rajesh     C    Chemistry      73   No
2  Jinsoo     F      History      60   No
3    Mary     G    Geography      93   No
4  Joseph     I      English      80   No
5  Daniel     H          Gym      65   No

 

Keeping the new DataFrame values in mind, here’s how you can test the row_replace function:

  1. For the same DataFrame, create a dummy row as a list with one value per column:
new_row = ['Himesh', 'Z', 'Archeology', 89, 'No']

  2. Now, call the function, and pass the DataFrame, the index, and the new_row list as parameters:
DataFrame = row_replace(DataFrame, 5, new_row)

  3. Print the new DataFrame and match it with the following output:
     Name Class     Subjects  Grades Fail
0   Duran     B  Mathematics      95   No
1  Rajesh     C    Chemistry      73   No
2  Jinsoo     F      History      60   No
3    Mary     G    Geography      93   No
4  Joseph     I      English      80   No
5  Himesh     Z   Archeology      89   No
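
For comparison, the same two replacements can also be written without a loop. This is an alternative sketch rather than the approach used in this guide, and it assumes the new list has exactly one value per row (for a column) or one value per column (for a row):

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration.
df = pd.DataFrame(
    {"Name": ["Duran", "Rajesh", "Jinsoo"], "Class": ["A", "A", "B"], "Grades": [95, 73, 60]}
)

# Replace a whole column by assigning a list with one value per row.
df["Class"] = ["B", "C", "F"]

# Replace a whole row by assigning a list with one value per column.
df.loc[2] = ["Himesh", "Z", 89]
```

The custom functions above are still handy when you want to add your own checks or logging around each replaced value.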

 

Method 2: Using the loc Accessor

Although this approach is shorter than the at-based method, it likewise only works when the row and column labels of the values are known. While this can be a double-edged sword 😥, loc is still flexible enough to be substituted in most scenarios.

With that said, here’s the general syntax for the loc accessor for replacing values:

DataFrame.loc[row, [column_name]] = new_value

 

This assignment returns nothing; it makes direct edits to the original object 😣. Therefore, be sure to save a copy of the DataFrame (for example, with DataFrame.copy()) before making the change in case you need the old values in the future.
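
Here is a minimal sketch of that precaution, using a made-up two-row frame:

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
df = pd.DataFrame({"Type": ["Turbo", "Manual"], "Sold": ["No", "Yes"]})

# Keep an independent copy before editing.
backup = df.copy()

# The loc assignment edits df directly.
df.loc[0, ["Sold"]] = "Yes"
```

After this runs, df holds the new value while backup still holds the old one.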

 

Testing

Here’s how you can use the loc accessor to change a single or a set of rows/columns in a Pandas DataFrame:

  1. To begin, import Pandas, create a sample DataFrame, and populate it with sample values:
import pandas as pd

DataFrame = pd.DataFrame(
   {
       'Type': ['Turbo', 'Racecar', 'Cruise', 'Turbo', 'Manual'],
       'Car Name': ['Audi A3', 'Ferrari', 'Lamborghini', 'GT370s', 'Hyundai Santro'],
       'Distance Travelled': [34000, 6000, 0, 3400, 28000],
       'Condition': ['7/10', '8.5/10', '10/10', '9/10', '7.5/10'],
       'Sold': ['No', 'No', 'Yes', 'No', 'Yes']
   }
)

  2. Next, write the following loc statement to overwrite all rows from index 0 to 2. Both endpoints of a loc slice are included, and the single list of values is applied to each selected row:
DataFrame.loc[0:2, ['Type', 'Car Name', 'Distance Travelled', 'Condition', 'Sold']] = ['Turbo', 'Audi V4', 6900, '10/10', 'Yes']

  3. Lastly, print the DataFrame, and compare the output with the following:
     Type        Car Name  Distance Travelled Condition Sold
0   Turbo         Audi V4                6900     10/10  Yes
1   Turbo         Audi V4                6900     10/10  Yes
2   Turbo         Audi V4                6900     10/10  Yes
3   Turbo          GT370s                3400      9/10   No
4  Manual  Hyundai Santro               28000    7.5/10  Yes
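
The same accessor also covers columns. The sketch below, on a made-up three-row frame, replaces one column for every row and then one column for a slice of rows:

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
df = pd.DataFrame(
    {
        "Car Name": ["Audi A3", "Ferrari", "GT370s"],
        "Distance Travelled": [34000, 6000, 3400],
        "Sold": ["No", "Yes", "No"],
    }
)

# A bare colon selects every row, so this overwrites the whole column.
df.loc[:, "Sold"] = "No"

# Rows 1 through 2 (both included) of a single column, one value per row.
df.loc[1:2, "Distance Travelled"] = [100, 200]
```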

 

 

Conclusion

In essence, no specific built-in function is designed to cater to replacing and swapping rows and columns. However, the Pandas library possesses a boatload of accessors that can help you create your custom functions. 😤

With that in mind, we’ve kept the custom function as basic as possible, so feel free to add your flavors as needed. 

Before running your code, convert the list input to your desired format. As for the drop function, just ensure that the specified labels exist. 🤓

If the mentioned examples don’t work out on your end, hunt down any logical or version-based issues before going for a rerun. 

Lastly, let us know in the comments:

  • Were you able to delete or replace a column or row in your specific DataFrame?
  • Have you found a more straightforward solution for replacing rows?
  • Which of your projects requires you to trim down DataFrames?
  • Are there any points you believe we should’ve mentioned?

Feel free to circulate this with your peers so they don’t have to struggle with deleting or replacing rows/columns in Pandas.
