In pandas, you can give categorical values a specific sort order by converting the column to a categorical type with ordered categories. Here’s an example:
```python
import pandas as pd

# Sample DataFrame
data = {'day': ['Monday', 'Wednesday', 'Friday', 'Tuesday', 'Thursday']}
df = pd.DataFrame(data)

# Define the custom order for sorting
custom_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

# Convert the 'day' column to a categorical variable with the custom order
df['day'] = pd.Categorical(df['day'], categories=custom_order, ordered=True)

# Sort the DataFrame based on the custom order
df = df.sort_values('day')

# Display the sorted DataFrame
print(df)
```

```
         day
0     Monday
3    Tuesday
1  Wednesday
4   Thursday
2     Friday
```
In this example:
- We create a DataFrame with a ‘day’ column containing days of the week in a random order.
- We define a custom order for sorting (‘custom_order’).
- We convert the ‘day’ column to a categorical variable using `pd.Categorical` with the specified custom order and setting `ordered=True`.
- We use `df.sort_values('day')` to sort the DataFrame based on the custom order.
- The resulting DataFrame has its rows sorted according to the custom order of the ‘day’ column.
This can be useful when you want to ensure that certain operations, such as sorting or plotting, take into account the natural order of the days of the week.
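Sorting is not the only operation that respects the category order. The sketch below uses made-up sample data (the `days` list is an assumption, with 'Sunday' deliberately left out of the category list) to show three related behaviours: values missing from the category list become NaN, comparisons against a category label work, and `min`/`max` follow the custom order.

```python
import pandas as pd

# Hypothetical sample data; 'Sunday' is deliberately absent from the category list.
days = ['Monday', 'Wednesday', 'Friday', 'Tuesday', 'Sunday']
custom_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

s = pd.Series(pd.Categorical(days, categories=custom_order, ordered=True))

# Values that are not listed in the categories are converted to NaN.
n_missing = int(s.isna().sum())

# Ordered categoricals support comparisons against a category label.
before_wednesday = s[s < 'Wednesday'].tolist()

# min/max follow the custom order rather than alphabetical order.
earliest, latest = s.min(), s.max()

print(n_missing, before_wednesday, earliest, latest)
```

It is worth checking for unexpected NaN values after the conversion, since a typo in the category list silently drops every matching row value.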
Drawing a graph before adding sort order to the week_day categorical value:
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Assumes pandas is imported as pd and df already holds 'pickup_dt' and 'pickups' columns.
# pickup_dt is assumed to be stored as object (string), so convert it to datetime first.
df['pickup_dt'] = pd.to_datetime(df['pickup_dt'], format="%d-%m-%Y %H:%M")

# Now we can extract date parts from the pickup date
df['start_year'] = df.pickup_dt.dt.year            # year
df['start_month'] = df.pickup_dt.dt.month_name()   # month name
df['start_hour'] = df.pickup_dt.dt.hour            # hour of the day
df['start_day'] = df.pickup_dt.dt.day              # day of the month
df['week_day'] = df.pickup_dt.dt.day_name()        # day of the week

# Draw a line plot of total pickups per day of the week.
# errorbar=None (seaborn >= 0.12) turns off the confidence band; older versions use ci=None.
plt.figure(figsize=(15, 7))
sns.lineplot(data=df, x="week_day", y="pickups", errorbar=None, color="red", estimator='sum')
plt.ylabel('Total pickups')
plt.xlabel('Day of the week')
plt.show()
```

After adding sort order to the week_day categorical value:
```python
# Define the weekday order and convert week_day to an ordered categorical
cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['week_day'] = pd.Categorical(df['week_day'], categories=cats, ordered=True)

# Same plot as before; errorbar=None (seaborn >= 0.12) replaces the older ci argument
plt.figure(figsize=(15, 7))
sns.lineplot(data=df, x="week_day", y="pickups", errorbar=None, color="red", estimator='sum')
plt.ylabel('Total pickups')
plt.xlabel('Day of the week')
plt.show()
```
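One way to check the ordering without relying on the plot is to aggregate the totals directly. The following is a minimal sketch on a small, made-up DataFrame (the pickup counts are hypothetical, not taken from the dataset used above); it shows that `groupby` returns the totals in category order.

```python
import pandas as pd

# Hypothetical pickup counts, listed in no particular order.
df = pd.DataFrame({
    'week_day': ['Sunday', 'Monday', 'Friday', 'Monday', 'Wednesday', 'Sunday'],
    'pickups': [5, 10, 7, 3, 4, 1],
})

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['week_day'] = pd.Categorical(df['week_day'], categories=cats, ordered=True)

# observed=False keeps every category, so days with no rows appear with a total of 0.
totals = df.groupby('week_day', observed=False)['pickups'].sum()
print(totals)
```

The resulting index runs from Monday to Sunday regardless of the row order in the input, which is the same order the categorical column carries into sorting.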
