I have the following table in a CSV file:
wi_document_id | wir_rejected_by | wir_reason | wir_sys_created_on |
---|---|---|---|
Int0002277 | Agent_1 | Time out | 3/8/2024 11:18:10 AM |
Int0002278 | Agent_1 | Time out | 2/26/2024 12:18:16 AM |
Int0002279 | Agent_2 | Busy | 3/11/2024 09:18:31 AM |
Int0002280 | Agent_2 | Time out | 3/18/2024 10:45:08 AM |
Int0002281 | Agent_2 | Time out | 3/4/2024 10:18:22 AM |
Int0002282 | Agent_3 | Time out | 3/18/2024 11:20:51 AM |
Int0002283 | Agent_3 | Busy | 2/29/2024 08:13:04 AM |
Int0002284 | Agent_4 | Time out | 3/4/2024 09:30:45 AM |
Int0002285 | Agent_4 | Busy | 3/12/2024 10:18:34 AM |
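
To make the question reproducible without the file, the sample rows above can be loaded from an inline string. This is only a sketch: it assumes the real file is comma-separated and that the timestamps are month/day/year in 12-hour format, and `csv_text` is a name introduced here for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the table above.
# Assumption: the real 'Rejection Report.csv' is comma-separated.
csv_text = """wi_document_id,wir_rejected_by,wir_reason,wir_sys_created_on
Int0002277,Agent_1,Time out,3/8/2024 11:18:10 AM
Int0002278,Agent_1,Time out,2/26/2024 12:18:16 AM
Int0002279,Agent_2,Busy,3/11/2024 09:18:31 AM
Int0002280,Agent_2,Time out,3/18/2024 10:45:08 AM
Int0002281,Agent_2,Time out,3/4/2024 10:18:22 AM
Int0002282,Agent_3,Time out,3/18/2024 11:20:51 AM
Int0002283,Agent_3,Busy,2/29/2024 08:13:04 AM
Int0002284,Agent_4,Time out,3/4/2024 09:30:45 AM
Int0002285,Agent_4,Busy,3/12/2024 10:18:34 AM
"""

df = pd.read_csv(io.StringIO(csv_text))

# Assumption: month/day/year with a 12-hour clock, as the sample values suggest
df['wir_sys_created_on'] = pd.to_datetime(
    df['wir_sys_created_on'], format='%m/%d/%Y %I:%M:%S %p'
)

print(df.dtypes)
```
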
I have the script below to calculate:
- The rejection count for each agent on a weekly basis.
- The rejection count with reason = 'Time out' for each agent on a weekly basis.
- The rejection count with reason = 'Busy' for each agent on a weekly basis.
Script:

    import pandas as pd

    # Load the CSV file into a DataFrame
    df = pd.read_csv('Rejection Report.csv')

    # Convert 'wir_sys_created_on' column to datetime
    df['wir_sys_created_on'] = pd.to_datetime(df['wir_sys_created_on'])

    # Extract week numbers from the datetime column starting from 1 and format with ISO week number and the date of the Monday
    df['week_number'] = df['wir_sys_created_on'] - pd.to_timedelta(df['wir_sys_created_on'].dt.dayofweek, unit='d')
    df['week_number'] = 'Week ' + df['week_number'].dt.strftime('%V') + ' (' + df['week_number'].dt.strftime('%Y-%m-%d') + ')'

    # Group by agent, week number, and rejection reason
    grouped = df.groupby(['wir_rejected_by', 'week_number', 'wir_reason'])

    # Calculate rejection count by reason per week
    rejection_by_reason = grouped.size().unstack(fill_value=0)

    # Calculate total rejection count per week
    weekly_rejection_count = df.groupby(['wir_rejected_by', 'week_number']).size().unstack(fill_value=0)

    # Filter rejection counts based on reasons 'Time out' and 'Busy'
    rejection_timeout = rejection_by_reason['Time out'].unstack(fill_value=0)
    rejection_busy = rejection_by_reason['Busy'].unstack(fill_value=0)

    # Concatenate DataFrames with a multi-level column index
    df_with_multiindex = pd.concat(
        [weekly_rejection_count, rejection_timeout, rejection_busy],
        axis=1,
        keys=['Total Rejections', 'Rejections due to Time out', 'Rejections due to Busy'],
        names=['', '']
    )

    # Ensure weeks are ordered chronologically
    df_with_multiindex = df_with_multiindex.reindex(sorted(df_with_multiindex.columns), axis=1)

    # Apply some formatting
    styled_df = df_with_multiindex.style.format("{:.0f}")
    styled_df = styled_df.set_table_styles([
        {'selector': 'th', 'props': [('text-align', 'center')]},
        {'selector': 'td', 'props': [('text-align', 'center')]},
        {'selector': 'caption', 'props': [('caption-side', 'bottom')]}
    ])

    # Set the caption
    styled_df = styled_df.set_caption('Rejections Report')

    # Display the styled DataFrame
    styled_df.set_properties(**{'border-collapse': 'collapse', 'border': '1px solid black'})

The calculation part is good, but the multi-level column headers are set up incorrectly:
The rejection-reason and total-rejection headers sit on top of the week numbers, so the week numbers are repeated under each of them.
I need the table headers to be structured as follows, with borders on the columns and cells:
The week numbers should be the top-level header, with the calculated columns nested below each week, so that the week numbers are not repeated for each calculated column.
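
To make the target concrete, here is a minimal sketch of the column layout described above, built by hand rather than from the real calculation. The two week labels follow the format the script produces; `weeks`, `metrics`, `target_columns`, and `layout` are names introduced only for this illustration, and the zero values are placeholders.

```python
import pandas as pd

# Two week labels in the format the script produces (illustrative subset)
weeks = ['Week 09 (2024-02-26)', 'Week 10 (2024-03-04)']

# The three calculated columns that should be nested under every week
metrics = ['Total Rejections', 'Rejections due to Time out', 'Rejections due to Busy']

# Week label on the top level, calculated columns on the second level
target_columns = pd.MultiIndex.from_product([weeks, metrics])

# Placeholder frame that only demonstrates the header shape
layout = pd.DataFrame(0, index=['Agent_1', 'Agent_2'], columns=target_columns)

print(layout.columns.tolist())
```

Each week label appears once on the top level and spans its three calculated columns, which is the opposite nesting order from what the `pd.concat(..., keys=...)` call in the script produces.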
Any tips on how to accomplish the desired structure?