Channel: Active questions tagged header - Stack Overflow

Multi-Level Header in Pandas DataFrame


I have the following table in a CSV file:

| wi_document_id | wir_rejected_by | wir_reason | wir_sys_created_on |
|---|---|---|---|
| Int0002277 | Agent_1 | Time out | 3/8/2024 11:18:10 AM |
| Int0002278 | Agent_1 | Time out | 2/26/2024 12:18:16 AM |
| Int0002279 | Agent_2 | Busy | 3/11/2024 09:18:31 AM |
| Int0002280 | Agent_2 | Time out | 3/18/2024 10:45:08 AM |
| Int0002281 | Agent_2 | Time out | 3/4/2024 10:18:22 AM |
| Int0002282 | Agent_3 | Time out | 3/18/2024 11:20:51 AM |
| Int0002283 | Agent_3 | Busy | 2/29/2024 08:13:04 AM |
| Int0002284 | Agent_4 | Time out | 3/4/2024 09:30:45 AM |
| Int0002285 | Agent_4 | Busy | 3/12/2024 10:18:34 AM |
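To make the question reproducible without the CSV file, the sample rows can be loaded from an in-memory string. This is a minimal sketch that assumes the file is comma-separated and that the timestamps use the `month/day/year hh:mm:ss AM/PM` layout shown in the table; the `CSV_TEXT` name is hypothetical.

```python
import io

import pandas as pd

# Sample rows from the table above. Assumption: the real file is comma-separated.
CSV_TEXT = """wi_document_id,wir_rejected_by,wir_reason,wir_sys_created_on
Int0002277,Agent_1,Time out,3/8/2024 11:18:10 AM
Int0002278,Agent_1,Time out,2/26/2024 12:18:16 AM
Int0002279,Agent_2,Busy,3/11/2024 09:18:31 AM
Int0002280,Agent_2,Time out,3/18/2024 10:45:08 AM
Int0002281,Agent_2,Time out,3/4/2024 10:18:22 AM
Int0002282,Agent_3,Time out,3/18/2024 11:20:51 AM
Int0002283,Agent_3,Busy,2/29/2024 08:13:04 AM
Int0002284,Agent_4,Time out,3/4/2024 09:30:45 AM
Int0002285,Agent_4,Busy,3/12/2024 10:18:34 AM
"""

# Read from memory instead of 'Rejection Report.csv'.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# An explicit format avoids day/month guessing; %I together with %p handles the 12-hour clock.
df["wir_sys_created_on"] = pd.to_datetime(
    df["wir_sys_created_on"], format="%m/%d/%Y %I:%M:%S %p"
)
print(df.head())
```

Note that `12:18:16 AM` parses as 00:18:16, i.e. just after midnight.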

I use the script below to calculate:

  1. The 'rejection count' for each agent on a weekly basis.
  2. The 'rejection count' with reason = 'Time out' for each agent on a weekly basis.
  3. The 'rejection count' with reason = 'Busy' for each agent on a weekly basis.
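For comparison, the three counts can also be produced in one pass with `pd.crosstab`. This is a swapped-in technique, not the `groupby`/`unstack` approach the script uses. The sketch keys each week by its Monday date (a hypothetical `week_start` column) instead of the text label, and inlines the sample rows with the timestamps rewritten in 24-hour ISO form.

```python
import pandas as pd

# Sample rows from the question (document ids omitted; they are not needed for counting).
df = pd.DataFrame({
    "wir_rejected_by": ["Agent_1", "Agent_1", "Agent_2", "Agent_2", "Agent_2",
                        "Agent_3", "Agent_3", "Agent_4", "Agent_4"],
    "wir_reason": ["Time out", "Time out", "Busy", "Time out", "Time out",
                   "Time out", "Busy", "Time out", "Busy"],
    "wir_sys_created_on": pd.to_datetime([
        "2024-03-08 11:18:10", "2024-02-26 00:18:16", "2024-03-11 09:18:31",
        "2024-03-18 10:45:08", "2024-03-04 10:18:22", "2024-03-18 11:20:51",
        "2024-02-29 08:13:04", "2024-03-04 09:30:45", "2024-03-12 10:18:34",
    ]),
})

# Monday of each row's week with the time of day dropped, used as the week key.
ts = df["wir_sys_created_on"]
df["week_start"] = (ts - pd.to_timedelta(ts.dt.dayofweek, unit="D")).dt.normalize()

# One row per (agent, week), one column per reason; missing combinations become 0.
counts = pd.crosstab([df["wir_rejected_by"], df["week_start"]], df["wir_reason"])

# Total rejections per (agent, week) is the sum over all reasons.
counts["Total"] = counts.sum(axis=1)
print(counts)
```

Keying by the Monday date keeps the weeks sortable as real dates; a display label can be derived from it at the end.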

Script:

    import pandas as pd

    # Load the CSV file into a DataFrame
    df = pd.read_csv('Rejection Report.csv')

    # Convert 'wir_sys_created_on' column to datetime
    df['wir_sys_created_on'] = pd.to_datetime(df['wir_sys_created_on'])

    # Extract week numbers from the datetime column starting from 1 and format with ISO week number and the date of the Monday
    df['week_number'] = df['wir_sys_created_on'] - pd.to_timedelta(df['wir_sys_created_on'].dt.dayofweek, unit='d')
    df['week_number'] = 'Week ' + df['week_number'].dt.strftime('%V') + ' (' + df['week_number'].dt.strftime('%Y-%m-%d') + ')'

    # Group by agent, week number, and rejection reason
    grouped = df.groupby(['wir_rejected_by', 'week_number', 'wir_reason'])

    # Calculate rejection count by reason per week
    rejection_by_reason = grouped.size().unstack(fill_value=0)

    # Calculate total rejection count per week
    weekly_rejection_count = df.groupby(['wir_rejected_by', 'week_number']).size().unstack(fill_value=0)

    # Filter rejection counts based on reasons 'Time out' and 'Busy'
    rejection_timeout = rejection_by_reason['Time out'].unstack(fill_value=0)
    rejection_busy = rejection_by_reason['Busy'].unstack(fill_value=0)

    # Concatenate DataFrames with a multi-level column index
    df_with_multiindex = pd.concat(
        [weekly_rejection_count, rejection_timeout, rejection_busy],
        axis=1,
        keys=['Total Rejections', 'Rejections due to Time out', 'Rejections due to Busy'],
        names=['', '']
    )

    # Ensure weeks are ordered chronologically
    df_with_multiindex = df_with_multiindex.reindex(sorted(df_with_multiindex.columns), axis=1)

    # Apply some formatting
    styled_df = df_with_multiindex.style.format("{:.0f}")
    styled_df = styled_df.set_table_styles([
        {'selector': 'th', 'props': [('text-align', 'center')]},
        {'selector': 'td', 'props': [('text-align', 'center')]},
        {'selector': 'caption', 'props': [('caption-side', 'bottom')]}
    ])

    # Set the caption
    styled_df = styled_df.set_caption('Rejections Report')

    # Display the styled DataFrame
    styled_df.set_properties(**{'border-collapse': 'collapse', 'border': '1px solid black'})
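The step commented "Ensure weeks are ordered chronologically" sorts the column labels as strings. Within one calendar year the zero-padded `%V` labels happen to sort in date order, but across a year boundary they do not. The sketch below uses two made-up timestamps (not from the question) to show the difference, and sorts by the Monday date instead of by the label text.

```python
import pandas as pd

# Two illustrative timestamps on either side of New Year (not from the question).
ts = pd.Series(pd.to_datetime(["2024-01-03 09:00:00", "2023-12-27 14:30:00"]))

# The script's labelling rule (Monday of the week), with the time of day dropped.
monday = (ts - pd.to_timedelta(ts.dt.dayofweek, unit="D")).dt.normalize()
labels = "Week " + monday.dt.strftime("%V") + " (" + monday.dt.strftime("%Y-%m-%d") + ")"

# String sorting compares "Week 01" against "Week 52" and ignores the year.
by_string = sorted(labels)

# Sorting by the Monday date keeps the weeks in calendar order.
by_date = labels.loc[monday.sort_values().index].tolist()

print(by_string)  # ['Week 01 (2024-01-01)', 'Week 52 (2023-12-25)']
print(by_date)    # ['Week 52 (2023-12-25)', 'Week 01 (2024-01-01)']
```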

The calculations are correct, but the multi-level column headers are arranged incorrectly:

[Screenshot: current output]

The 'Total Rejections' and rejection-reason headers sit above the week numbers, so every week number is repeated under each calculated column.
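The repetition comes from how `pd.concat` builds the column index: the `keys` argument always becomes the outer column level, and each input frame's own columns (here, the week labels) become the inner level. Sorting those column tuples, as the script does, then orders them by the metric name first. A minimal illustration using Agent_1's two weeks from the sample rows:

```python
import pandas as pd

# Agent_1's counts from the sample rows, one column per week label.
weeks = ["Week 09 (2024-02-26)", "Week 10 (2024-03-04)"]
total = pd.DataFrame([[1, 1]], index=["Agent_1"], columns=weeks)
timeout = pd.DataFrame([[1, 1]], index=["Agent_1"], columns=weeks)

combined = pd.concat(
    [total, timeout], axis=1, keys=["Total Rejections", "Rejections due to Time out"]
)

# keys= becomes level 0 (outer) and the week labels become level 1 (inner),
# so every week label is repeated under each metric.
outer = combined.columns.get_level_values(0).tolist()
inner = combined.columns.get_level_values(1).tolist()
print(outer)
print(inner)
```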

I need the table headers to look like this, with borders around the columns and cells:

[Screenshot: desired header layout]

The week numbers should form the top-level header, with the calculated columns nested below each week, so that each week number appears only once instead of being repeated for every calculated column.
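One way to get that layout, sketched under the assumption that the frame looks like the script's `df_with_multiindex` (metric on the outer level, week on the inner level), is to swap the two column levels with `DataFrame.swaplevel` and then reindex the columns so the weeks stay in chronological order and the metrics keep a fixed order. The values below are Agent_1's and Agent_2's counts for weeks 09 and 10 from the sample rows.

```python
import pandas as pd

weeks = ["Week 09 (2024-02-26)", "Week 10 (2024-03-04)"]  # already chronological
metrics = ["Total Rejections", "Rejections due to Time out", "Rejections due to Busy"]
agents = ["Agent_1", "Agent_2"]

# Stand-in for df_with_multiindex: metric on the outer level, week on the inner level.
total = pd.DataFrame([[1, 1], [0, 1]], index=agents, columns=weeks)
timeout = pd.DataFrame([[1, 1], [0, 1]], index=agents, columns=weeks)
busy = pd.DataFrame([[0, 0], [0, 0]], index=agents, columns=weeks)
combined = pd.concat([total, timeout, busy], axis=1, keys=metrics)

# Swap the two column levels: week becomes the outer level, metric the inner one.
by_week = combined.swaplevel(axis=1)

# swaplevel only swaps the levels; it does not regroup the columns.
# Reindex with the full week x metric product to put them in the desired order.
by_week = by_week.reindex(columns=pd.MultiIndex.from_product([weeks, metrics]))

print(by_week)
```

When the frame is rendered (printed, `to_html`, or through a `Styler`), consecutive identical outer labels are merged into one spanning header, so each week appears once. With the real data, `weeks` would be the unique week labels ordered by their Monday date rather than by string.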

Any tips on how to accomplish the desired structure?

