Report: Optimized Buffer Automation - NA AR Sortable
Setting buffer ranges directly impacts the stability of outbound (OB) operations. Because performance varies over time, monitoring and adjusting buffers is crucial for consistent performance. Maintaining healthy buffers preserves performance; it is not a step increase in performance itself.
Historically, buffer ranges were established via site-specific analysis using nonstandard key performance indicators (KPIs) through a trouble ticket process. In 2022, ARC introduced Buffer Analysis Engine (BAE) to set buffers network-wide, using standardized KPIs and a 30-day update cycle based on recirculation, idle time, and minutes of work (MoW) metrics.
Currently, the buffer configuration approach (BAEv2) relies solely on MoW to prevent idle time and MRS recirculation. BAEv2 has no upper limit, so buffer ranges keep increasing whenever MRS recirculation is not maintained below 20%, which negatively impacts OB cycle time. To solve this, cycle time was introduced as an indicator of a balanced buffer setup, optimizing speed, idle time, and rates.
BAEv2 tends to favor higher ranges because its algorithm is driven by MRS recirculation, causing longer item dwell times and cycle times. Q1-Q2 2023 tests by a two-pizza team (ACES/ARC/AFT) at BOI2, DFW7, and LIT1 compared cycle-time-optimized buffer ranges against MRS-recirculation-optimized ranges, showing a 12.51% decrease in pick-to-MRS-divert time and a 9.78% decrease in MRS-divert-to-induct time, leading to a 43-minute (15.95%) reduction in FT SLA[1]. Using the initial test findings, a series of pilots[2] was conducted to integrate cycle time into the buffer range calculations, leading to a final pilot to confirm the baseline algorithm for the buffer automation[3] workstream and near-term scalability. The pilot was designed as an intervention analysis.
During the final 14-day pilot, from June 20, 2023, to July 3, 2023, sites attained a 9% improvement in cycle times and a 0.4% increase in rates. These improvements translate to an annualized benefit of $10.9MM from cycle time improvements, by driving reduced FT SLA and ultimately increased glance views. Additionally, OB productivity improvement through rate enhancements can capture $12.0MM.
Network deployment of BAEv3 – Speed Optimized Buffers effective 8/16/2023
Reversion Criteria (an illustrative monitoring sketch follows this list):
· Breaching Max Buffer increases cycle time (CT) and recirculation
· Breaching Min Buffer decreases rates and increases Out of Work Instances (OOW)
· A sustained 10% decrease in rates for 3 hours, or 5% over 24 hours
· A sustained 10% increase in CT for 3 hours, or 5% over 24 hours
· A sustained 5% increase in time OOW for 8 hours
· A sustained 3% increase in recirc for 8 hours
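For illustration only, the sketch below shows how these criteria could be checked programmatically against hourly KPI series. The data plumbing, baseline definitions, and function names are assumptions, not the production reversion mechanism; only the threshold values mirror the bullets above.

import pandas as pd

def sustained_breach(series: pd.Series, baseline: float, pct: float, hours: int, direction: str) -> bool:
    """True if the hourly KPI deviated from baseline by more than pct for the last `hours` consecutive hours."""
    if direction == "increase":
        breach = series > baseline * (1 + pct)
    else:
        breach = series < baseline * (1 - pct)
    recent = breach.tail(hours)
    return len(recent) == hours and bool(recent.all())

def should_revert(rates, cycle_time, time_oow, recirc, baselines) -> bool:
    """Evaluate the reversion criteria listed above against hourly KPI series."""
    return any([
        sustained_breach(rates, baselines["rate"], 0.10, 3, "decrease"),
        sustained_breach(rates, baselines["rate"], 0.05, 24, "decrease"),
        sustained_breach(cycle_time, baselines["ct"], 0.10, 3, "increase"),
        sustained_breach(cycle_time, baselines["ct"], 0.05, 24, "increase"),
        sustained_breach(time_oow, baselines["oow"], 0.05, 8, "increase"),
        sustained_breach(recirc, baselines["recirc"], 0.03, 8, "increase"),
    ])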
The pilot graduation criteria included:
· Improved OB cycle time
· MRS recirculation maintained below 20%
· Do no harm to OOW
· Do no harm to OB throughput
Project Details
To align on a singular buffer model for NACF AR buffer automation and near-term scalability, two models were tested: ARC’s Buffer Analysis Engine v3 (BAEv3) and AFT Data Science Model (DSM):
BAEv3 - utilizes the existing BAEv2 framework with the following logic changes:
· Compares OOW vs recirculation and cycle time to recommend increase or decrease
· Checks recommended ranges against guardrails[4] to prevent the over-increasing or over-decreasing seen in current production.
o avg((min_mow*minutecapacity)*(1+p70_pct_ppad)) as min_guardrail
o avg((max_mow*minutecapacity)*(1+p70_pct_ppad)) as max_guardrail
· These changes allow buffers to be nudged in the right direction without egregious changes at the site level (see the illustrative guardrail sketch below)
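To make the guardrail logic concrete, here is a minimal sketch (not production BAEv3 code) of how the guardrails above could be computed and applied. The column names follow the formulas in the bullets; the per-process-path grouping and the clamp helper are assumptions.

import pandas as pd

def buffer_guardrails(df: pd.DataFrame) -> pd.DataFrame:
    """Per-process-path guardrails, mirroring the formulas above.
    Expects columns: process_path, min_mow, max_mow, minutecapacity, p70_pct_ppad."""
    df = df.copy()
    df["min_raw"] = df["min_mow"] * df["minutecapacity"] * (1 + df["p70_pct_ppad"])
    df["max_raw"] = df["max_mow"] * df["minutecapacity"] * (1 + df["p70_pct_ppad"])
    return df.groupby("process_path", as_index=False).agg(
        min_guardrail=("min_raw", "mean"),
        max_guardrail=("max_raw", "mean"),
    )

def clamp_to_guardrails(recommended: float, min_guardrail: float, max_guardrail: float) -> float:
    """Keep a recommended buffer value inside the guardrail band."""
    return min(max(recommended, min_guardrail), max_guardrail)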
DSM - Aligns buffer ranges towards minimum target units at destination by process path
· Minimum buffer is the sum of three buffers: In-picking, in-transit, and at-destination.
· Minimum buffer formula: (UPT-1) * rate/(pick_rate * 2 * pick_share) + ct25 * rate/60 + Minimum units at destination
· DSM model performs data retrieval, cleaning, and transformation to obtain in-picking and in-transit buffers.
· Minimum units at destination buffer is the at-destination buffer size above which one cannot observe a significant improvement of associate throughput.
· Process to determine minimum units at destination (a sketch of this procedure, together with the minimum buffer formula, follows the DSM list below):
o Pull historical data (6 months horizon) on pack/induct throughput, at-destination buffer size, and detected headcount.
o Remove outlier observations from data and calculate at-destination buffer and throughput per associate.
o Bin at-destination units such that a statistical comparison between observed throughput per bin is possible (a bin must have at least 20 observations to prevent the kurtosis test from failing).
o Conduct a statistical comparison of throughput between adjacent at-destination bins.
§ Find the largest at-destination bin for which the comparison yields a significant, positive difference between throughputs.
§ Take this bin as an initial solution for the minimum buffer size.
§ Check the n immediately larger bins to determine whether throughput can be improved by more than a pre-selected threshold. (This is an additional heuristic to ensure robustness of the solution.)
§ If throughput under initial solution cannot be improved, return initial solution. Otherwise, return the solution of the added heuristic.
· Current maximum buffer limit formula: minimum * 130% (multiplication by 130% was recommended by ACES)
· Future maximum buffer limit formula: (UPT-1) * rate/(pick_rate * 2 * pick_share) + ct25 * rate/60 + upt * Max data science tote target
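To make the DSM logic concrete, the following is an illustrative sketch rather than the production model: it implements the minimum and current maximum buffer formulas above and the binning procedure for minimum units at destination. The column names, bin width, number of larger bins checked, and improvement threshold are assumptions, and Welch's t-test stands in for the adjacent-bin significance test, which the text above does not name.

import pandas as pd
from scipy import stats

def minimum_buffer(upt, rate, pick_rate, pick_share, ct25, min_units_at_dest):
    """Minimum buffer = in-picking + in-transit + minimum units at destination."""
    in_picking = (upt - 1) * rate / (pick_rate * 2 * pick_share)
    in_transit = ct25 * rate / 60
    return in_picking + in_transit + min_units_at_dest

def current_maximum_buffer(minimum):
    """Current max buffer limit: minimum * 130% (multiplier recommended by ACES)."""
    return minimum * 1.30

def estimate_min_units_at_destination(df: pd.DataFrame, bin_width=50, min_obs=20,
                                      n_check=3, improve_threshold=0.02):
    """Find the at-destination buffer size above which throughput per associate
    stops improving significantly.
    Expects columns: at_destination_units, throughput, headcount."""
    df = df.copy()
    df["tput_per_associate"] = df["throughput"] / df["headcount"]

    # Remove outlier observations with an inter-quartile-range filter.
    q1, q3 = df["tput_per_associate"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df["tput_per_associate"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    # Bin at-destination units; keep bins with enough observations (>= 20 here).
    df["bin"] = (df["at_destination_units"] // bin_width) * bin_width
    counts = df.groupby("bin")["tput_per_associate"].count()
    valid_bins = sorted(counts[counts >= min_obs].index)

    # Initial solution: the largest bin whose throughput is significantly higher
    # than that of the immediately smaller bin.
    candidate = valid_bins[0]
    for prev_bin, cur_bin in zip(valid_bins, valid_bins[1:]):
        prev = df.loc[df["bin"] == prev_bin, "tput_per_associate"]
        cur = df.loc[df["bin"] == cur_bin, "tput_per_associate"]
        _, p_value = stats.ttest_ind(cur, prev, equal_var=False)
        if p_value < 0.05 and cur.mean() > prev.mean():
            candidate = cur_bin

    # Robustness heuristic: check the n immediately larger bins for a further
    # improvement beyond the pre-selected threshold.
    base_mean = df.loc[df["bin"] == candidate, "tput_per_associate"].mean()
    for larger_bin in [b for b in valid_bins if b > candidate][:n_check]:
        larger_mean = df.loc[df["bin"] == larger_bin, "tput_per_associate"].mean()
        if larger_mean > base_mean * (1 + improve_threshold):
            candidate, base_mean = larger_bin, larger_mean
    return candidate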
a. PILOT
Models were tested at the following sites: AGS1, OKC1, PSP1, YHM1, ABQ1, GRR1, ORF3, STL8. Site selection was determined using a data science cluster analysis that combined a principal component analysis of current site performance with generation type; selected sites additionally were not enrolled in another flow pilot. Each site spent separate weeks on BAEv3 and DSM to introduce variability (an intentional intervention change) through the pilot, and the decision framework weighted primary and secondary effects by the fractional volume of each process path at the site. Initial findings showed BAEv3 had a more positive impact on average CT (+900bps), while DSM had a more positive impact on rates (+8bps vs. +4bps for BAEv3).
In reviewing additional KPIs, chuting rate performance under the DSM model showed a 7bps improvement in rate while induct remained flat[5]. Autoregressive integrated moving average (ARIMA[6]) analysis determined that, despite achieving better CT99 improvements, DSM performed 230bps worse than BAEv3 on CT99.75. BAEv3 resulted in predominantly flat rate improvement across both chuting and induct[7]. Median Time in Buffer (TIB) improved for both DSM (+37bps) and BAEv3 (+34.5bps) vs. control. The distribution of TIB under BAEv3 remained flatter than under DSM; however, the difference in TIB improvements between models is not statistically significant.
b. STATISTICAL ANALYSIS
Statistical Significance of BAEv3 average cycle time[8]:
During the post-intervention period, the response variable had an average value of approx. 61.47. By contrast, in the absence of an intervention, we would have expected an average response of 65.46. The 95% interval of this counterfactual prediction is [63.47, 67.32]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is -3.99 with a 95% interval of [-5.85, -2.00]. For a discussion of the significance of this effect, see below:
· Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 491.77. By contrast, had the intervention not taken place, we would have expected a sum of 523.67. The 95% interval of this prediction is [507.75, 538.57].
· The above results are given in terms of absolute numbers. In relative terms, the response variable showed a decrease of -6%. The 95% interval of this percentage is [-9%, -3%].
· This means that the negative (reduction) effect observed during the intervention period is statistically significant. If the experimenter had expected a positive effect, it is recommended to double-check whether anomalies in the control variables may have caused an overly optimistic expectation of what should have happened in the response variable in the absence of the intervention.
The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area probability p = 0). This means the causal effect can be considered statistically significant.
Statistical Significance of BAEv3 average outbound rate[9]:
During the post-intervention period, the response variable had an average value of approx. 530.49. In the absence of an intervention, we would have expected an average response of 531.88. The 95% interval of this counterfactual prediction is [524.94, 538.82]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is -1.39 with a 95% interval of [-8.32, 5.56]. For a discussion of the significance of this effect, see below:
· Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 4.24K. Had the intervention not taken place, we would have expected a sum of 4.26K. The 95% interval of this prediction is [4.20K, 4.31K].
· The above results are given in terms of absolute numbers. In relative terms, the response variable showed a decrease of -0%. The 95% interval of this percentage is [-2%, +1%].
· This means that, although it may look as though the intervention has exerted a negative effect on the response variable when considering the intervention period as a whole, this effect is not statistically significant, and so cannot be meaningfully interpreted. The apparent effect could be the result of random fluctuations that are unrelated to the intervention. This is often the case when the intervention period is very long and includes much of the time when the effect has already worn off. It can also be the case when the intervention period is too short to distinguish the signal from the noise. Finally, failing to find a significant effect can happen when there are not enough control variables or when these variables do not correlate well with the response variable during the learning period.
The probability of obtaining this effect by chance is p = 0.349. This means the effect may be spurious and would generally not be considered statistically significant.
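The results above follow the standard reporting format of a Bayesian intervention (counterfactual) analysis. As a simplified illustration of the general approach, and not the pilot's exact model, the sketch below fits a time-series model on the baseline window, forecasts the intervention window, and compares observed versus predicted averages; the series, model order, and interval summary are assumptions.

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def intervention_effect(series: pd.Series, intervention_start: str, order=(1, 1, 1)):
    """series: a KPI (e.g., average OB cycle time) indexed by a DatetimeIndex."""
    pre = series[series.index < intervention_start]    # baseline window
    post = series[series.index >= intervention_start]  # intervention window

    model = ARIMA(pre, order=order).fit()
    forecast = model.get_forecast(steps=len(post))
    predicted = forecast.predicted_mean
    ci = forecast.conf_int(alpha=0.05)  # per-step 95% bounds

    effect = post.mean() - predicted.mean()
    return {
        "observed_avg": post.mean(),
        "predicted_avg": predicted.mean(),
        # Rough summary of the counterfactual interval (average of per-step bounds).
        "predicted_95_interval": (ci.iloc[:, 0].mean(), ci.iloc[:, 1].mean()),
        "absolute_effect": effect,
        "relative_effect_pct": 100 * effect / predicted.mean(),
    }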
c. ROLLOUT EVALUATION
Given the marginal difference between the two models, deployment timeline must also be considered. The BAEv3 model can deploy quickly due to its inherent design, whereas integrating DSM through buffer automation requires an extended deployment period. Our proposal is for the AR network to transition to BAEv3 by 8/16, capitalizing on observed pilot benefits. The existing BAE update cadence of a 30-day performance period and accompanying updates will remain. Simultaneously, AFT is enhancing automated buffers within flow tools following the North Star vision:
· Short Term (<6mo)
The ARC team will continue to use the existing dashboard (BAE) to monitor buffer performance as it does today. The ARC team will then transition to the DSM QuickSight dashboard to copy the automated buffer values and enter them manually in Outbound Flow Tracker. (ECD: Q3 2023)
· Medium Term (<18mo)
Step 1 will add an "Updated Buffers" tab with buffer information to the Outbound Flow Tracker or AutoFlow (TBD), removing the need for a QuickSight dashboard. Step 2 will incorporate a "Save Buffer Size Limits" button that applies the newest buffer settings, removing the need to manually enter each buffer per process path per FC.
· Long Term (<36mo)
System automatically updates buffers for all FCs while also allowing users to manually override if necessary.
2. FINANCIAL ASSUMPTIONS & OVERALL BENEFITS
A: Financial
B: Non-Financial
· NA deployment target is August 16th 2023
Table 1: Deployment Timelines
· Project Start Date: 03-2023 – Initial hypothesis testing (LIT1)
· Testing: 04-2023 through 07-2023 – BOI2 Kaizen (Phase 1): 4/1-4/15/2023; Phase 2 Buffer Pilot: 5/15-6/10/2023; Phase 3 Buffer Pilot: 6/20-7/3/2023 (additional testing at BAPv2 sites: 05/01-07/16)
· Project Completion Date: 8/16/2023 – NA deploys Wednesday, August 16th, 2023
· Benefit Realization Date: 9/16/2023 – Financial impact / savings beginning one month from launch
4. DISPOSAL/TRANSFER OF EXISTING ASSETS
N/A
1. Vendor Labor? ☐Yes ☒No
2. Internal Labor? ☒Yes ☐No
3. If Internal Amazon Labor is being considered/required to execute this change, please indicate from which team(s) (RME, Ops Engineering, ACES, Inbound Associates, etc.): ACES and AFT
4. Total Labor Hours: N/A
5. If Vendor Labor is being considered, which team is responsible for overseeing the vendor while in the Amazon building? N/A
6. Is downtime required for work? ☐Yes ☒No
7. If downtime is required, please provide the total hours: N/A
8. If global project, please indicate who will be responsible for executing the project for each region: NA: Aimee Coors, EU: Nick Khuri
The outcome of not rolling out would be the loss of the Time in Buffer[10] improvement, the optimized logic deployment (BAPv2), the prevention of MRS recirculation, and the productivity and cycle time improvements.
Risks: Automated Buffer Management is limited by the accuracy of its inputs: data accuracy and rate accuracy.
· Callout: Previously tested models were sensitive to noise and outliers within the data
o Issue: The calculation of in-picking and in-transit buffers for PSP1 was affected, resulting in excessively high buffer limits, which in turn led to increased recirculation rates and worse cycle times. Although other sites were not similarly affected, this issue highlighted the need for data cleaning and pre-processing to mitigate such distortions.
o What causes this: Outlier data points within the site data set
o How long has this problem existed: Observed during testing of the DSM model
o How to manage it in current state: To address this concern, we extended the DSM model to include data cleaning and pre-processing steps before calculating the in-picking and in-transit components of the total buffer limits. Lessons learned were applied to BAEv3.
· Callout: Reliance on historical rates as inputs to make predictions
o Issue: We discovered that rate inaccuracies were the second most significant factor contributing to out-of-work (OOW) instances, after overstaffing.
o What causes this: Current rate forecasting tools carry a margin of error within forecasted values
o How to manage it in current state: To improve the accuracy and reliability of the models, it is essential to enhance rate forecasting or consider hedging strategies to account for rate uncertainties. One approach worth exploring involves adjusting lower buffer limits to better handle such rate variations.
Blockers: N/A
Risk of Delayed Approval: Delaying approval blocks $22.4 million in total annualized entitlements.
8. RELATED or PREVIOUSLY APPROVED PROJECTS
§ AFE time with optimal MoW (5-20) +4%
§ Mix time with optimal MoW (5-20) +4%
§ Single Smalls time with optimal MoW (5-20) +65%
§ Smartpac time with optimal MoW (5-20) +68%
Has this problem been addressed at other site types or business units within Amazon? If so, detail results from that implementation. → No
Are there related projects being implemented concurrently, and if so, how does one affect the other if this change is not approved?
Table 2: Previously Approved Projects
1 | BAEv2 | Current active buffer configuration model – ARC owned, deployed in 2022 | Internal Labor | $34.3M | N/A | - | N/A | N/A
Appendix
i. Intervention Analysis visual for BAEv3 cycle time
ii. Intervention Analysis visual for BAEv3 outbound rate
iii. Pilot Analysis on Time In Buffer
During the post-intervention period (pilot), the response variable had an average value of approx. 0.38. By contrast, in the absence of an intervention, we would have expected an average response of 0.25. The 95% interval of this counterfactual prediction is [0.23, 0.27]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is 0.13 with a 95% interval of [0.11, 0.15]. For a discussion of the significance of this effect, see below:
· Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 3.05. By contrast, had the intervention not taken place, we would have expected a sum of 2.02. The 95% interval of this prediction is [1.86, 2.17].
· The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase of +52%. The 95% interval of this percentage is [+40%, +64%].
· This means that the positive effect observed during the intervention period is statistically significant and unlikely to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears substantive significance can only be answered by comparing the absolute effect (0.13) to the original goal of the underlying intervention.
The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area probability p = 0). This means the causal effect can be considered statistically significant.
Pilot Site Identification
The following approach was chosen to select sites (a minimal sketch follows this list):
· Define site characteristics (rates, recirculation, cycle times, etc.)
· Perform a principal component analysis on these characteristics
· Cluster sites per generation
· Choose one site per cluster and generation
o Ideally a site not involved in another flow pilot, to avoid dilution/interference
o Site opt-in
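A minimal sketch of this selection approach, assuming scikit-learn, hypothetical site features, and placeholder component and cluster counts; the opt-in and pilot-interference screens are noted only as comments.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def candidate_sites(sites: pd.DataFrame, n_components: int = 3, n_clusters: int = 4) -> pd.DataFrame:
    """sites: one row per FC with a 'site' id, a 'generation' label, and
    numeric characteristic columns (rates, recirculation, cycle times, etc.)."""
    feature_cols = [c for c in sites.columns if c not in ("site", "generation")]
    picks = []
    for generation, group in sites.groupby("generation"):
        X = StandardScaler().fit_transform(group[feature_cols])
        n_comp = min(n_components, X.shape[1], len(group))
        components = PCA(n_components=n_comp).fit_transform(X)
        k = min(n_clusters, len(group))
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(components)
        clustered = group.assign(cluster=labels)
        # One site per cluster and generation; in practice the pick also screens
        # out sites in other flow pilots (interference) and requires site opt-in.
        picks.append(clustered.groupby("cluster").head(1)[["site", "generation", "cluster"]])
    return pd.concat(picks, ignore_index=True)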
Data Sources
Data preparation
TIB analysis and descriptive statistics: No data preprocessing except for the automatic exclusion of missing data.
ARIMA: Missing values were filled using interpolation as long as the missing data point was between observation dates. If the missing data was found at the beginning of the time series, it was replaced with the value of its immediately following date. We also added 1×10^-9 to the “In Buffer Recirculation” and “In Buffer Time OOW” time series to make them suitable for Box-Cox/Yeo-Johnson power transformations (values of zero cannot be transformed).
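As an illustration of this preprocessing, assuming a pandas data frame of the KPI time series with hypothetical column names:

import pandas as pd

EPSILON = 1e-9  # added to zero-valued series before power transformation

def prepare_series(df: pd.DataFrame) -> pd.DataFrame:
    """df: KPI time series indexed by observation date."""
    df = df.sort_index().copy()
    # Interior gaps: interpolate between observation dates.
    df = df.interpolate(method="linear", limit_area="inside")
    # Missing values at the start of a series: use the immediately following value.
    df = df.bfill()
    # Values of zero cannot be power-transformed, so shift these series slightly.
    for col in ("in_buffer_recirculation", "in_buffer_time_oow"):
        if col in df.columns:
            df[col] = df[col] + EPSILON
    return df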
Experimentation Date Range
Entire Data Time Frame: 5/23/2023 00:00 through 7/3/2023 23:59
Baseline Data Time Frame: 5/23/2023 00:00 through 6/19/2023 23:59
Intervention Data Time Frame: 6/20/2023 00:00 through 7/3/2023 23:59
Exclusions from the analysis:
OMA2: Excluded due to resistance to recommendations and opting out of the pilot.
PSP1: Excluded due to inaccurate determination of DSM buffer limits (inclusion of noisy data/outliers)
GRR1: Excluded due to dilution from other pilot
ORF3: Observations from 06/24 were excluded due to mechanical issues at the site.
PPCW1000 process paths: Given CF’s previous work and concern about whether our more generic models translate to this path’s buffer limits, we excluded it from the pilot and analysis.
Outlier removal, filtering: The DSM model extended for the calculation of in-picking and in-transit buffers removes outliers using inter-quartile range filtering (similar to box-and-whisker plots) and/or uses the median instead of the mean to calculate rates. No outliers were removed for the ARIMA analysis.
VC5+ and pack rate for AFEs
Buffer Health Analysis - https://tiny.amazon.com/17um2hx59/BHA
DSM (Draft) - https://tiny.amazon.com/sbpuw7oh/DSMInputs
BAE v3 - https://tiny.amazon.com/qwnx8v1g/BAEv3
BAE v2 (Prod) - https://tiny.amazon.com/11kxy62k8/BAEProduction
Frequently Asked Questions (FAQs):
Q1: Can a t-test be conducted?
A1: A t-test was completed on “forecasted” vs. “observed” values with 7 data points (observations). This pilot was not designed as a “true random experiment” with control groups; it was designed as an intervention analysis. T-tests on time series can be misleading due to autocorrelation, trends, and seasonality, so using them to draw a conclusion is a matter of interpretation. In other words, a perfect t-test p-value does not guarantee a true change in time series data: a simple upward trend can show a perfect t-test p-value while changing little due to the intervention itself.
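As a synthetic illustration of this caveat (not pilot data): a trending, autocorrelated series with no intervention at all can still produce a very small t-test p-value when its two halves are compared.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 200
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.8 * noise[t - 1] + rng.normal()  # AR(1) autocorrelation
series = 0.05 * np.arange(n) + noise              # gentle upward trend, no intervention

first_half, second_half = series[: n // 2], series[n // 2:]
t_stat, p_value = stats.ttest_ind(second_half, first_half, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")  # p is typically tiny despite no intervention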
Q2: How are future developments of buffer modeling and automation being communicated to stakeholders?
A2: ACES distributes a bi-weekly NACF Multi-Flow Initiative flash (since week 18) covering the S-Team goal, Outbound Flow Management Automation (King Pin ID: 579820), to achieve a 2% productivity increase by December 31, 2023. Recipients of this flash include AFT, ACES, EU Flow, iDEA, Central Flow, PE, SRD, and GMs. Archived editions can be found here.
Q3: What are other influences to AFE OOW instances?
A3: (a) Current Central Flow standard work to induce “sort throttles” can increase both induct and pack OOW instances. A sort throttle is an AutoFlow override that reduces induct rates by 20% if packables per wall (PPW) exceed the max threshold by more than 50%. (b) Low CE backlog and the cadence of generating new OSP plans intra-interval to reallocate labor to paths with available pick pools.
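One possible reading of the sort-throttle rule in (a), sketched with hypothetical names; the “50% over max threshold” condition is interpreted here as PPW greater than 1.5 times the max threshold.

def throttled_induct_rate(base_induct_rate: float, ppw: float, ppw_max_threshold: float) -> float:
    """Reduce induct rate by 20% when packables per wall exceed the max threshold by more than 50%."""
    return base_induct_rate * 0.8 if ppw > 1.5 * ppw_max_threshold else base_induct_rate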
Q4: What is the impact of change to OOW instances from the pilot?
A4: Insignificant.
Spearman's rank correlation rho
data: ndf$`OOW%` and ndf$outbound_rate
S = 1016930, p-value = 0.3627
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.0668963
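For reference, the output above comes from R's cor.test; an equivalent check could be run in Python as follows (the file name stands in for an export of the pilot data frame).

import pandas as pd
from scipy import stats

# Hypothetical export of the pilot data frame referenced in the R output above.
ndf = pd.read_csv("pilot_oow_rate.csv")  # columns: 'OOW%', 'outbound_rate'
rho, p_value = stats.spearmanr(ndf["OOW%"], ndf["outbound_rate"])
print(f"rho = {rho:.4f}, p-value = {p_value:.4f}")  # pilot output: rho ≈ 0.067, p ≈ 0.363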