TagDebt: A Bot to Support Technical Debt Management

Source: arXiv:2605.29869 · Published 2026-05-28 · By João Paulo Biazotto, Daniel Feitosa, Paris Avgeriou, Elisa Yumi Nakagawa

TL;DR

The paper addresses the practical challenge of managing technical debt (TD) within software development workflows. Although TD is widely recognized as a key factor impacting maintainability and project costs, existing tools for technical debt management (TDM) often require significant configuration, disrupt existing workflows, or focus on limited debt types (e.g., code comments). The authors present TagDebt, an open-source GitHub bot that automatically detects and labels self-admitted technical debt (SATD) in issue tracker texts using natural language processing (NLP) techniques. By integrating labeling directly into issue workflows, TagDebt reduces manual effort and aligns closely with practitioners' existing processes.

The novel contribution lies in operationalizing SATD detection via a bot that can be easily plugged into GitHub repositories without workflow disruption or explicit tagging conventions. The authors conducted a Technology Acceptance Model (TAM) study with 16 software practitioners to evaluate TagDebt’s perceived usefulness, ease of use, and adoption factors. Results indicate practitioners find TagDebt helpful for organizing and triaging issues, appreciating its low configuration overhead and clear documentation; however, team and project size influence adoption willingness. Overall, TagDebt is presented as a proof-of-concept specialized TDM tool designed to improve technical debt visibility and facilitate early intervention in issue tracking, with prospects for future extensibility.

Key findings

TagDebt automatically labels GitHub issues as SATD or non-SATD using NLP models: an existing SATD classifier by Li et al. (2022) and an LLM approach based on GPT-5-mini.
In a TAM evaluation with 16 practitioners, most found TagDebt useful for organizing issues and reducing manual effort in technical debt identification.
Practitioners also rated the bot as easy to use, and appreciated its clear documentation and low disruption to existing workflows.
Contextual factors such as team size and codebase size influence practitioners’ intention to adopt TagDebt.
TagDebt uniquely detects SATD in issue titles and descriptions without requiring explicit tags (e.g., TODO, FIXME), distinguishing it from prior bots.
Compared to other TDM bots (BreakBot, Refactoring-Bot, TODO Bot, FixMe), TagDebt covers multiple technical debt types by analyzing issues, rather than being limited to source code comments or specific tagging keywords.
Practitioners suggested future improvements including additional features for source code analysis and automated updates to support wider TDM activities.
TagDebt implements a modular SATD detection component allowing replacement or enhancement with future NLP models to improve accuracy and adaptability.

Threat model

n/a. The paper focuses on tool design and adoption rather than security threats, so it defines no adversary and considers no attacks on the detection models or workflows. The assumed setting is an ordinary software development environment in which identifying TD by hand is difficult.

Methodology — deep read

Threat Model & Assumptions: The paper has no security adversary and considers no threat or attack capabilities beyond the normal operational context. Its setting is ordinary software engineering: teams and projects that struggle to identify and track TD manually and want automation that fits their workflow. The tool assumes developers document technical debt as self-admitted technical debt (SATD) in natural language within GitHub issues, and it relies on that text for detection and labeling.
Data: TagDebt analyzes the titles and descriptions of GitHub issues to identify SATD items. The training data for the SATD detection models comes from prior SATD datasets in the literature, notably those of Li et al. (2022), who developed a supervised ML model for SATD classification. The bot also supports a GPT-5-mini large language model variant for detection. The exact size and splits of the datasets used to train these models are not described in detail here; they are established SATD corpora from the referenced work. The bot processes issues in real time after they are created.
Architecture / Algorithm: TagDebt is implemented as a GitHub App (bot). On issue creation or update events, the bot extracts the textual content (title and description) and feeds it into the SATD detection module. The module is pluggable and currently supports the Li et al. (2022) SATD classifier and an LLM-based (GPT-5-mini) solution. The classifier outputs a binary label, SATD or non-SATD, and TagDebt applies the corresponding label to the issue via the GitHub API.

The bot avoids noisy behaviors common to other bots (e.g., excessive commenting) by focusing on silent labeling actions. Configuration is done via a YAML config file stored in the repository to allow easy customization.
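The pluggable detection module described above can be pictured with a small sketch. This is an illustration only, not TagDebt's actual code: the names `SatdDetector`, `KeywordDetector`, `LlmDetector`, `issue_text`, and `label_for` are hypothetical, the keyword list is a toy stand-in for a trained classifier, and the prompt wording is an assumption. The LLM variant takes any text-in, text-out function supplied by the caller, so no particular vendor API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol, Tuple


class SatdDetector(Protocol):
    """Hypothetical interface: map issue text to a binary SATD decision."""

    def is_satd(self, text: str) -> bool: ...


@dataclass
class KeywordDetector:
    """Toy stand-in for a trained classifier (NOT the Li et al. model)."""

    cues: Tuple[str, ...] = ("workaround", "hack", "temporary fix", "clean up later")

    def is_satd(self, text: str) -> bool:
        lowered = text.lower()
        return any(cue in lowered for cue in self.cues)


@dataclass
class LlmDetector:
    """Wraps a caller-supplied completion function (assumed text in, text out)."""

    complete: Callable[[str], str]

    def is_satd(self, text: str) -> bool:
        # Assumed prompt wording; the paper's actual prompt is not given here.
        prompt = (
            "Does the following issue describe self-admitted technical debt? "
            "Answer yes or no.\n\n" + text
        )
        return self.complete(prompt).strip().lower().startswith("yes")


def issue_text(title: str, body: Optional[str]) -> str:
    """Join title and description; an empty description may arrive as None."""
    return f"{title}\n\n{body or ''}".strip()


def label_for(detector: SatdDetector, title: str, body: Optional[str]) -> str:
    """Return the binary label for an issue using whichever detector is plugged in."""
    return "SATD" if detector.is_satd(issue_text(title, body)) else "non-SATD"
```

Because `label_for` depends only on the `is_satd` method, swapping one detector for another requires no change to the calling code, which is the property the summary attributes to the modular design.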

Training Regime: The bot itself is a runtime system and does not train models. The embedded SATD models were pretrained externally on annotated issue/comment corpora as done in prior work. The paper does not provide novel model training details but rather integrates existing SATD detection models.
Evaluation Protocol: Evaluation was conducted via a Technology Acceptance Model (TAM) study with 16 software development practitioners of varied roles and team sizes. Interviews were used to assess perceived usefulness, ease of use, contextual adoption factors, and improvement suggestions. This qualitative empirical approach focuses on human factors, not automated accuracy metrics or adversarial robustness. No quantitative detection precision/recall was reported in this work.
Reproducibility: TagDebt is fully open-source and publicly available on GitHub Marketplace, enabling replication and extension. The SATD detection components use well-known models from the literature, though the paper does not share training datasets or weights directly within the bot itself. The adapted TAM questionnaire used in the evaluation is also described, which supports reproducing the study.

Concrete Example: When a new GitHub issue is opened, TagDebt receives the event. It extracts the issue's title and description and sends them to the selected SATD classifier. The classifier returns 'SATD' if the text likely describes self-admitted technical debt, or 'non-SATD' otherwise. TagDebt then calls the GitHub API to apply the 'SATD' label to the issue in the first case and does nothing in the second. This automated labeling supports triage and prioritization without manual effort or changes to existing developer workflows.
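The flow in this example can be sketched in a few lines. The sketch is a guess at the shape of the logic, not the bot's real implementation: `build_label_request` is a hypothetical helper, the classifier is passed in as a plain function, and the token is assumed to be an installation token obtained elsewhere. The payload fields (`action`, `issue`, `repository.full_name`) and the "add labels to an issue" endpoint follow GitHub's public webhook and REST conventions.

```python
import json
from typing import Callable, Optional
from urllib import request

GITHUB_API = "https://api.github.com"


def build_label_request(
    payload: dict,
    is_satd: Callable[[str], bool],
    token: str,
    label: str = "SATD",
) -> Optional[request.Request]:
    """Return the HTTP request that labels the issue, or None when nothing should happen."""
    # Only react to issues being created or edited.
    if payload.get("action") not in {"opened", "edited"}:
        return None
    issue = payload["issue"]
    # The description may be null when the author leaves it empty.
    text = f"{issue['title']}\n\n{issue.get('body') or ''}"
    if not is_satd(text):
        return None  # non-SATD: stay silent, as in the example above
    repo = payload["repository"]["full_name"]
    url = f"{GITHUB_API}/repos/{repo}/issues/{issue['number']}/labels"
    return request.Request(
        url,
        data=json.dumps({"labels": [label]}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
```

Building the request separately from sending it (for instance with `request.urlopen`) keeps the decision logic testable without network access; whether TagDebt is structured this way is not stated in the summary.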

Technical innovations

Development of a GitHub bot (TagDebt) that integrates SATD detection directly into issue management workflows with minimal configuration.
Use of a pluggable SATD detection module allowing tool-agnostic NLP model replacement, supporting both traditional ML and LLM approaches.
Labeling of SATD in issue tracker text without requiring explicit tags or keywords (e.g., TODO), unlike prior TD bots that depend on annotations.
Empirical TAM-based evaluation methodology adapted specifically for TDM tool usability and adoption in practitioner contexts.

Baselines vs proposed

Compared to BreakBot (build TD, static code analysis), TagDebt supports multiple TD types via issue text NLP detection.
Compared to Refactoring-Bot (code smell detection and refactorings), TagDebt operates at earlier development stages through issue labeling.
Compared to TODO Bot and FixMe Bot (reliant on explicit TODO/FIXME tags in code/comments), TagDebt identifies implicit SATD from natural language in issues without tagging constraints.
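The difference between tag-based detection and implicit SATD detection in the comparison above can be made concrete with a toy example. The regular expression below is an assumed pattern for illustration, not the actual rule set of TODO Bot or FixMe Bot, and the two sample texts are invented.

```python
import re

# Assumed pattern: the explicit markers a tag-based bot would look for.
TAG_PATTERN = re.compile(r"\b(TODO|FIXME)\b")


def has_explicit_tag(text: str) -> bool:
    """True only when the text contains a literal TODO or FIXME marker."""
    return TAG_PATTERN.search(text) is not None


examples = [
    "TODO: replace the hard-coded timeout",
    "The retry logic is a quick workaround; we should redesign it properly later.",
]
flags = [has_explicit_tag(text) for text in examples]
```

The second sentence admits debt in plain language but carries no marker, so a tag matcher skips it. That is the kind of text an NLP-based classifier is meant to catch.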

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.29869.

Fig 3: Visualization of TagDebt’s GitHub App

Fig 4: TagDebt’s default configuration file

Fig 5: Overview of TagDebt’s Application Component

Fig 6: TagDebt’s execution flow

Fig 5 (page 17).

Fig 7: TagDebt’s detection module and two examples of compatible NLP solutions

Limitations

TagDebt can detect only self-admitted technical debt items documented in natural language within issues; it cannot detect non-documented or non-SATD types of TD in source code or other artifacts.
The quality of detection depends on the underlying NLP models which may produce false positives or false negatives; no quantitative accuracy evaluation was reported here.
Practitioner evaluation involved a small sample of 16 participants, which may limit generalizability of adoption findings.
The TAM evaluation focuses on perceptions and did not measure long-term usage or impact on actual TDM outcomes such as reduced TD accumulation.
The bot currently labels issues but does not provide automated remediation, prioritization, or source code updates which practitioners suggested as future improvements.
The detection models and their data depend on English-language SATD expressed in issue text, which limits applicability in multilingual or non-textual contexts.

Open questions / follow-ons

How can automated SATD detection be extended to multiple languages or non-issue artifacts (e.g., pull requests, code comments)?
What are the quantitative accuracy and error profile of TagDebt’s supported NLP models in large-scale real-world GitHub projects?
How does long-term integration of TagDebt affect actual TD management outcomes such as repayment rates, prioritization quality, or maintenance costs?
Can feedback loops be implemented to incorporate developer corrections to improve detection models dynamically?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, TagDebt offers a concrete example of applying natural language processing in automated labeling bots that integrate tightly into collaborative workflows (GitHub issues). Its design considerations around low-disruption integration, modular detection components, and acceptance by real users provide insight on effective bot deployment in live environments—important when designing bot-based defenses that require operational practicality.

TagDebt’s approach to detecting nuanced natural-language signals (SATD) without explicit triggers is loosely analogous to CAPTCHA systems that must infer user intent or suspicious behavior from subtle signals. The study highlights that, beyond pure detection accuracy, factors like ease of integration, configurability, and user trust shape adoption. These are useful lessons for CAPTCHA and bot mitigation engineers balancing robustness with user experience.

Cite

bibtex

@article{arxiv2605_29869,
  title={TagDebt: A Bot to Support Technical Debt Management},
  author={João Paulo Biazotto and Daniel Feitosa and Paris Avgeriou and Elisa Yumi Nakagawa},
  journal={arXiv preprint arXiv:2605.29869},
  year={2026},
  url={https://arxiv.org/abs/2605.29869}
}
