diff --git a/AGENTS.md b/AGENTS.md
index d6bfef5..ec81ce9 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -23,13 +23,17 @@ The project uses a multi-layered approach to understand the Skill language:
 ### Key Components
 
 - **`skillls/main.py`**: The entry point of the LSP server. It implements the `LanguageServer` class and contains the handlers for LSP lifecycle events (`initialize`, `didOpen`, `didChange`, etc.) and feature requests (`inlayHint`, `documentSymbol`).
-- **`skillls/checker.py`**: Contains the logic for syntactic validation, specifically the algorithm for detecting unbalanced parentheses.
-- **`skillls/helpers.py`**: Provides the heavy lifting for text processing, including the content cleaning state machine and the recursive logic for building the node hierarchy.
+- **`skillls/parser.py`**: The new Tree-sitter based parser for syntax tree traversal and symbol extraction.
 - **`skillls/types.py`**: Defines the internal data models (e.g., `Node`, `URI`) used across the project.
 
+## Roadmap & Engineering Planning
+
+For details on identified technical debt, fragilities, and the long-term architectural hardening strategy, refer to [PLAN.md](./PLAN.md).
+
 ## Technical Stack
 
 - **Language**: Python 3.11+
+- **Package Management**: `uv`
 - **LSP Framework**: `pygls` (Python Language Server)
-- **Parsing Utilities**: `parsimonious` (PEG parser), `tree-sitter` (for structural tree analysis).
-- **Formatting & Tooling**: `rich` (terminal output), `black`, `ruff`, `mypy`.
+- **Parsing Utilities**: `tree-sitter` (for structural tree analysis).
+- **Formatting & Tooling**: `rich` (terminal output), `ruff`, `mypy`, `pytest`.
diff --git a/PLAN.md b/PLAN.md
new file mode 100644
index 0000000..7a47634
--- /dev/null
+++ b/PLAN.md
@@ -0,0 +1,31 @@
+# Project Hardening Plan
+
+This document outlines the identified fragilities in the `skillls` project and the planned architectural improvements to transform it from a functional prototype into a robust, production-ready Language Server.
+
+## 1. Grammar-Logic Decoupling
+**Problem**: The `SkillParser` relies on hardcoded string literals (e.g., `"function_definition"`) to identify symbols. Changes in the underlying `tree-sitter-skill` grammar will cause silent failures in the Outline view.
+**Goal**: Create a stable contract between the grammar and the parser.
+**Proposed Actions**:
+- [x] Implement a shared constants module or configuration file that defines significant node types.
+- [ ] (Long-term) Explore using Tree-sitter Queries (`Query` API) to match patterns instead of manual type checking, making the parser less dependent on specific node names and more focused on structural patterns.
+
+## 2. Iterative AST Traversal
+**Problem**: The current recursive traversal in `_traverse_tree` is susceptible to `RecursionError` on deeply nested files.
+**Goal**: Ensure the server can handle arbitrarily deep syntax trees without crashing.
+**Proposed Actions**:
+- [ ] Refactor `SkillParser._traverse_tree` to use an iterative approach (using a stack/deque) instead of recursion.
+
+## 3. Single Source of Truth for Errors
+**Problem**: The project is in a transitional state where error management is split between the new `SkillParser` diagnostics and the legacy `server.errs` dictionary in `main.py`.
+**Goal**: Unify error reporting into a single, streamlined pipeline.
+**Proposed Actions**:
+- [ ] Complete the refactor of `skillls/main.py`.
+- [ ] Remove the `errs` dictionary from `SkillLanguageServer`.
+- [ ] Decommission and delete deprecated files: `skillls/checker.py` and unused parts of `skillls/helpers.py`.
+
+## 4. Dependency Management Stabilization
+**Problem**: The dependency on a private SSH Git URL for `tree-sitter-skill` introduces external failure points into the build pipeline.
+**Goal**: Stabilize the build environment.
+**Proposed Actions**:
+- [ ] Evaluate the feasibility of publishing `tree-sitter-skill` to a private PyPI registry or a more accessible artifact repository.
+- [ ] Implement a fallback/vendoring strategy for critical grammar components if possible.
diff --git a/skillls/constants.py b/skillls/constants.py
new file mode 100644
index 0000000..7a53d02
--- /dev/null
+++ b/skillls/constants.py
@@ -0,0 +1,19 @@
+"""
+Centralized constants for the Skill language parser and LSP server.
+"""
+
+from typing import Final, Set
+
+# Node types that represent syntax errors in Tree-sitter
+ERROR_NODE_TYPES: Final[Set[str]] = {"ERROR", "MISSING"}
+
+# Node types that are considered significant enough to appear in the Document Symbol outline
+SYMBOLIC_NODE_TYPES: Final[Set[str]] = {
+    "function_definition",
+    "procedure_definition",
+    "namespace",
+    "let_binding",
+}
+
+# Node types used to identify names/identifiers within symbolic nodes
+IDENTIFIER_NODE_TYPES: Final[Set[str]] = {"identifier", "name"}
diff --git a/skillls/parser.py b/skillls/parser.py
index 86f62bf..e3a74d4 100644
--- a/skillls/parser.py
+++ b/skillls/parser.py
@@ -9,6 +9,7 @@ from lsprotocol.types import (
     SymbolKind,
 )
 from pygls.workspace import TextDocument
+from skillls.constants import ERROR_NODE_TYPES, IDENTIFIER_NODE_TYPES, SYMBOLIC_NODE_TYPES
 
 class SkillParser:
     """
@@ -51,7 +52,7 @@ class SkillParser:
         """Recursively traverses the AST to find errors and symbols."""
 
         # 1. Handle Errors (Diagnostics)
-        if node.type == "ERROR" or node.type == "MISSING":
+        if node.type in ERROR_NODE_TYPES:
             start_point = node.start_point
             end_point = node.end_point
 
@@ -78,14 +79,13 @@ class SkillParser:
 
     def _is_symbol_node(self, node) -> bool:
         """Determines if a node is significant enough to be an outline symbol."""
-        symbolic_types = {"function_definition", "procedure_definition", "namespace", "let_binding"}
-        return node.type in symbolic_types or node.type.endswith("_def")
+        return node.type in SYMBOLIC_NODE_TYPES or node.type.endswith("_def")
 
     def _create_document_symbol(self, node, content: str) -> DocumentSymbol | None:
         """Extracts a name and range for an AST node to create an LSP symbol."""
         name = None
         for child in node.children:
-            if child.type == "identifier" or child.type == "name":
+            if child.type in IDENTIFIER_NODE_TYPES:
                 start_byte = child.start_byte
                 end_byte = child.end_byte
                 name = content[start_byte:end_byte]
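
Follow-up note on PLAN.md item 2 (Iterative AST Traversal): the refactor is not part of this diff, but the stack-based approach it proposes could look roughly like the sketch below. This is an illustration under stated assumptions, not the project's implementation. `FakeNode` and `iter_nodes` are hypothetical names; `FakeNode` stands in for a Tree-sitter node and models only the `type` and `children` attributes that `parser.py` already reads.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class FakeNode:
    """Hypothetical stand-in for a Tree-sitter node (only `type` and `children`)."""

    type: str
    children: list[FakeNode] = field(default_factory=list)


def iter_nodes(root: FakeNode) -> Iterator[FakeNode]:
    """Yield every node in pre-order using an explicit stack, not recursion."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so the leftmost child is popped first.
        stack.extend(reversed(node.children))


# Build a chain far deeper than Python's default recursion limit.
deep = FakeNode("identifier")
for _ in range(5000):
    deep = FakeNode("list", [deep])

node_count = sum(1 for _ in iter_nodes(deep))
```

Because the pending nodes live in a heap-allocated list rather than on the call stack, nesting depth is bounded by available memory rather than by the interpreter's recursion limit. The per-node error and symbol checks from `_traverse_tree` would then run inside the loop body; tracking parent symbols for a nested outline would need extra state pushed alongside each node, which this sketch leaves out.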