Class: TreeHaver::GrammarFinder
- Inherits: Object
- Defined in: lib/tree_haver/grammar_finder.rb
Overview
Generic utility for finding tree-sitter grammar shared libraries.
GrammarFinder provides platform-aware discovery of tree-sitter grammar
libraries. Given a language name, it searches common installation paths
and supports environment variable overrides.
This class is designed to be used by language-specific merge gems
(toml-merge, json-merge, bash-merge, etc.) without requiring TreeHaver
to have knowledge of each specific language.
== Security Considerations
Loading shared libraries is inherently dangerous as it executes arbitrary
native code. GrammarFinder performs the following security validations:
- Language names are validated to contain only safe characters
- Paths from environment variables are validated before use
- Path traversal attempts (../) are rejected
- Only files with expected extensions (.so, .dylib, .dll) are accepted
For additional security, use #find_library_path_safe, which ignores the
environment variable override and only returns paths located in trusted system directories.
Constant Summary
- BASE_SEARCH_DIRS =
Common base directories where tree-sitter libraries are installed.
The platform-specific library filename (including its extension) is appended to each directory automatically.
User-local XDG paths (~/.local/lib/tree-sitter) are added dynamically
in #user_search_dirs so that HOME expansion happens at call time.
["/usr/lib", "/usr/lib64", "/usr/local/lib", "/opt/homebrew/lib", "/home/linuxbrew/.linuxbrew/lib"].freeze
- TREE_SITTER_BACKENDS =
Backends that use tree-sitter (and therefore require native runtime libraries).
Other backends (Citrus, Prism, Psych, etc.) don’t use tree-sitter.
[TreeHaver::Backends::MRI, TreeHaver::Backends::FFI, TreeHaver::Backends::Rust, TreeHaver::Backends::Java].freeze
Instance Attribute Summary
- #extra_paths ⇒ Array<String> (readonly): Additional search paths provided at initialization.
- #language_name ⇒ Symbol (readonly): The language identifier.
Class Method Summary
- .reset_runtime_check! ⇒ Object (private): Reset the cached tree-sitter runtime check (for testing).
- .tree_sitter_runtime_usable? ⇒ Boolean: Check if the tree-sitter runtime is usable.
Instance Method Summary
- #available? ⇒ Boolean: Check if the grammar library is available AND usable.
- #available_safe? ⇒ Boolean: Check if the grammar library is available in a trusted directory.
- #env_var_name ⇒ String: Get the environment variable name for this language.
- #find_library_path ⇒ String?: Find the grammar library path.
- #find_library_path_safe ⇒ String?: Find the grammar library path with strict security validation.
- #initialize(language_name, extra_paths: [], validate: true) ⇒ GrammarFinder (constructor): Initialize a grammar finder for a specific language.
- #library_filename ⇒ String: Get the library filename for the current platform.
- #not_found_message ⇒ String: Get a human-readable error message when the library is not found.
- #register!(raise_on_missing: false) ⇒ Boolean: Register this language with TreeHaver.
- #search_info ⇒ Hash: Get debug information about the search.
- #search_paths ⇒ Array<String>: Generate the full list of search paths for this language.
- #symbol_name ⇒ String: Get the expected symbol name exported by the grammar library.
- #validate_env_path(path) ⇒ String?: Validate an environment variable path and return the reason if it is invalid.
Constructor Details
#initialize(language_name, extra_paths: [], validate: true) ⇒ GrammarFinder
Initialize a grammar finder for a specific language
# File 'lib/tree_haver/grammar_finder.rb', line 78
def initialize(language_name, extra_paths: [], validate: true)
  name_str = language_name.to_s.downcase
  if validate && !PathValidator.safe_language_name?(name_str)
    raise ArgumentError, "Invalid language name: #{language_name.inspect}. " \
      "Language names must start with a letter and contain only lowercase letters, numbers, and underscores."
  end
  @language_name = name_str.to_sym
  @extra_paths = Array(extra_paths)
end
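The constructor downcases the name before validating it, so "TOML" and :toml yield the same finder. The sketch below reproduces that normalization in isolation. It assumes PathValidator.safe_language_name? implements the rule quoted in the error message (a leading lowercase letter, then lowercase letters, digits, or underscores); SAFE_LANGUAGE_NAME and normalize_language_name are hypothetical names, not part of TreeHaver.

```ruby
# Hypothetical stand-in for PathValidator.safe_language_name?, assuming it
# enforces the rule described in the constructor's error message.
SAFE_LANGUAGE_NAME = /\A[a-z][a-z0-9_]*\z/

# Sketch of the constructor's name handling: downcase, optionally validate,
# then store as a Symbol.
def normalize_language_name(language_name, validate: true)
  name_str = language_name.to_s.downcase
  if validate && !SAFE_LANGUAGE_NAME.match?(name_str)
    raise ArgumentError, "Invalid language name: #{language_name.inspect}"
  end
  name_str.to_sym
end

normalize_language_name("TOML") # => :toml
normalize_language_name(:json)  # => :json
```

Names such as "../etc" or "1abc" raise ArgumentError; passing validate: false skips the check entirely, which is why it should only be used with names that come from trusted code.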
Instance Attribute Details
#extra_paths ⇒ Array<String> (readonly)
Returns additional search paths provided at initialization.
# File 'lib/tree_haver/grammar_finder.rb', line 70
def extra_paths
  @extra_paths
end
#language_name ⇒ Symbol (readonly)
Returns the language identifier.
# File 'lib/tree_haver/grammar_finder.rb', line 67
def language_name
  @language_name
end
Class Method Details
.reset_runtime_check! ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Reset the cached tree-sitter runtime check (for testing)
# File 'lib/tree_haver/grammar_finder.rb', line 292
def reset_runtime_check!
  remove_instance_variable(:@tree_sitter_runtime_usable) if defined?(@tree_sitter_runtime_usable)
end
.tree_sitter_runtime_usable? ⇒ Boolean
Check if the tree-sitter runtime is usable
Tests whether we can actually create a tree-sitter parser.
Result is cached since this is expensive and won’t change during runtime.
# File 'lib/tree_haver/grammar_finder.rb', line 267
def tree_sitter_runtime_usable?
  return @tree_sitter_runtime_usable if defined?(@tree_sitter_runtime_usable)

  @tree_sitter_runtime_usable = begin
    # Try to create a parser using the current backend
    mod = TreeHaver.resolve_backend_module(nil)

    # Only tree-sitter backends are relevant here
    # Non-tree-sitter backends (Citrus, Prism, Psych, etc.) don't use grammar files
    if mod.nil? || !TREE_SITTER_BACKENDS.include?(mod)
      false
    else
      # Try to instantiate a parser - this will fail if runtime isn't available
      mod::Parser.new
      true
    end
  rescue NoMethodError, LoadError, NotAvailable => _e
    # Note: FFI::NotFoundError inherits from LoadError, so it's caught here too
    false
  end
end
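The cache above is keyed on defined? rather than ||= because a cached false must also short-circuit; with ||= a failing probe would be re-run on every call. The following sketch isolates that pattern. RuntimeProbe, usable?, reset!, probe_calls, and the probe lambda are hypothetical; the lambda stands in for instantiating mod::Parser.

```ruby
# Sketch of class-level memoization that also caches a false result.
class RuntimeProbe
  class << self
    # Counts how many times the expensive probe actually ran.
    attr_reader :probe_calls

    # probe: a callable standing in for "create a parser"; it raises
    # LoadError or NoMethodError when the runtime is unavailable.
    def usable?(probe)
      return @usable if defined?(@usable)

      @probe_calls = (@probe_calls || 0) + 1
      @usable = begin
        probe.call
        true
      rescue NoMethodError, LoadError
        false
      end
    end

    # Mirrors reset_runtime_check!: forget the cached answer (for tests).
    def reset!
      remove_instance_variable(:@usable) if defined?(@usable)
    end
  end
end

failing = -> { raise LoadError, "libtree-sitter not found" }
RuntimeProbe.usable?(failing) # => false (probe runs)
RuntimeProbe.usable?(failing) # => false (cached; probe is not run again)
RuntimeProbe.probe_calls      # => 1
```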
Instance Method Details
#available? ⇒ Boolean
Check if the grammar library is available AND usable
This checks:
- The grammar library file exists
- The tree-sitter runtime is functional (can create a parser)
This prevents registering grammars when tree-sitter isn’t actually usable,
allowing clean fallback to alternative backends like Citrus.
# File 'lib/tree_haver/grammar_finder.rb', line 242
def available?
  path = find_library_path
  return false if path.nil?

  # Check if tree-sitter runtime is actually functional
  # This is cached at the class level since it's the same for all grammars
  self.class.tree_sitter_runtime_usable?
end
#available_safe? ⇒ Boolean
Check if the grammar library is available in a trusted directory
# File 'lib/tree_haver/grammar_finder.rb', line 301
def available_safe?
  !find_library_path_safe.nil?
end
#env_var_name ⇒ String
Get the environment variable name for this language
# File 'lib/tree_haver/grammar_finder.rb', line 93
def env_var_name
  "TREE_SITTER_#{@language_name.to_s.upcase}_PATH"
end
#find_library_path ⇒ String?
Find the grammar library path
Searches in order:
- Environment variable override (validated for safety)
- Extra paths provided at initialization
- User-local paths (e.g. ~/.local/lib/tree-sitter)
- Common system installation paths
Paths from ENV are checked by #validate_env_path (leading/trailing whitespace,
PathValidator.safe_library_path?, and file existence) to prevent path traversal
and other attacks. An invalid ENV path raises TreeHaver::NotAvailable instead of
silently falling back to auto-discovery (Principle of Least Surprise: explicit paths must work).
Setting the ENV variable to an empty string explicitly disables this grammar
(the method returns nil). This allows fallback to alternative backends (e.g., Citrus).
# File 'lib/tree_haver/grammar_finder.rb', line 155
def find_library_path
  # Check environment variable first (highest priority)
  # Use key? to distinguish between "not set" and "set to empty"
  env_var = env_var_name
  if ENV[env_var] || ENV.key?(env_var)
    env_path = ENV[env_var]

    # :nocov: defensive - ENV.key? true with nil value is rare edge case
    if env_path.nil?
      @env_rejection_reason = "explicitly disabled (set to nil)"
      return
    end
    # :nocov:

    # Empty string means "explicitly skip this grammar"
    # This allows users to disable tree-sitter for specific languages
    # and fall back to alternative backends like Citrus
    if env_path.empty?
      @env_rejection_reason = "explicitly disabled (set to empty string)"
      return
    end

    # Store why env path was rejected for better error messages
    @env_rejection_reason = validate_env_path(env_path)

    # Principle of Least Surprise: If user explicitly sets an ENV variable
    # to a path, that path MUST work. Don't silently fall back to auto-discovery.
    if @env_rejection_reason
      raise TreeHaver::NotAvailable,
        "#{env_var_name} is set to #{env_path.inspect} but #{@env_rejection_reason}. " \
          "Either fix the path, unset the variable to use auto-discovery, " \
          "or set it to empty string to explicitly disable this grammar."
    end

    return env_path
  end

  # Search all paths (these are constructed from trusted base dirs)
  search_paths.find { |path| File.exist?(path) }
end
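The environment variable has three meaningful states: unset (auto-discover), empty (disabled), or a path (must be valid, otherwise raise). The sketch below models only that decision, reading from a plain Hash instead of ENV so it can be exercised without touching the process environment. resolve_env_override, NotAvailableSketch, and the validator lambda are hypothetical; NotAvailableSketch stands in for TreeHaver::NotAvailable.

```ruby
# Stand-in for TreeHaver::NotAvailable (hypothetical, for this sketch only).
class NotAvailableSketch < StandardError; end

# Mirrors the ENV-override branch of find_library_path.
#   [:auto]        -> variable not set; continue with search_paths
#   [:disabled]    -> variable set to "" (or nil); grammar explicitly disabled
#   [:path, value] -> variable points at a path that passed validation
# A set-but-invalid path raises instead of falling back to auto-discovery.
def resolve_env_override(env, var_name, validator)
  return [:auto] unless env.key?(var_name)

  value = env[var_name]
  return [:disabled] if value.nil? || value.empty?

  reason = validator.call(value)
  raise NotAvailableSketch, "#{var_name} is set to #{value.inspect} but #{reason}." if reason

  [:path, value]
end

file_must_exist = ->(path) { File.exist?(path) ? nil : "file does not exist" }
resolve_env_override({}, "TREE_SITTER_TOML_PATH", file_must_exist) # => [:auto]
resolve_env_override({ "TREE_SITTER_TOML_PATH" => "" }, "TREE_SITTER_TOML_PATH", file_must_exist) # => [:disabled]
```

Raising on a bad explicit path, rather than returning nil, means a typo in the variable surfaces immediately instead of silently loading some other copy of the grammar found during auto-discovery.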
#find_library_path_safe ⇒ String?
Find the grammar library path with strict security validation
This method only returns paths that are in trusted system directories.
Use this when you want maximum security and don’t need to support
custom installation locations.
# File 'lib/tree_haver/grammar_finder.rb', line 225
def find_library_path_safe
  # Environment variable is NOT checked in safe mode - only trusted system paths
  search_paths.find do |path|
    File.exist?(path) && PathValidator.in_trusted_directory?(path)
  end
end
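The exact rules of PathValidator.in_trusted_directory? are not shown on this page. The sketch below illustrates one common way to implement such a check, assuming a fixed list of trusted directories: expand the candidate path first (so ../ segments cannot escape) and require a directory-boundary prefix match (so /usr/libevil does not pass as /usr/lib). TRUSTED_DIRS, in_trusted_directory?, and find_library_path_safe_sketch are hypothetical.

```ruby
# Hypothetical trusted-directory list; the real one may differ.
TRUSTED_DIRS = ["/usr/lib", "/usr/lib64", "/usr/local/lib"].freeze

# Expand the path, then require it to sit *inside* a trusted directory.
# File.join(dir, "") appends a trailing separator, so "/usr/libevil/x.so"
# does not match the "/usr/lib/" prefix.
def in_trusted_directory?(path, trusted = TRUSTED_DIRS)
  expanded = File.expand_path(path)
  trusted.any? { |dir| expanded.start_with?(File.join(dir, "")) }
end

# Mirrors find_library_path_safe: first candidate that exists AND is trusted.
# The exists: parameter lets the check be swapped out for testing.
def find_library_path_safe_sketch(candidates, exists: ->(p) { File.exist?(p) })
  candidates.find { |p| exists.call(p) && in_trusted_directory?(p) }
end

in_trusted_directory?("/usr/lib/libtree-sitter-toml.so")             # => true
in_trusted_directory?("/usr/lib/../../tmp/libtree-sitter-toml.so")   # => false
```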
#library_filename ⇒ String
Get the library filename for the current platform
# File 'lib/tree_haver/grammar_finder.rb', line 107
def library_filename
  ext = platform_extension
  "libtree-sitter-#{@language_name}#{ext}"
end
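The private platform_extension helper is not shown on this page. A minimal sketch, assuming it maps the host OS to the three extensions listed under Security Considerations (.dylib on macOS, .dll on Windows, .so elsewhere), could look like this; the host_os parameter is a hypothetical addition so other platforms can be simulated.

```ruby
require "rbconfig"

# Assumed mapping from host OS to shared-library extension.
def platform_extension(host_os = RbConfig::CONFIG["host_os"])
  case host_os
  when /darwin/ then ".dylib"
  when /mswin|mingw|cygwin/ then ".dll"
  else ".so"
  end
end

# Same filename template as library_filename above.
def library_filename(language_name, host_os = RbConfig::CONFIG["host_os"])
  "libtree-sitter-#{language_name}#{platform_extension(host_os)}"
end

library_filename(:toml, "linux-gnu") # => "libtree-sitter-toml.so"
library_filename(:toml, "darwin23")  # => "libtree-sitter-toml.dylib"
```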
#not_found_message ⇒ String
Get a human-readable error message when library is not found
# File 'lib/tree_haver/grammar_finder.rb', line 347
def not_found_message
  msg = "tree-sitter #{@language_name} grammar not found."

  # Check if env var is set but rejected
  env_value = ENV[env_var_name]
  msg += if env_value && @env_rejection_reason
    " #{env_var_name} is set to #{env_value.inspect} but #{@env_rejection_reason}."
  elsif env_value && File.exist?(env_value) && !self.class.tree_sitter_runtime_usable?
    " #{env_var_name} is set and file exists, but no tree-sitter runtime is available. " \
      "Add ruby_tree_sitter, ffi, or tree_stump gem to your Gemfile."
  elsif env_value
    " #{env_var_name} is set but was not used (file may have been removed)."
  else
    " Searched: #{search_paths.join(", ")}."
  end

  msg + " Install tree-sitter-#{@language_name} or set #{env_var_name} to a valid path."
end
#register!(raise_on_missing: false) ⇒ Boolean
Register this language with TreeHaver
After registration, the language can be loaded via dynamic method
(e.g., TreeHaver::Language.toml).
# File 'lib/tree_haver/grammar_finder.rb', line 313
def register!(raise_on_missing: false)
  path = find_library_path
  unless path
    if raise_on_missing
      raise NotAvailable, not_found_message
    end
    return false
  end

  TreeHaver.register_language(@language_name, path: path, symbol: symbol_name)
  true
end
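register! gives callers two ways to handle a missing grammar: check the boolean return value (the default) or opt into an exception with raise_on_missing: true. The sketch below reproduces that contract with a hypothetical in-memory registry; REGISTRY_SKETCH, register_sketch, and GrammarNotFoundSketch are stand-ins for TreeHaver.register_language and TreeHaver::NotAvailable, and the path is passed in directly instead of being discovered.

```ruby
# Stand-in for TreeHaver::NotAvailable (hypothetical, for this sketch only).
class GrammarNotFoundSketch < StandardError; end

# Hypothetical in-memory registry standing in for TreeHaver.register_language.
REGISTRY_SKETCH = {}

# Missing path: return false, or raise when raise_on_missing is true.
# Found path: record it with the expected symbol name and return true.
def register_sketch(language_name, path, raise_on_missing: false)
  unless path
    raise GrammarNotFoundSketch, "tree-sitter #{language_name} grammar not found." if raise_on_missing

    return false
  end

  REGISTRY_SKETCH[language_name] = { path: path, symbol: "tree_sitter_#{language_name}" }
  true
end

register_sketch(:toml, nil)                               # => false
register_sketch(:toml, "/usr/lib/libtree-sitter-toml.so") # => true
REGISTRY_SKETCH[:toml][:symbol]                           # => "tree_sitter_toml"
```

A merge gem that can fall back to another backend would typically use the boolean form, while one that cannot work without the grammar would pass raise_on_missing: true to get the diagnostic message early.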
#search_info ⇒ Hash
Get debug information about the search
# File 'lib/tree_haver/grammar_finder.rb', line 329
def search_info
  found = find_library_path # This populates @env_rejection_reason
  {
    language: @language_name,
    env_var: env_var_name,
    env_value: ENV[env_var_name],
    env_rejection_reason: @env_rejection_reason,
    symbol: symbol_name,
    library_filename: library_filename,
    search_paths: search_paths,
    found_path: found,
    available: !found.nil?,
  }
end
#search_paths ⇒ Array<String>
Generate the full list of search paths for this language
Order: extra_paths, user-local paths, then system paths. The ENV override is not part of this list; #find_library_path checks it before consulting these paths.
# File 'lib/tree_haver/grammar_finder.rb', line 117
def search_paths
  paths = []

  # Extra paths provided at initialization (searched after ENV)
  @extra_paths.each do |dir|
    paths << File.join(dir, library_filename)
  end

  # User-local XDG paths (e.g. ~/.local/lib/tree-sitter/)
  user_search_dirs.each do |dir|
    paths << File.join(dir, library_filename)
  end

  # Common system paths with platform-appropriate extension
  BASE_SEARCH_DIRS.each do |dir|
    paths << File.join(dir, library_filename)
  end

  paths
end
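Because #find_library_path returns the first existing candidate, the order of this list determines precedence. The sketch below builds the same ordering as plain data. It assumes the private user_search_dirs helper yields only ~/.local/lib/tree-sitter (the XDG example from the constant docs); the real helper may return more entries. The home: parameter is a hypothetical addition so the result is deterministic.

```ruby
# Copy of BASE_SEARCH_DIRS from the constant summary.
BASE_SEARCH_DIRS = ["/usr/lib", "/usr/lib64", "/usr/local/lib",
                    "/opt/homebrew/lib", "/home/linuxbrew/.linuxbrew/lib"].freeze

# extra_paths first, then the (assumed) user-local dir, then system dirs;
# each directory is joined with the platform-specific filename.
def search_paths(filename, extra_paths: [], home: Dir.home)
  user_dirs = [File.join(home, ".local", "lib", "tree-sitter")]
  (Array(extra_paths) + user_dirs + BASE_SEARCH_DIRS).map { |dir| File.join(dir, filename) }
end

search_paths("libtree-sitter-toml.so", extra_paths: ["/opt/grammars"], home: "/home/me").first(3)
# => ["/opt/grammars/libtree-sitter-toml.so",
#     "/home/me/.local/lib/tree-sitter/libtree-sitter-toml.so",
#     "/usr/lib/libtree-sitter-toml.so"]
```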
#symbol_name ⇒ String
Get the expected symbol name exported by the grammar library
# File 'lib/tree_haver/grammar_finder.rb', line 100
def symbol_name
  "tree_sitter_#{@language_name}"
end
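Every name GrammarFinder derives comes from the same language identifier. In particular, the symbol name has to match the C entry point a tree-sitter grammar exports (for example, tree_sitter_toml for the TOML grammar). The sketch below collects the string templates from #env_var_name, #symbol_name, and #library_filename (without the platform extension) in one place; derived_names is a hypothetical helper, and the downcase mirrors what the constructor already does.

```ruby
# Hypothetical helper gathering the naming templates shown on this page.
def derived_names(language_name)
  name = language_name.to_s.downcase # the constructor stores a downcased name
  {
    env_var: "TREE_SITTER_#{name.upcase}_PATH",
    symbol: "tree_sitter_#{name}",
    basename: "libtree-sitter-#{name}",
  }
end

derived_names(:json)[:env_var] # => "TREE_SITTER_JSON_PATH"
derived_names(:json)[:symbol]  # => "tree_sitter_json"
```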
#validate_env_path(path) ⇒ String?
Validate an environment variable path and return reason if invalid
# File 'lib/tree_haver/grammar_finder.rb', line 198
def validate_env_path(path)
  # Check for leading/trailing whitespace
  if path != path.strip
    return "contains leading or trailing whitespace (use #{path.strip.inspect})"
  end

  # Check if path is safe
  unless PathValidator.safe_library_path?(path)
    return "failed security validation (may contain path traversal or suspicious characters)"
  end

  # Check if file exists
  unless File.exist?(path)
    return "file does not exist"
  end

  nil # Valid!
end
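The checks run from cheapest and most actionable to most expensive: whitespace first (with a suggested fix), then the security check, then a filesystem lookup. The sketch below reproduces that order. Its safe_library_path? is a hypothetical stand-in for PathValidator.safe_library_path? that only rejects ".." segments and extensions other than .so, .dylib, and .dll; the real validator may be stricter or looser (for example, with versioned names like libfoo.so.1).

```ruby
ALLOWED_EXTENSIONS = [".so", ".dylib", ".dll"].freeze

# Hypothetical, deliberately simple stand-in for PathValidator.safe_library_path?.
def safe_library_path?(path)
  return false if path.split(%r{[/\\]}).include?("..")

  ALLOWED_EXTENSIONS.include?(File.extname(path))
end

# Returns nil when valid, otherwise a short reason string.
def validate_env_path(path)
  if path != path.strip
    return "contains leading or trailing whitespace (use #{path.strip.inspect})"
  end
  return "failed security validation" unless safe_library_path?(path)
  return "file does not exist" unless File.exist?(path)

  nil
end

validate_env_path("/usr/lib/../etc/libtree-sitter-toml.so") # => "failed security validation"
```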