Class: TreeHaver::GrammarFinder

Inherits:
Object
Defined in:
lib/tree_haver/grammar_finder.rb

Overview

Generic utility for finding tree-sitter grammar shared libraries.

GrammarFinder provides platform-aware discovery of tree-sitter grammar
libraries. Given a language name, it searches common installation paths
and supports environment variable overrides.

This class is designed to be used by language-specific merge gems
(toml-merge, json-merge, bash-merge, etc.) without requiring TreeHaver
to have knowledge of each specific language.

Security Considerations

Loading shared libraries is inherently dangerous as it executes arbitrary
native code. GrammarFinder performs the following security validations:

  • Language names are validated to contain only safe characters
  • Paths from environment variables are validated before use
  • Path traversal attempts (../) are rejected
  • Only files with expected extensions (.so, .dylib, .dll) are accepted

For additional security, use #find_library_path_safe which only returns
paths from trusted system directories.
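
The checks listed above can be pictured with a minimal sketch. The method names and the constant below are hypothetical stand-ins, not the PathValidator API; the language-name rule is inferred from the constructor's error message, and the real validator may be stricter.

```ruby
# Hypothetical stand-ins for the validations described above.
# The real checks live in PathValidator and are not reproduced here.
SKETCH_ALLOWED_EXTENSIONS = %w[.so .dylib .dll].freeze

# Assumed rule: start with a letter, then lowercase letters, digits,
# or underscores only.
def sketch_safe_language_name?(name)
  name.to_s.match?(/\A[a-z][a-z0-9_]*\z/)
end

# Rejects "../" traversal segments and unexpected file extensions.
def sketch_safe_library_path?(path)
  return false if path.split("/").include?("..")

  SKETCH_ALLOWED_EXTENSIONS.include?(File.extname(path))
end

sketch_safe_language_name?("toml")                               # => true
sketch_safe_language_name?("../etc")                             # => false
sketch_safe_library_path?("/usr/lib/libtree-sitter-toml.so")     # => true
sketch_safe_library_path?("/usr/lib/../../tmp/evil.so")          # => false
```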

Examples:

Basic usage

finder = TreeHaver::GrammarFinder.new(:toml)
path = finder.find_library_path
# => "/usr/lib/libtree-sitter-toml.so"

Check availability

finder = TreeHaver::GrammarFinder.new(:json)
if finder.available?
  language = TreeHaver::Language.load(finder.language_name, finder.find_library_path)
end

Register with TreeHaver

finder = TreeHaver::GrammarFinder.new(:bash)
finder.register! if finder.available?
# Now you can use: TreeHaver::Language.bash

With custom search paths

finder = TreeHaver::GrammarFinder.new(:toml, extra_paths: ["/opt/custom/lib"])

Secure mode (trusted directories only)

finder = TreeHaver::GrammarFinder.new(:toml)
path = finder.find_library_path_safe  # Only returns paths in trusted dirs

Constant Summary

BASE_SEARCH_DIRS =

Common base directories where tree-sitter libraries are installed.
The library filename, with its platform-specific extension, is appended
to each directory automatically.
User-local XDG paths (~/.local/lib/tree-sitter) are added dynamically
in #user_search_dirs so that HOME expansion happens at call time.

[
  "/usr/lib",
  "/usr/lib64",
  "/usr/local/lib",
  "/opt/homebrew/lib",
  "/home/linuxbrew/.linuxbrew/lib",
].freeze
TREE_SITTER_BACKENDS =

Backends that use tree-sitter and therefore require native runtime libraries.
Other backends (Citrus, Prism, Psych, etc.) do not use tree-sitter.

[
  TreeHaver::Backends::MRI,
  TreeHaver::Backends::FFI,
  TreeHaver::Backends::Rust,
  TreeHaver::Backends::Java,
].freeze

Constructor Details

#initialize(language_name, extra_paths: [], validate: true) ⇒ GrammarFinder

Initialize a grammar finder for a specific language

Parameters:

  • language_name (Symbol, String)

    the tree-sitter language name (e.g., :toml, :json, :bash)

  • extra_paths (Array<String>) (defaults to: [])

    additional directories to search (searched first, after the ENV override)

  • validate (Boolean) (defaults to: true)

    if true, validates the language name

Raises:

  • (ArgumentError)

    if language_name is invalid and validate is true



# File 'lib/tree_haver/grammar_finder.rb', line 78

def initialize(language_name, extra_paths: [], validate: true)
  name_str = language_name.to_s.downcase

  if validate && !PathValidator.safe_language_name?(name_str)
    raise ArgumentError, "Invalid language name: #{language_name.inspect}. " \
      "Language names must start with a letter and contain only lowercase letters, numbers, and underscores."
  end

  @language_name = name_str.to_sym
  @extra_paths = Array(extra_paths)
end
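
The two normalization steps in the constructor can be tried outside the class. The helper name below is hypothetical; only `to_s`, `downcase`, `to_sym`, and `Array()` come from the code above.

```ruby
# Mirrors the name normalization in #initialize as a standalone helper.
def normalize_language_name(language_name)
  language_name.to_s.downcase.to_sym
end

normalize_language_name("TOML")  # => :toml
normalize_language_name(:json)   # => :json

# Array() wraps a single value, passes arrays through, and turns nil into [].
Array("/opt/custom/lib")         # => ["/opt/custom/lib"]
Array(["/a", "/b"])              # => ["/a", "/b"]
Array(nil)                       # => []
```

Because the name is downcased before validation, an uppercase name such as "TOML" is accepted and stored as :toml.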

Instance Attribute Details

#extra_paths ⇒ Array<String> (readonly)

Returns additional search paths provided at initialization.

Returns:

  • (Array<String>)

    additional search paths provided at initialization



# File 'lib/tree_haver/grammar_finder.rb', line 70

def extra_paths
  @extra_paths
end

#language_name ⇒ Symbol (readonly)

Returns the language identifier.

Returns:

  • (Symbol)

    the language identifier



# File 'lib/tree_haver/grammar_finder.rb', line 67

def language_name
  @language_name
end

Class Method Details

.reset_runtime_check! ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Reset the cached tree-sitter runtime check (for testing)



# File 'lib/tree_haver/grammar_finder.rb', line 292

def reset_runtime_check!
  remove_instance_variable(:@tree_sitter_runtime_usable) if defined?(@tree_sitter_runtime_usable)
end

.tree_sitter_runtime_usable? ⇒ Boolean

Check if the tree-sitter runtime is usable

Tests whether we can actually create a tree-sitter parser.
Result is cached since this is expensive and won’t change during runtime.

Returns:

  • (Boolean)

    true if tree-sitter runtime is functional



# File 'lib/tree_haver/grammar_finder.rb', line 267

def tree_sitter_runtime_usable?
  return @tree_sitter_runtime_usable if defined?(@tree_sitter_runtime_usable)

  @tree_sitter_runtime_usable = begin
    # Try to create a parser using the current backend
    mod = TreeHaver.resolve_backend_module(nil)

    # Only tree-sitter backends are relevant here
    # Non-tree-sitter backends (Citrus, Prism, Psych, etc.) don't use grammar files
    if mod.nil? || !TREE_SITTER_BACKENDS.include?(mod)
      false
    else
      # Try to instantiate a parser - this will fail if runtime isn't available
      mod::Parser.new
      true
    end
  rescue NoMethodError, LoadError, NotAvailable => _e
    # Note: FFI::NotFoundError inherits from LoadError, so it's caught here too
    false
  end
end
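
The caching above relies on `defined?` rather than `||=`, which matters because a cached `false` must not trigger the probe again. A minimal sketch of that pattern, using a hypothetical RuntimeProbe class that is not part of TreeHaver:

```ruby
# Hypothetical class illustrating defined?-based memoization of a
# possibly-false result, plus a reset hook for tests.
class RuntimeProbe
  class << self
    attr_reader :probe_count

    def usable?
      return @usable if defined?(@usable)

      @probe_count = (@probe_count || 0) + 1
      @usable = false # pretend the expensive probe failed
    end

    def reset!
      remove_instance_variable(:@usable) if defined?(@usable)
    end
  end
end

RuntimeProbe.usable?     # runs the probe
RuntimeProbe.usable?     # served from the cache, even though the value is false
RuntimeProbe.probe_count # => 1
```

With `@usable ||= ...` the second call would have run the probe again, since `false` is falsy.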

Instance Method Details

#available? ⇒ Boolean

Check if the grammar library is available AND usable

This checks:

  1. The grammar library file exists
  2. The tree-sitter runtime is functional (can create a parser)

This prevents registering grammars when tree-sitter isn’t actually usable,
allowing clean fallback to alternative backends like Citrus.

Returns:

  • (Boolean)

    true if the library can be found AND tree-sitter runtime works



# File 'lib/tree_haver/grammar_finder.rb', line 242

def available?
  path = find_library_path
  return false if path.nil?

  # Check if tree-sitter runtime is actually functional
  # This is cached at the class level since it's the same for all grammars
  self.class.tree_sitter_runtime_usable?
end
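
A caller might use the combined check to pick a backend. The sketch below is illustrative only: FakeFinder and choose_backend are hypothetical names, and the struct merely imitates the two conditions that #available? combines.

```ruby
# Hypothetical stand-in for a finder: available only when a library path
# was found AND the runtime probe succeeded.
FakeFinder = Struct.new(:path, :runtime_ok) do
  def available?
    !path.nil? && runtime_ok
  end
end

# Falls back to an alternative backend when tree-sitter is not usable.
def choose_backend(finder)
  finder.available? ? :tree_sitter : :citrus
end

choose_backend(FakeFinder.new("/usr/lib/libtree-sitter-toml.so", true))   # => :tree_sitter
choose_backend(FakeFinder.new("/usr/lib/libtree-sitter-toml.so", false))  # => :citrus
choose_backend(FakeFinder.new(nil, true))                                 # => :citrus
```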

#available_safe? ⇒ Boolean

Check if the grammar library is available in a trusted directory

Returns:

  • (Boolean)

    true if the library can be found in a trusted directory

# File 'lib/tree_haver/grammar_finder.rb', line 301

def available_safe?
  !find_library_path_safe.nil?
end

#env_var_name ⇒ String

Get the environment variable name for this language

Returns:

  • (String)

    the ENV var name (e.g., “TREE_SITTER_TOML_PATH”)



# File 'lib/tree_haver/grammar_finder.rb', line 93

def env_var_name
  "TREE_SITTER_#{@language_name.to_s.upcase}_PATH"
end

#find_library_path ⇒ String?

Note:

Paths from ENV are validated using PathValidator.safe_library_path?
to prevent path traversal and other attacks. Invalid ENV paths cause
an error to be raised (Principle of Least Surprise - explicit paths must work).

Note:

Setting the ENV variable to an empty string explicitly disables
this grammar. This allows fallback to alternative backends (e.g., Citrus).

Find the grammar library path

Searches in order:

  1. Environment variable override (validated for safety)
  2. Extra paths provided at initialization
  3. User-local XDG paths (e.g., ~/.local/lib/tree-sitter)
  4. Common system installation paths

Returns:

  • (String, nil)

    the path to the library, or nil if not found

Raises:

  • (TreeHaver::NotAvailable)

    if the environment variable is set to a path that fails validation

# File 'lib/tree_haver/grammar_finder.rb', line 155

def find_library_path
  # Check environment variable first (highest priority)
  # Use key? to distinguish between "not set" and "set to empty"
  env_var = env_var_name
  if ENV.key?(env_var)
    env_path = ENV[env_var]

    # :nocov: defensive - ENV.key? true with nil value is rare edge case
    if env_path.nil?
      @env_rejection_reason = "explicitly disabled (set to nil)"
      return
    end
    # :nocov:

    # Empty string means "explicitly skip this grammar"
    # This allows users to disable tree-sitter for specific languages
    # and fall back to alternative backends like Citrus
    if env_path.empty?
      @env_rejection_reason = "explicitly disabled (set to empty string)"
      return
    end

    # Store why env path was rejected for better error messages
    @env_rejection_reason = validate_env_path(env_path)

    # Principle of Least Surprise: If user explicitly sets an ENV variable
    # to a path, that path MUST work. Don't silently fall back to auto-discovery.
    if @env_rejection_reason
      raise TreeHaver::NotAvailable,
        "#{env_var_name} is set to #{env_path.inspect} but #{@env_rejection_reason}. " \
          "Either fix the path, unset the variable to use auto-discovery, " \
          "or set it to empty string to explicitly disable this grammar."
    end

    return env_path
  end

  # Search all paths (these are constructed from trusted base dirs)
  search_paths.find { |path| File.exist?(path) }
end
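
The three states of the environment variable (unset, empty, set to a path) can be summarized in a small standalone sketch. Everything here is hypothetical: `resolve_override` is not a TreeHaver method, `env` is a plain Hash so the real environment is untouched, `valid` stands in for #validate_env_path, and ArgumentError stands in for TreeHaver::NotAvailable.

```ruby
# Sketch of the ENV decision: returns :auto_discover, :disabled, or the path.
def resolve_override(env, var, valid:)
  return :auto_discover unless env.key?(var)     # variable not set

  value = env[var]
  return :disabled if value.nil? || value.empty? # explicitly switched off

  reason = valid.call(value)                     # nil means the path is fine
  raise ArgumentError, "#{var} is set to #{value.inspect} but #{reason}" if reason

  value
end

always_ok = ->(_path) { nil }
var = "TREE_SITTER_TOML_PATH"

resolve_override({}, var, valid: always_ok)           # => :auto_discover
resolve_override({ var => "" }, var, valid: always_ok) # => :disabled
resolve_override({ var => "/opt/lib/libtree-sitter-toml.so" }, var, valid: always_ok)
# => "/opt/lib/libtree-sitter-toml.so"
```

Note that the real method returns nil in the disabled case rather than a symbol; the symbols here only make the branches visible.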

#find_library_path_safe ⇒ String?

Find the grammar library path with strict security validation

This method only returns paths that are in trusted system directories.
Use this when you want maximum security and don’t need to support
custom installation locations.

Returns:

  • (String, nil)

    the path to the library, or nil if not found

See Also:

  • For the list of trusted directories


# File 'lib/tree_haver/grammar_finder.rb', line 225

def find_library_path_safe
  # Environment variable is NOT checked in safe mode - only trusted system paths
  search_paths.find do |path|
    File.exist?(path) && PathValidator.in_trusted_directory?(path)
  end
end
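
The filtering step can be illustrated with a hypothetical trusted-directory check. The directory list and method name below are assumptions for illustration only; the real list is owned by PathValidator and is not shown on this page.

```ruby
# Hypothetical trusted prefixes; NOT the actual PathValidator list.
SKETCH_TRUSTED_DIRS = ["/usr/lib", "/usr/local/lib"].freeze

# Expands the path first so "../" segments cannot escape a trusted prefix.
def sketch_in_trusted_directory?(path)
  expanded = File.expand_path(path)
  SKETCH_TRUSTED_DIRS.any? { |dir| expanded.start_with?("#{dir}/") }
end

candidates = [
  "/tmp/grammars/libtree-sitter-toml.so",
  "/usr/local/lib/libtree-sitter-toml.so",
]
candidates.select { |path| sketch_in_trusted_directory?(path) }
# => ["/usr/local/lib/libtree-sitter-toml.so"]
```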

#library_filename ⇒ String

Get the library filename for the current platform

Returns:

  • (String)

    the library filename (e.g., “libtree-sitter-toml.so”)



# File 'lib/tree_haver/grammar_finder.rb', line 107

def library_filename
  ext = platform_extension
  "libtree-sitter-#{@language_name}#{ext}"
end
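
The private #platform_extension helper is not shown on this page. The sketch below assumes it maps the host OS to one of the three extensions named in the overview (.so, .dylib, .dll); the method names and the exact OS patterns are illustrative guesses.

```ruby
require "rbconfig"

# Assumed mapping from host OS to shared-library extension.
def sketch_platform_extension(host_os = RbConfig::CONFIG["host_os"])
  case host_os
  when /darwin/ then ".dylib"
  when /mswin|mingw|cygwin/ then ".dll"
  else ".so"
  end
end

# Same naming scheme as #library_filename above.
def sketch_library_filename(language, host_os = RbConfig::CONFIG["host_os"])
  "libtree-sitter-#{language}#{sketch_platform_extension(host_os)}"
end

sketch_library_filename(:toml, "linux-gnu")  # => "libtree-sitter-toml.so"
sketch_library_filename(:toml, "darwin23")   # => "libtree-sitter-toml.dylib"
sketch_library_filename(:toml, "mingw32")    # => "libtree-sitter-toml.dll"
```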

#not_found_message ⇒ String

Get a human-readable error message when library is not found

Returns:

  • (String)

    error message with installation hints



# File 'lib/tree_haver/grammar_finder.rb', line 347

def not_found_message
  msg = "tree-sitter #{@language_name} grammar not found."

  # Check if env var is set but rejected
  env_value = ENV[env_var_name]
  msg += if env_value && @env_rejection_reason
    " #{env_var_name} is set to #{env_value.inspect} but #{@env_rejection_reason}."
  elsif env_value && File.exist?(env_value) && !self.class.tree_sitter_runtime_usable?
    " #{env_var_name} is set and file exists, but no tree-sitter runtime is available. " \
      "Add ruby_tree_sitter, ffi, or tree_stump gem to your Gemfile."
  elsif env_value
    " #{env_var_name} is set but was not used (file may have been removed)."
  else
    " Searched: #{search_paths.join(", ")}."
  end

  msg + " Install tree-sitter-#{@language_name} or set #{env_var_name} to a valid path."
end

#register!(raise_on_missing: false) ⇒ Boolean

Register this language with TreeHaver

After registration, the language can be loaded via dynamic method
(e.g., TreeHaver::Language.toml).

Parameters:

  • raise_on_missing (Boolean) (defaults to: false)

    if true, raises when library not found

Returns:

  • (Boolean)

    true if registration succeeded

Raises:

  • (NotAvailable)

    if library not found and raise_on_missing is true



# File 'lib/tree_haver/grammar_finder.rb', line 313

def register!(raise_on_missing: false)
  path = find_library_path
  unless path
    if raise_on_missing
      raise NotAvailable, not_found_message
    end
    return false
  end

  TreeHaver.register_language(@language_name, path: path, symbol: symbol_name)
  true
end

#search_info ⇒ Hash

Get debug information about the search

Returns:

  • (Hash)

    diagnostic information



# File 'lib/tree_haver/grammar_finder.rb', line 329

def search_info
  found = find_library_path # This populates @env_rejection_reason
  {
    language: @language_name,
    env_var: env_var_name,
    env_value: ENV[env_var_name],
    env_rejection_reason: @env_rejection_reason,
    symbol: symbol_name,
    library_filename: library_filename,
    search_paths: search_paths,
    found_path: found,
    available: !found.nil?,
  }
end

#search_paths ⇒ Array<String>

Generate the full list of search paths for this language

Order: extra_paths, user-local paths, then system paths. The ENV override
is handled separately in #find_library_path and is not part of this list.

Returns:

  • (Array<String>)

    all paths to search



# File 'lib/tree_haver/grammar_finder.rb', line 117

def search_paths
  paths = []

  # Extra paths provided at initialization (searched after ENV)
  @extra_paths.each do |dir|
    paths << File.join(dir, library_filename)
  end

  # User-local XDG paths (e.g. ~/.local/lib/tree-sitter/)
  user_search_dirs.each do |dir|
    paths << File.join(dir, library_filename)
  end

  # Common system paths with platform-appropriate extension
  BASE_SEARCH_DIRS.each do |dir|
    paths << File.join(dir, library_filename)
  end

  paths
end
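
The ordering can be checked with a standalone sketch. The helper name, keyword arguments, and directory values are hypothetical; only the extra, user-local, system ordering and the `File.join` step come from the method above.

```ruby
# Builds candidate paths in priority order: extra, then user-local, then system.
def sketch_search_paths(filename, extra:, user:, system:)
  (extra + user + system).map { |dir| File.join(dir, filename) }
end

sketch_search_paths(
  "libtree-sitter-toml.so",
  extra: ["/opt/custom/lib"],
  user: ["/home/me/.local/lib/tree-sitter"],
  system: ["/usr/lib", "/usr/local/lib"],
)
# => ["/opt/custom/lib/libtree-sitter-toml.so",
#     "/home/me/.local/lib/tree-sitter/libtree-sitter-toml.so",
#     "/usr/lib/libtree-sitter-toml.so",
#     "/usr/local/lib/libtree-sitter-toml.so"]
```

Since callers take the first existing path, a library in an extra directory shadows a system-wide install of the same grammar.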

#symbol_name ⇒ String

Get the expected symbol name exported by the grammar library

Returns:

  • (String)

    the symbol name (e.g., “tree_sitter_toml”)



# File 'lib/tree_haver/grammar_finder.rb', line 100

def symbol_name
  "tree_sitter_#{@language_name}"
end

#validate_env_path(path) ⇒ String?

Validate an environment variable path and return reason if invalid

Returns:

  • (String, nil)

    rejection reason or nil if valid



# File 'lib/tree_haver/grammar_finder.rb', line 198

def validate_env_path(path)
  # Check for leading/trailing whitespace
  if path != path.strip
    return "contains leading or trailing whitespace (use #{path.strip.inspect})"
  end

  # Check if path is safe
  unless PathValidator.safe_library_path?(path)
    return "failed security validation (may contain path traversal or suspicious characters)"
  end

  # Check if file exists
  unless File.exist?(path)
    return "file does not exist"
  end

  nil # Valid!
end
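
The "reason string or nil" convention can be exercised with a reduced sketch. The method name is hypothetical, and the PathValidator security check is deliberately left out because its implementation is not shown here; only the whitespace and existence checks are reproduced.

```ruby
# Reduced version of the validation: returns a rejection reason, or nil if
# the path passes. The security check from the real method is omitted.
def sketch_validate_env_path(path)
  if path != path.strip
    return "contains leading or trailing whitespace (use #{path.strip.inspect})"
  end
  return "file does not exist" unless File.exist?(path)

  nil
end

sketch_validate_env_path(" /usr/lib/libtree-sitter-toml.so")
# => "contains leading or trailing whitespace (use \"/usr/lib/libtree-sitter-toml.so\")"
```

Returning the reason instead of a boolean lets #find_library_path and #not_found_message tell the user exactly why an explicitly configured path was rejected.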