Architecture

This page describes how foreignthon-core works internally.


Pipeline

source.xx.py
     │
     ▼
_check_shebang()        ← reads "# foreignthon: xx" if present
     │
     ▼
load_pack(lang_code)    ← discovers + validates the JSON pack
     │
     ▼
_apply_postfix_syntax() ← rewrites "expr @@keyword:" lines
     │
     ▼
_swap_tokens()          ← tokenizer pass: replaces NAME tokens
     │
     ▼
standard Python string  ← ready to compile or write to disk

Module overview

Module         Responsibility
transpiler.py  The engine: postfix rewriter and tokenizer pass
pack.py        Pack discovery, loading, and validation
cli.py         Click commands (fpy run, fpy compile, etc.)
errors.py      Bilingual exception hook
template.json  Canonical set of all keywords/builtins a pack must cover

Tokenizer-based translation

For keyword translation, ForeignThon uses Python's standard tokenize module rather than regex or AST manipulation.

tokenize.generate_tokens() splits source code into typed tokens. ForeignThon only looks at NAME tokens — identifiers. It replaces any NAME token whose string appears as a key in the active pack mapping. All other token types (strings, comments, operators, numbers) pass through unchanged.

This gives three important guarantees:

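  1. Strings are safe. A keyword inside "..." is part of a STRING token, never a NAME, so it is never touched. The same holds for f"..." on Python versions before 3.12; from 3.12 onward (PEP 701) the tokenizer emits NAME tokens for the expressions inside {...}, so only the literal text of an f-string is guaranteed to pass through unchanged.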
  2. Comments are safe. Comment tokens are passed through verbatim.
  3. Variable names are safe. A variable like si_condition contains si only as a substring; as a NAME token it is si_condition, which is not in the mapping.

The whitespace between tokens is preserved by tracking (row, col) positions and copying the gaps from the original source.
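
The tokenizer pass described above can be sketched as follows. This is a minimal illustration, not the project's actual _swap_tokens(): the function name swap_tokens and its signature are assumptions, and it converts (row, col) positions to absolute offsets so that the gaps between tokens can be copied straight from the original source.

```python
import io
import tokenize


def swap_tokens(source: str, mapping: dict[str, str]) -> str:
    """Hypothetical sketch: replace NAME tokens found in `mapping`."""
    # Absolute offset of the start of each line, so that a (row, col)
    # position can be turned into an index into `source`.
    offsets = [0]
    for line in io.StringIO(source):
        offsets.append(offsets[-1] + len(line))
    offsets.append(offsets[-1])  # extra row for the zero-width ENDMARKER

    def index(row: int, col: int) -> int:
        # Clamp: the implicit NEWLINE at EOF can point one column past the end.
        return min(offsets[row - 1] + col, len(source))

    out = []
    pos = 0  # end of the previously emitted token
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        start, end = index(*tok.start), index(*tok.end)
        out.append(source[pos:start])  # gap: spaces, backslash continuations
        if tok.type == tokenize.NAME and tok.string in mapping:
            out.append(mapping[tok.string])  # translated identifier
        else:
            out.append(source[start:end])  # every other token, verbatim
        pos = end
    out.append(source[pos:])  # anything after the last token
    return "".join(out)


if __name__ == "__main__":
    demo = 'si x > 0:\n    escribir("si")  # si\nsi_condition = 1\n'
    print(swap_tokens(demo, {"si": "if", "escribir": "print"}))
```

Only the leading si and escribir change in the demo; the "si" string, the comment, and si_condition are copied through, which matches the three guarantees listed above.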


Pack discovery

Language packs register themselves using Python entry points:

# in foreignthon-es/pyproject.toml
[project.entry-points."foreignthon.langs"]
es = "foreignthon_es"

pack.py calls importlib.metadata.entry_points(group="foreignthon.langs") at runtime to discover all installed packs. Installing a pack is sufficient — no configuration file needs to be edited.

Each pack module must expose:

from importlib.resources import files
from pathlib import Path
def get_pack_path() -> Path:
    return files(__name__) / "xx.json"

The core calls get_pack_path() to locate the JSON, loads it, and validates that all required sections are present.

Results are cached with @lru_cache so each pack is loaded at most once per process.


Pack mapping

Four sections of the JSON are merged into a single flat dict for translation:

mapping = {}
mapping.update(pack["keywords"])
mapping.update(pack["builtins"])
mapping.update(pack["exceptions"])
mapping.update(pack["stdlib"])

The merged mapping is { foreign_word: english_word }. It is passed directly to _swap_tokens().

If two sections define the same foreign key, later sections win (stdlib last). In practice this does not occur because pack authors ensure uniqueness.
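
To make the "later sections win" rule concrete, here is a small sketch that performs the same merge and also reports any foreign key that is redefined with a different English word. The function name build_mapping and the collision report are hypothetical additions, not part of the core as described.

```python
SECTIONS = ("keywords", "builtins", "exceptions", "stdlib")


def build_mapping(pack: dict) -> tuple[dict, list]:
    """Merge the four sections; return (mapping, collisions)."""
    mapping = {}
    collisions = []  # (foreign_word, section that overrode it)
    for section in SECTIONS:
        for foreign, english in pack.get(section, {}).items():
            if foreign in mapping and mapping[foreign] != english:
                collisions.append((foreign, section))
            mapping[foreign] = english  # later sections win
    return mapping, collisions


if __name__ == "__main__":
    pack = {
        "keywords": {"si": "if"},
        "builtins": {"escribir": "print"},
        "exceptions": {},
        "stdlib": {"si": "sys"},  # deliberately conflicting key
    }
    print(build_mapping(pack))
```

A pack test suite could assert that the collisions list is empty, which would turn the uniqueness convention into a checked property.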


Postfix syntax (@@)

The @@ operator is a source-level pre-processing step that runs before tokenization.

A line like:

x > 0 @@si:
    escribir(x)

is rewritten to:

si x > 0:
    escribir(x)

The rewriter uses a regex that matches (.+?)@@(<keyword>) and moves the keyword to the front. It only operates on lines that contain @@, preserving indentation and line endings.

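The sequence @@ is never valid Python syntax outside string literals and comments, and because the rewrite runs first, the postfix form never reaches the tokenizer pass.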

Decompile direction: fpy decompile --postfix does the reverse — it looks for lines of the form foreign_kw expr: where foreign_kw is in the pack's postfix_keywords list, and rewrites them to expr @@foreign_kw:.
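
The forward rewrite can be sketched with a line-by-line regex pass like the one below. This is an illustrative approximation only: the function name, the exact pattern (which is anchored and also captures indentation and any trailing text), and the keywords argument (standing in for the pack's postfix_keywords list) are assumptions.

```python
import re


def apply_postfix_syntax(source: str, keywords: list[str]) -> str:
    """Rewrite 'expr @@kw:' lines to 'kw expr:' (hypothetical sketch)."""
    if not keywords:
        return source
    # Longest keywords first so a short keyword cannot shadow a longer one.
    alternation = "|".join(
        re.escape(k) for k in sorted(keywords, key=len, reverse=True)
    )
    pattern = re.compile(
        rf"^(?P<indent>[ \t]*)(?P<expr>.+?)[ \t]*"
        rf"@@(?P<kw>{alternation})[ \t]*:(?P<rest>.*)$"
    )
    out = []
    for line in source.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending
        if "@@" in body:  # lines without @@ are left untouched
            m = pattern.match(body)
            if m:
                body = f"{m['indent']}{m['kw']} {m['expr']}:{m['rest']}"
        out.append(body + ending)
    return "".join(out)


if __name__ == "__main__":
    print(apply_postfix_syntax("x > 0 @@si:\n    escribir(x)\n", ["si"]))
```

A plain regex pass like this cannot tell whether @@ sits inside a string literal, so a real implementation may need an extra guard for that case.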


Bilingual error hook

errors.py installs a custom sys.excepthook before running user code:

  1. On exception, it looks up the exception type name in the pack's exceptions section (reverse map: English → foreign).
  2. It looks up a translated message in error_messages.
  3. It prints [XX] ForeignName: translated_msg then [EN] EnglishName: original_msg.
  4. It always calls traceback.print_exception() afterwards so the full traceback is shown.

Tracebacks point to the original .xx.py file. This is achieved by populating linecache.cache with the original source before exec()-ing the compiled code, so Python's traceback machinery reads the right lines.
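
The hook and the linecache trick might look roughly like this. It is a sketch under stated assumptions: the function names are invented, and error_messages is assumed to be keyed by the English exception name, which the description above does not specify.

```python
import linecache
import sys
import traceback


def format_bilingual(exc: BaseException, pack: dict, lang_code: str) -> list:
    """Build the two header lines printed before the traceback."""
    english_name = type(exc).__name__
    # Reverse the pack's exceptions section: English name -> foreign name.
    reverse = {en: foreign for foreign, en in pack.get("exceptions", {}).items()}
    foreign_name = reverse.get(english_name, english_name)
    message = str(exc)
    # Assumption: error_messages is keyed by the English exception name.
    translated = pack.get("error_messages", {}).get(english_name, message)
    return [
        f"[{lang_code.upper()}] {foreign_name}: {translated}",
        f"[EN] {english_name}: {message}",
    ]


def install_hook(pack: dict, lang_code: str) -> None:
    """Install a sys.excepthook that prints both headers, then the traceback."""
    def hook(exc_type, exc, tb):
        for line in format_bilingual(exc, pack, lang_code):
            print(line, file=sys.stderr)
        traceback.print_exception(exc_type, exc, tb)

    sys.excepthook = hook


def register_source(filename: str, source: str) -> None:
    """Make tracebacks for `filename` show lines from the original source."""
    lines = source.splitlines(keepends=True)
    # Entry layout is (size, mtime, lines, fullname); mtime=None keeps
    # linecache.checkcache() from discarding the entry.
    linecache.cache[filename] = (len(source), None, lines, filename)
```

For the displayed lines to match, the translated code has to be compiled with the same filename that was registered, for example compile(translated, "demo.es.py", "exec").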


Custom pack override

When .foreignthon.toml declares custom_pack = "path/to/custom.json":

  • If the custom JSON has meta.code set, it is treated as a standalone pack and used directly.
  • If meta.code is absent, it is treated as an override — it is merged on top of the installed pack, replacing only the keys it defines.

The CLI (cli.py) handles this in _load_effective_pack() by walking up the directory tree to find .foreignthon.toml.
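
The two behaviours can be illustrated with the sketch below. The function names are hypothetical, and merging section by section is an assumption about what "replacing only the keys it defines" means; the actual _load_effective_pack() may differ.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path, name: str = ".foreignthon.toml") -> Optional[Path]:
    """Walk from `start` up to the filesystem root; return the nearest config."""
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def effective_pack(installed: dict, custom: dict) -> dict:
    """Return the pack to use, given an installed pack and a custom JSON."""
    if custom.get("meta", {}).get("code"):
        return custom  # standalone pack: used directly
    # Override: copy the installed pack, then layer the custom keys on top.
    merged = {
        key: (dict(value) if isinstance(value, dict) else value)
        for key, value in installed.items()
    }
    for section, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(section), dict):
            merged[section].update(value)  # assumed: per-section merge
        else:
            merged[section] = value
    return merged
```

Copying each section before updating it keeps the installed pack, which may be a cached object, unmodified.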


File naming and language detection

Language detection order (highest priority first):

  1. --lang CLI flag
  2. Shebang comment # foreignthon: xx on the first line
  3. Double extension .xx.py, where xx is the language code
  4. Fallback to "en" (no-op — English is Python)

_detect_lang() and _check_shebang() in transpiler.py implement steps 3 and 2 respectively. Step 1 is handled by the run command in cli.py.
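
The priority order can be expressed as a short chain of fallbacks. The sketch below is illustrative only: the function names, the shebang regex, and the suffix handling are assumptions rather than the actual code in transpiler.py and cli.py.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed shape of the first-line marker, e.g. "# foreignthon: es".
SHEBANG = re.compile(r"^#\s*foreignthon:\s*([A-Za-z_-]+)\s*$")


def check_shebang(source: str) -> Optional[str]:
    """Return the language code from the first line, if present."""
    first_line = source.split("\n", 1)[0].rstrip("\r")
    match = SHEBANG.match(first_line)
    return match.group(1) if match else None


def detect_lang_from_name(filename: str) -> Optional[str]:
    """Return xx for names like source.xx.py, else None."""
    suffixes = Path(filename).suffixes
    if len(suffixes) >= 2 and suffixes[-1] == ".py":
        # A real implementation would likely also check that the code
        # belongs to an installed pack, to avoid matching my.module.py.
        return suffixes[-2].lstrip(".")
    return None


def resolve_lang(filename: str, source: str, cli_lang: Optional[str] = None) -> str:
    """Apply the priority order: --lang, shebang, double extension, 'en'."""
    return (
        cli_lang
        or check_shebang(source)
        or detect_lang_from_name(filename)
        or "en"
    )
```

For example, resolve_lang("demo.es.py", "# foreignthon: fr\n") would pick "fr", because the shebang outranks the file extension.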