# Architecture
This page describes how `foreignthon-core` works internally.
---
## Pipeline
```
source.xx.py
_check_shebang() ← reads "# foreignthon: xx" if present
load_pack(lang_code) ← discovers + validates the JSON pack
_apply_postfix_syntax() ← rewrites "expr @@keyword:" lines
_swap_tokens() ← tokenizer pass: replaces NAME tokens
standard Python string ← ready to compile or write to disk
```
---
## Module overview
| Module | Responsibility |
|---|---|
| `transpiler.py` | The engine — postfix rewriter and tokenizer pass |
| `pack.py` | Pack discovery, loading, and validation |
| `cli.py` | Click commands (`fpy run`, `fpy compile`, etc.) |
| `errors.py` | Bilingual exception hook |
| `template.json` | Canonical set of all keywords/builtins a pack must cover |
---
## Tokenizer-based translation
ForeignThon uses Python's standard `tokenize` module rather than regex or AST manipulation.
`tokenize.generate_tokens()` splits source code into typed tokens. ForeignThon only looks at `NAME` tokens — identifiers. It replaces any `NAME` token whose string appears as a key in the active pack mapping. All other token types (strings, comments, operators, numbers) pass through unchanged.
This gives three important guarantees:
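1. **Strings are safe.** A keyword inside `"..."` is part of a `STRING` token, never a `NAME`, so it is never touched. The literal text of an `f"..."` string is equally safe. Note that on Python 3.12 and later the tokenizer splits f-strings into `FSTRING_START`, `FSTRING_MIDDLE`, and `FSTRING_END` tokens and emits the expressions inside `{...}` as ordinary tokens, so names in replacement fields are visible to the tokenizer pass on those versions.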
2. **Comments are safe.** Comment tokens are passed through verbatim.
3. **Variable names are safe.** A variable like `si_condition` contains `si` only as a substring; as a `NAME` token it is `si_condition`, which is not in the mapping.
The whitespace between tokens is preserved by tracking `(row, col)` positions and copying the gaps from the original source.
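
The idea above can be sketched in a few lines. This is a minimal illustration, not the actual `_swap_tokens()` implementation: the function name `swap_name_tokens` is hypothetical, and instead of copying the gaps between tokens it records the `(row, col)` span of each matching `NAME` token and edits the original lines in place, which leaves all other text byte-for-byte unchanged.

```python
import io
import tokenize


def swap_name_tokens(source: str, mapping: dict[str, str]) -> str:
    """Replace NAME tokens that appear in `mapping`; leave everything else as-is.

    Hypothetical helper for illustration only.
    """
    # Split the same way the tokenizer reads the source (on "\n" only).
    lines = io.StringIO(source).readlines()
    edits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in mapping:
            row, start_col = tok.start  # rows are 1-based, columns 0-based
            edits.append((row - 1, start_col, tok.end[1], mapping[tok.string]))
    # Apply edits right-to-left so earlier column offsets stay valid.
    for row, start, end, replacement in reversed(edits):
        line = lines[row]
        lines[row] = line[:start] + replacement + line[end:]
    return "".join(lines)


demo = 'si si_condition:  # si\n    escribir("si")\n'
print(swap_name_tokens(demo, {"si": "if", "escribir": "print"}))
```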
---
## Pack discovery
Language packs register themselves using Python [entry points](https://packaging.python.org/en/latest/specifications/entry-points/):
```toml
# in foreignthon-es/pyproject.toml
[project.entry-points."foreignthon.langs"]
es = "foreignthon_es"
```
`pack.py` calls `importlib.metadata.entry_points(group="foreignthon.langs")` at runtime to discover all installed packs. Installing a pack is sufficient — no configuration file needs to be edited.
Each pack module must expose:
```python
from importlib.resources import files
from pathlib import Path

def get_pack_path() -> Path:
    return files(__name__) / "xx.json"
```
The core calls `get_pack_path()` to locate the JSON, loads it, and validates that all required sections are present.
Results are cached with `@lru_cache` so each pack is loaded at most once per process.
---
## Pack mapping
Four sections of the JSON are merged into a single flat dict for translation:
```python
mapping = {}
mapping.update(pack["keywords"])
mapping.update(pack["builtins"])
mapping.update(pack["exceptions"])
mapping.update(pack["stdlib"])
```
The merged mapping is `{ foreign_word: english_word }`. It is passed directly to `_swap_tokens()`.
If two sections define the same foreign key, the later section wins, with `stdlib` applied last. Pack authors are expected to keep keys unique across sections, so collisions should not occur in practice.
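
Because the merge silently lets later sections win, a pack author may want to check for collisions. The sketch below shows one way to do that; `build_mapping` and its return shape are hypothetical and not part of `foreignthon-core`.

```python
SECTIONS = ("keywords", "builtins", "exceptions", "stdlib")


def build_mapping(pack: dict) -> tuple[dict, dict]:
    """Merge the four sections in order and report keys defined more than once.

    Hypothetical helper: returns (mapping, collisions), where collisions maps
    a foreign word to the (earlier, later) English values that conflicted.
    """
    mapping = {}
    collisions = {}
    for section in SECTIONS:
        for foreign, english in pack.get(section, {}).items():
            if foreign in mapping and mapping[foreign] != english:
                collisions[foreign] = (mapping[foreign], english)
            mapping[foreign] = english  # later sections win
    return mapping, collisions


pack = {
    "keywords": {"si": "if"},
    "builtins": {"escribir": "print"},
    "exceptions": {},
    "stdlib": {"si": "sys"},  # deliberate collision for the demo
}
mapping, collisions = build_mapping(pack)
print(mapping["si"], collisions)
```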
---
## Postfix syntax (`@@`)
The `@@` operator is a source-level pre-processing step that runs **before** tokenization.
A line like:
```
x > 0 @@si:
    escribir(x)
```
is rewritten to:
```
si x > 0:
    escribir(x)
```
The rewriter uses a regex that matches `(.+?)@@(<keyword>)` and moves the keyword to the front. It only operates on lines that contain `@@`, preserving indentation and line endings.
`@@` is never valid Python and never appears in the tokenizer output.
**Decompile direction:** `fpy decompile --postfix` does the reverse — it looks for lines of the form `foreign_kw expr:` where `foreign_kw` is in the pack's `postfix_keywords` list, and rewrites them to `expr @@foreign_kw:`.
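
The forward rewrite can be approximated with a single regex per line. This is a simplified sketch, not the real `_apply_postfix_syntax()`: the function name, the exact pattern, and the `postfix_keywords` argument are assumptions, and it does not guard against `@@` appearing inside a string literal.

```python
import re


def apply_postfix_syntax(source: str, postfix_keywords: list[str]) -> str:
    """Rewrite 'expr @@keyword:' lines to 'keyword expr:' (illustrative only)."""
    if not postfix_keywords:
        return source
    # Longest keywords first so a short keyword cannot shadow a longer one.
    alternatives = "|".join(
        re.escape(k) for k in sorted(postfix_keywords, key=len, reverse=True)
    )
    pattern = re.compile(rf"^(\s*)(.+?)\s*@@({alternatives})\b(.*)$")
    out = []
    for line in source.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending
        match = pattern.match(body) if "@@" in body else None
        if match:
            indent, expr, keyword, rest = match.groups()
            body = f"{indent}{keyword} {expr}{rest}"
        out.append(body + ending)
    return "".join(out)


print(apply_postfix_syntax("x > 0 @@si:\n    escribir(x)\n", ["si"]))
```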
---
## Bilingual error hook
`errors.py` installs a custom `sys.excepthook` before running user code:
1. On exception, it looks up the exception type name in the pack's `exceptions` section (reverse map: English → foreign).
2. It looks up a translated message in `error_messages`.
3. It prints `[XX] ForeignName: translated_msg` then `[EN] EnglishName: original_msg`.
4. It always calls `traceback.print_exception()` afterwards so the full traceback is shown.
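
The four steps above can be sketched as a hook factory. This is an illustration under assumptions: `make_excepthook` is a hypothetical name, and `error_messages` is assumed to be keyed by the English exception type name, which the pack format may not match.

```python
import sys
import traceback


def make_excepthook(pack: dict, lang_code: str):
    """Build a bilingual excepthook for the given pack (illustrative sketch)."""
    # Reverse the exceptions section: English name -> foreign name.
    foreign_names = {en: xx for xx, en in pack.get("exceptions", {}).items()}
    # Assumption: translated messages are keyed by English exception name.
    messages = pack.get("error_messages", {})

    def hook(exc_type, exc, tb):
        en_name = exc_type.__name__
        xx_name = foreign_names.get(en_name, en_name)
        xx_msg = messages.get(en_name, str(exc))
        print(f"[{lang_code.upper()}] {xx_name}: {xx_msg}", file=sys.stderr)
        print(f"[EN] {en_name}: {exc}", file=sys.stderr)
        # Always show the full traceback afterwards.
        traceback.print_exception(exc_type, exc, tb)

    return hook
```

Installing it would then be a matter of assigning `sys.excepthook = make_excepthook(pack, "es")` before running user code.
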
Tracebacks point to the original `.xx.py` file. This is achieved by populating `linecache.cache` with the original source before `exec()`-ing the compiled code, so Python's traceback machinery reads the right lines.
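
The `linecache` technique can be demonstrated in isolation. The helper below is a hypothetical sketch, not the code in `errors.py` or `cli.py`; the file name `demo.es.py` and the `escribir` to `print` translation are assumptions made for the example. An entry whose mtime is `None` is skipped by `linecache.checkcache()`, so it is not discarded when no such file exists on disk.

```python
import linecache
import traceback


def run_with_original_lines(original_source: str, compiled_source: str, filename: str) -> dict:
    """Exec the compiled Python while tracebacks display the original source."""
    lines = original_source.splitlines(keepends=True)
    # Cache entry layout: (size, mtime, lines, fullname).
    linecache.cache[filename] = (len(original_source), None, lines, filename)
    code = compile(compiled_source, filename, "exec")
    namespace = {"__name__": "__main__"}
    exec(code, namespace)
    return namespace


try:
    run_with_original_lines("escribir(1 / 0)\n", "print(1 / 0)\n", "demo.es.py")
except ZeroDivisionError:
    # The frame for demo.es.py shows the foreign-language line, not print(...).
    traceback.print_exc()
```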
---
## Custom pack override
When `.foreignthon.toml` declares `custom_pack = "path/to/custom.json"`:
- If the custom JSON has `meta.code` set, it is treated as a **standalone pack** and used directly.
- If `meta.code` is absent, it is treated as an **override** — it is merged on top of the installed pack, replacing only the keys it defines.
The CLI (`cli.py`) handles this in `_load_effective_pack()` by walking up the directory tree to find `.foreignthon.toml`.
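
Both halves of this behaviour, the upward search for `.foreignthon.toml` and the standalone-versus-override decision, can be sketched as below. The function names are hypothetical, and the merge is assumed to work per section (keys inside `keywords`, `builtins`, and so on); the real `_load_effective_pack()` may differ.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path) -> Optional[Path]:
    """Walk from `start` up to the filesystem root looking for .foreignthon.toml."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".foreignthon.toml"
        if candidate.is_file():
            return candidate
    return None


def effective_pack(installed: dict, custom: dict) -> dict:
    """Return the custom pack itself if standalone, else installed + overrides."""
    if custom.get("meta", {}).get("code"):
        return custom  # standalone pack: used directly
    # Copy one level deep so the installed pack is not mutated.
    merged = {k: (dict(v) if isinstance(v, dict) else v) for k, v in installed.items()}
    for section, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(section), dict):
            merged[section].update(value)  # replace only the keys it defines
        else:
            merged[section] = value
    return merged
```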
---
## File naming and language detection
Language detection order (highest priority first):
1. `--lang` CLI flag
2. Shebang comment `# foreignthon: xx` on the first line
3. Double extension: `.xx.py` → `xx`
4. Fallback to `"en"` (no-op — English is Python)
`_detect_lang()` and `_check_shebang()` in `transpiler.py` implement steps 3 and 2 respectively. Step 1 is handled by the `run` command in `cli.py`.
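
The priority order can be condensed into one function for illustration. This is a sketch, not the project's `_detect_lang()` or `_check_shebang()`: the combined signature, the regex, and the set of characters allowed in a language code are all assumptions.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: language codes consist of letters, underscores, and hyphens.
SHEBANG_RE = re.compile(r"^#\s*foreignthon:\s*([A-Za-z_-]+)")


def detect_lang(path: str, first_line: str = "", cli_lang: Optional[str] = None) -> str:
    """Resolve the language code using the documented priority order."""
    if cli_lang:  # 1. --lang flag
        return cli_lang
    match = SHEBANG_RE.match(first_line)  # 2. "# foreignthon: xx" comment
    if match:
        return match.group(1)
    suffixes = Path(path).suffixes  # 3. "hola.es.py" -> [".es", ".py"]
    if len(suffixes) >= 2 and suffixes[-1] == ".py":
        return suffixes[-2].lstrip(".")
    return "en"  # 4. fallback: plain Python


print(detect_lang("hola.es.py"))
print(detect_lang("hola.es.py", "# foreignthon: fr"))
print(detect_lang("plain.py"))
```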