# Architecture

This page describes how `foreignthon-core` works internally.

---

## Pipeline

```
source.xx.py
      │
      ▼
_check_shebang()          ← reads "# foreignthon: xx" if present
      │
      ▼
load_pack(lang_code)      ← discovers + validates the JSON pack
      │
      ▼
_apply_postfix_syntax()   ← rewrites "expr @@keyword:" lines
      │
      ▼
_swap_tokens()            ← tokenizer pass: replaces NAME tokens
      │
      ▼
standard Python string    ← ready to compile or write to disk
```

---

## Module overview

| Module | Responsibility |
|---|---|
| `transpiler.py` | The engine: postfix rewriter and tokenizer pass |
| `pack.py` | Pack discovery, loading, and validation |
| `cli.py` | Click commands (`fpy run`, `fpy compile`, etc.) |
| `errors.py` | Bilingual exception hook |
| `template.json` | Canonical set of all keywords/builtins a pack must cover |

---

## Tokenizer-based translation

ForeignThon uses Python's standard `tokenize` module rather than regex or AST manipulation.

`tokenize.generate_tokens()` splits source code into typed tokens. ForeignThon only looks at `NAME` tokens, which cover identifiers and keywords. It replaces any `NAME` token whose string appears as a key in the active pack mapping. All other token types (strings, comments, operators, numbers) pass through unchanged.

This gives three important guarantees:

1. **Strings are safe.** A keyword inside `"..."` is part of a `STRING` token, never a `NAME`, so it is never touched. The same holds for `f"..."` on Python 3.11 and earlier, where an f-string is a single `STRING` token. From Python 3.12 (PEP 701) the literal text of an f-string arrives as `FSTRING_MIDDLE` tokens and is still left alone, but names inside `{...}` replacement fields are tokenized as `NAME` tokens and would be translated like any other name.
2. **Comments are safe.** Comment tokens are passed through verbatim.
3. **Variable names are safe.** A variable like `si_condition` contains `si` only as a substring; as a `NAME` token it is `si_condition`, which is not in the mapping.

Whitespace between tokens is preserved by tracking `(row, col)` positions and copying the gaps from the original source.

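The token pass described above can be sketched in a few lines. This is a minimal illustration, not the project's `_swap_tokens()`: the function name `swap_names`, the splice-by-offset approach, and the two-entry demo mapping are all assumptions made for the example.

```python
import io
import tokenize


def swap_names(source: str, mapping: dict) -> str:
    """Replace NAME tokens found in `mapping`; copy everything else verbatim.

    Hypothetical helper for illustration only.
    """
    # Absolute offset of the start of each line, so that a (row, col)
    # token position can be turned into an index into `source`.
    line_starts = [0]
    for line in io.StringIO(source):
        line_starts.append(line_starts[-1] + len(line))

    pieces = []
    cursor = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME or tok.string not in mapping:
            continue  # strings, comments, operators, other names: untouched
        start = line_starts[tok.start[0] - 1] + tok.start[1]
        end = line_starts[tok.end[0] - 1] + tok.end[1]
        pieces.append(source[cursor:start])  # the gap before this token
        pieces.append(mapping[tok.string])   # the translated name
        cursor = end
    pieces.append(source[cursor:])           # whatever follows the last swap
    return "".join(pieces)


demo_mapping = {"si": "if", "escribir": "print"}  # assumed sample entries
demo_source = 'si x > 0:  # si es positivo\n    escribir("si", si_condition)\n'
translated = swap_names(demo_source, demo_mapping)
```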
---

## Pack discovery

Language packs register themselves using Python [entry points](https://packaging.python.org/en/latest/specifications/entry-points/):

```toml
# in foreignthon-es/pyproject.toml
[project.entry-points."foreignthon.langs"]
es = "foreignthon_es"
```

`pack.py` calls `importlib.metadata.entry_points(group="foreignthon.langs")` at runtime to discover all installed packs. Installing a pack is sufficient; no configuration file needs to be edited.

Each pack module must expose:

```python
from importlib.resources import files
from pathlib import Path

def get_pack_path() -> Path:
    return files(__name__) / "xx.json"
```

The core calls `get_pack_path()` to locate the JSON, loads it, and validates that all required sections are present.

Results are cached with `@lru_cache` so each pack is loaded at most once per process.

---

## Pack mapping

Four sections of the JSON are merged into a single flat dict for translation:

```python
mapping = {}
mapping.update(pack["keywords"])
mapping.update(pack["builtins"])
mapping.update(pack["exceptions"])
mapping.update(pack["stdlib"])
```

The merged mapping is `{ foreign_word: english_word }`. It is passed directly to `_swap_tokens()`.

If two sections define the same foreign key, the later section wins, with `stdlib` applied last. Pack authors are expected to keep keys unique across sections, so collisions should not occur in practice.

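Because a later section silently overrides an earlier one, a pack author may want to detect collisions while merging. The helper below is a hypothetical sketch (the name `build_mapping` and the collision report are not part of the documented API); it applies the same merge order as above and records every key that changes meaning.

```python
SECTIONS = ("keywords", "builtins", "exceptions", "stdlib")  # merge order


def build_mapping(pack: dict):
    """Return (mapping, collisions) for a pack dict.

    `collisions` lists (foreign_key, earlier_value, later_value) for keys
    defined in more than one section with different English targets.
    """
    mapping = {}
    collisions = []
    for section in SECTIONS:
        for foreign, english in pack.get(section, {}).items():
            if foreign in mapping and mapping[foreign] != english:
                collisions.append((foreign, mapping[foreign], english))
            mapping[foreign] = english  # later sections win
    return mapping, collisions


# Made-up sample data with one deliberate collision on "si".
sample_pack = {
    "keywords": {"si": "if"},
    "builtins": {"escribir": "print"},
    "exceptions": {},
    "stdlib": {"si": "sys"},
}
merged, clashes = build_mapping(sample_pack)
```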
---

## Postfix syntax (`@@`)

The `@@` operator is a source-level pre-processing step that runs **before** tokenization.

A line like:

```
x > 0 @@si:
    escribir(x)
```

is rewritten to:

```
si x > 0:
    escribir(x)
```

The rewriter uses a regex that matches `(.+?)@@(<keyword>)` and moves the keyword to the front. It only operates on lines that contain `@@`, preserving indentation and line endings.

`@@` is not valid Python syntax outside string literals and comments, so the marker cannot collide with real code. Because the rewrite runs first, it never appears in the tokenizer output.

**Decompile direction:** `fpy decompile --postfix` does the reverse. It looks for lines of the form `foreign_kw expr:` where `foreign_kw` is in the pack's `postfix_keywords` list, and rewrites them to `expr @@foreign_kw:`.

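A line-based rewriter along these lines could be written as follows. It is a minimal sketch, assuming the keyword is always followed by a colon; the function name `apply_postfix` and the exact pattern are illustrative and may differ from `_apply_postfix_syntax()`.

```python
import re


def apply_postfix(source: str, keywords) -> str:
    """Rewrite 'expr @@keyword:' lines to 'keyword expr:' (hypothetical)."""
    if not keywords:
        return source
    # Longest keywords first so a short keyword cannot shadow a longer one.
    alternation = "|".join(
        re.escape(k) for k in sorted(keywords, key=len, reverse=True)
    )
    pattern = re.compile(rf"^(\s*)(.+?)\s*@@({alternation})\s*:")

    def move_keyword(match):
        indent, expr, keyword = match.groups()
        return f"{indent}{keyword} {expr}:"

    out = []
    for line in source.splitlines(keepends=True):
        if "@@" in line:  # lines without the marker are copied untouched
            line = pattern.sub(move_keyword, line, count=1)
        out.append(line)
    return "".join(out)


rewritten = apply_postfix("x > 0 @@si:\n    escribir(x)\n", ["si", "mientras"])
```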
---

## Bilingual error hook

`errors.py` installs a custom `sys.excepthook` before running user code:

1. On exception, it looks up the exception type name in the pack's `exceptions` section (reverse map: English → foreign).
2. It looks up a translated message in `error_messages`.
3. It prints `[XX] ForeignName: translated_msg` then `[EN] EnglishName: original_msg`.
4. It always calls `traceback.print_exception()` afterwards so the full traceback is shown.

Tracebacks point to the original `.xx.py` file. This is achieved by populating `linecache.cache` with the original source before `exec()`-ing the compiled code, so Python's traceback machinery reads the right lines.

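The four steps and the `linecache` trick can be sketched as below. Several details are assumptions made for the example: the names `make_excepthook` and `run_translated`, the optional `stream` parameter, and the idea that `error_messages` is keyed by the English exception name. The real `errors.py` may be organised differently.

```python
import linecache
import sys
import traceback


def make_excepthook(pack: dict, lang_code: str, stream=None):
    """Build a hook suitable for assignment to sys.excepthook."""
    # Reverse the pack's exceptions section: English name -> foreign name.
    reverse = {en: foreign for foreign, en in pack.get("exceptions", {}).items()}
    messages = pack.get("error_messages", {})  # assumed keyed by English name

    def hook(exc_type, exc, tb):
        out = sys.stderr if stream is None else stream
        en_name = exc_type.__name__
        foreign_name = reverse.get(en_name, en_name)
        translated = messages.get(en_name, str(exc))
        print(f"[{lang_code.upper()}] {foreign_name}: {translated}", file=out)
        print(f"[EN] {en_name}: {exc}", file=out)
        traceback.print_exception(exc_type, exc, tb, file=out)

    return hook


def run_translated(translated: str, original: str, filename: str) -> None:
    """Execute translated code while tracebacks quote the original source."""
    # Cache entry layout: (size, mtime, lines, fullname). An mtime of None
    # keeps linecache.checkcache() from discarding the entry.
    linecache.cache[filename] = (
        len(original),
        None,
        original.splitlines(keepends=True),
        filename,
    )
    exec(compile(translated, filename, "exec"), {"__name__": "__main__"})
```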
---

## Custom pack override

When `.foreignthon.toml` declares `custom_pack = "path/to/custom.json"`:

- If the custom JSON has `meta.code` set, it is treated as a **standalone pack** and used directly.
- If `meta.code` is absent, it is treated as an **override**: it is merged on top of the installed pack, replacing only the keys it defines.

The CLI (`cli.py`) handles this in `_load_effective_pack()` by walking up the directory tree to find `.foreignthon.toml`.

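The two behaviours above, plus the upward search for the config file, might be sketched like this. The helper names `find_config` and `effective_pack` are hypothetical, and the per-section merge is an assumption about what "replacing only the keys it defines" means; `_load_effective_pack()` may differ in detail.

```python
from pathlib import Path


def find_config(start, name: str = ".foreignthon.toml"):
    """Walk from `start` up to the filesystem root looking for the config."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def effective_pack(installed: dict, custom: dict) -> dict:
    """Apply the standalone-or-override rule to two pack dicts."""
    if custom.get("meta", {}).get("code"):
        return custom  # standalone pack: used directly
    # Override: copy the installed pack one level deep, then layer the
    # custom entries on top, section by section.
    merged = {
        key: dict(value) if isinstance(value, dict) else value
        for key, value in installed.items()
    }
    for section, entries in custom.items():
        if isinstance(entries, dict) and isinstance(merged.get(section), dict):
            merged[section].update(entries)
        else:
            merged[section] = entries
    return merged
```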
---

## File naming and language detection

Language detection order (highest priority first):

1. `--lang` CLI flag
2. Shebang comment `# foreignthon: xx` on the first line
3. Double extension `.xx.py` → `xx`
4. Fallback to `"en"` (a no-op, since English is Python)

`_detect_lang()` and `_check_shebang()` in `transpiler.py` implement steps 3 and 2 respectively. Step 1 is handled by the `run` command in `cli.py`.

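The priority order above can be expressed as a single function. This is an illustrative sketch: in the project the logic is split across `cli.py`, `_check_shebang()` and `_detect_lang()`, while the combined helper `detect_lang_sketch` and the exact shebang pattern here are assumptions.

```python
import re
from pathlib import Path

# Assumed tolerance for spacing around the language code.
SHEBANG_RE = re.compile(r"^#\s*foreignthon:\s*([A-Za-z_-]+)\s*$")


def detect_lang_sketch(path, source: str, cli_lang=None) -> str:
    """Resolve the language code using the documented priority order."""
    if cli_lang:                                   # 1. --lang flag
        return cli_lang
    first_line = source.split("\n", 1)[0].strip()
    match = SHEBANG_RE.match(first_line)
    if match:                                      # 2. shebang comment
        return match.group(1)
    suffixes = Path(path).suffixes                 # e.g. [".es", ".py"]
    if len(suffixes) >= 2 and suffixes[-1] == ".py":
        # 3. double extension; a real implementation would presumably
        # also check that the code names an installed pack.
        return suffixes[-2].lstrip(".")
    return "en"                                    # 4. fallback
```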