cteward-ng/MIGRATION_PLAN.md

2026-06-06 12:04:59 +02:00
# Migration Plan: cteward-st-lexware (Node.js) → cteward-ng (Python/Flask)
## Framework Decision
**Chosen: Flask** — not Django.
| Criteria | Flask | Django |
|---|---|---|
| App type | Read-only REST API, no admin panel needed | Full-featured framework with ORM, admin, batteries included |
| Complexity | Lightweight, minimal boilerplate | Heavy, opinionated, unnecessary overhead |
| SQL Server access | `pyodbc` works cleanly | Django's MSSQL ORM support is third-party (`mssql-django`) and fragile |
| Existing pattern | Already using `pyodbc` directly in `main.py` | Would require Django models |
| Deployment | Docker-compatible, simple WSGI | Heavier deployment |
| Learning curve | Low (simple routing + middleware) | High (models, views, templates, settings, etc.) |
The app is a thin read-only REST API wrapper around MSSQL queries. Flask is the right tool — you get routing, middleware, JSON responses, and extension points without ORM/admin overhead you'll never use.
---
## Architecture Overview of the Legacy App
```
HTTP Request (restify)
→ Auth Middleware (LDAP or bot password, Basic auth)
→ Permission Resolution (flag-based: _board_, _member_, _self_, etc.)
→ SQL Query Execution (mssql → MSSQL via connection pool)
→ Data Filter (e.g. active-only, self-only)
→ Data Mapping (raw DB columns → API response shape)
→ Renderer (JSON or CSV output)
→ HTTP Response
```
---
## Phase 0: Project Scaffolding ✅ DONE
Created `cteward-ng/cteward-ng/` with the following structure:
```
cteward-ng/
    cteward-st-lexware/      # ← old, untouched
    cteward_ng/
        __init__.py
        app.py               # Flask app factory + middleware
        config.py            # Config loading (JSON, env vars)
        auth.py              # Basic auth + LDAP + bot auth (stubs)
        permissions.py       # Flag-based permission resolution (stubs)
        database.py          # pyodbc pool + all SQL query defs (stubs)
        memberdata.py        # realstatus(), datum(), patenarray() (full)
        mappings.py          # Raw DB → API response transformers (stubs)
        filters.py           # Active-only, self-only filters (full)
        views.py             # Route handlers (stubs)
        renderers.py         # JSON / CSV response helpers (full)
    requirements.txt
    pytest.ini
    README.md
    tests/
        __init__.py
        conftest.py          # pytest fixtures
        test_memberdata.py
        test_config.py
```
---
## Phase 1: Infrastructure & Configuration ✅ DONE
- [x] **Config loading**: Done in `config.py`. Ports the JSON config loading, following the `st-lexware-test.json` pattern (MSSQL credentials, auth bots, LDAP, logging)
- [x] **Logging**: Replaced `bunyan` with Python's `logging` module + `BunyanFormatter` that produces JSON-structured output matching bunyan format (`name`, `hostname`, `pid`, `level`, `msg`, `time`, `v`)
- [x] **Docker**: Updated `Dockerfile` with Flask + dependencies (`pyodbc`, `ldap3`, `Flask`, `gunicorn`, `DBUtils`). Updated `podman-compose.yml` with proper environment variables, volumes, and restart policy.
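
To illustrate the logging item above, here is a minimal sketch of a formatter that emits the listed bunyan fields. It is not the code in `cteward_ng`; the level mapping (Python level plus 10) and the `v: 0` format version follow bunyan's documented record layout, and `make_logger` is a hypothetical helper name.

```python
import json
import logging
import os
import socket
from datetime import datetime, timezone


class BunyanFormatter(logging.Formatter):
    """Render log records as single-line JSON in a bunyan-like shape (sketch)."""

    def format(self, record: logging.LogRecord) -> str:
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "name": record.name,
            "hostname": socket.gethostname(),
            "pid": os.getpid(),
            # bunyan levels sit 10 above Python's (INFO 20 -> 30, ERROR 40 -> 50).
            "level": record.levelno + 10,
            "msg": record.getMessage(),
            # ISO 8601 in UTC with millisecond precision and a trailing "Z".
            "time": timestamp.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "v": 0,
        }
        return json.dumps(payload)


def make_logger(name: str) -> logging.Logger:
    """Hypothetical helper: attach the formatter to a stream handler."""
    handler = logging.StreamHandler()
    handler.setFormatter(BunyanFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Keeping the field names identical to bunyan's should let existing log tooling keep parsing the output, assuming it only relies on these core fields.
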
---
## Phase 2: Database Layer
- [x] **Connection pool**: Ported `database.init()` from `mssql`/`tedious` to `pyodbc` + `DBUtils.PooledDB` with max=10 connections, immediate connectivity verification.
- [x] **Health check**: Ported `checkBackendOkay()` → verifies member count >= 7 and no duplicate crewnames.
- [x] **Query execution**: Ported `runquery()` with parameterized queries. All 14 SQL statements ported from T-SQL `@param` syntax to pyodbc `?` syntax:
  - `QUERY_CONTRACTLIST_BY_CREWNAME`
  - `QUERY_CONTRACT_BY_CREWNAME_AND_CONTRACT`
  - `QUERY_DEBITLIST_BY_CREWNAME`
  - `QUERY_DEBIT_BY_CREWNAME_AND_GUID`
  - `QUERY_MEMBERLIST`
  - `QUERY_MEMBERLIST_RAW`
  - `QUERY_MEMBER_BY_CREWNAME`
  - `QUERY_MEMBER_MEMO_BY_CREWNAME`
  - `QUERY_WITHDRAWALLIST_BY_CREWNAME`
  - `QUERY_WITHDRAWAL_BY_CREWNAME_AND_GUID`
  - `QUERY_PAYMENTLIST_BY_CREWNAME`
  - `QUERY_STATS_MEMBERS` (special) ✅
  - `QUERY_STATS_CONTRACTS` (special) ✅
  - `QUERY_STATS_GENDERS` (special) ✅
  - `QUERY_STATS_AGES` (special, with step/min/max params) ✅
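
The `@param` to `?` conversion mentioned above can be sketched as follows. This is an illustrative helper, not the project's `runquery()`: the name `to_qmark` and the table and column names in the usage note are hypothetical, and the regex does not try to skip `@` characters inside SQL string literals.

```python
import re
from typing import Any, Mapping

# Matches @name but not @@name (T-SQL system functions such as @@ROWCOUNT).
_NAMED_PARAM = re.compile(r"(?<!@)@(\w+)")


def to_qmark(sql: str, params: Mapping[str, Any]) -> tuple[str, list[Any]]:
    """Rewrite named @params to positional ? markers.

    Returns the rewritten SQL plus the values in the order the markers
    appear, repeating a value when its name is used more than once.
    """
    values: list[Any] = []

    def substitute(match: re.Match) -> str:
        values.append(params[match.group(1)])
        return "?"

    return _NAMED_PARAM.sub(substitute, sql), values
```

With a pyodbc cursor, the result would be passed as `cursor.execute(sql, values)`, for example after `to_qmark("SELECT * FROM t WHERE name = @crewname", {"crewname": "alice"})`. Positional markers are order-sensitive, which is why a name that appears twice has to contribute its value twice.
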
---
## Phase 3: Data Utilities
- [x] **Port `memberdata.js`** → `memberdata.py`:
  - [x] `realstatus()` — determine crew/passive/ex-crew/raumfahrer status
  - [x] `datum()` — parse `YYYYMMDD` strings to German date format
  - [x] `datum_parsed()` — parse ISO date strings
  - [x] `patenarray()` / `cleanpaten()` — comma-separated name parsing
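
As a rough illustration of two of these helpers, the sketch below converts `YYYYMMDD` to `DD.MM.YYYY` and splits a comma-separated name string. The exact behavior of the ported functions is not documented here, so returning `None` for unparseable dates and dropping empty names are assumptions.

```python
from datetime import datetime
from typing import Optional


def datum(value: Optional[str]) -> Optional[str]:
    """Convert 'YYYYMMDD' to the German 'DD.MM.YYYY' format.

    Assumption: empty or unparseable input yields None.
    """
    if not value:
        return None
    try:
        return datetime.strptime(value.strip(), "%Y%m%d").strftime("%d.%m.%Y")
    except ValueError:
        return None


def patenarray(value: Optional[str]) -> list[str]:
    """Split a comma-separated name string into trimmed, non-empty names."""
    if not value:
        return []
    return [name.strip() for name in value.split(",") if name.strip()]
```
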
---
## Phase 4: Authentication & Authorization ✅ DONE
- [x] **Port `authprovider.js`** → `auth.py`:
  - [x] `check_password()` — plaintext + apr1 MD5 via `passlib.hash.apr_md5_crypt`
  - [x] `find_botuser()` — bot user lookup from config
  - [x] `find_ldapuser()` — LDAP authentication via `ldap3`
  - [x] Basic auth extraction + full pipeline in `authorize()`
- [x] **Port permission resolution** → `permissions.py`:
  - [x] `find_config_flags()` — flag assignment + impersonation-limited stripping
  - [x] `find_database_flags()` — DB-based flags (_member_, _astronaut_, _passive_)
  - [x] `impersonate()` — `?impersonate=` query param support
  - [x] `effective_permissions()` — lowest level wins
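
The Basic auth extraction step can be sketched with the standard library alone. `parse_basic_auth` is a hypothetical name, not necessarily what `auth.py` uses; treating malformed headers as "no credentials" rather than raising is also an assumption.

```python
import base64
import binascii
from typing import Optional


def parse_basic_auth(header: Optional[str]) -> Optional[tuple[str, str]]:
    """Extract (username, password) from an 'Authorization: Basic ...' value.

    Returns None when the header is missing, uses another scheme,
    or cannot be decoded.
    """
    if not header:
        return None
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # Only the first colon separates the fields; passwords may contain colons.
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password
```

A `None` result is the case where the response would carry a `WWW-Authenticate` header; the returned pair would then feed the bot or LDAP lookup.
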
---
## Phase 5: Filters & Mappings ✅ DONE
- [x] **Port `filters.js`** → `filters.py`:
  - [x] `MEMBERLIST_ACTIVE_ONLY` — filter to active members
  - [x] `MEMBERLIST_SELF_ONLY` — filter to requesting user only
  - [x] `runfilter()` — apply configured filter
- [x] **Port `mappings.js`** → `mappings.py` (~380 lines):
  - [x] `NONE`, `CONTRACT`, `CONTRACTLIST`
  - [x] `DEBIT`, `DEBITLIST`
  - [x] `CONTRIBUTIONS` — aggregated billed/paid/unpaid
  - [x] `MEMBER`, `MEMO` (with RTF fallback parser)
  - [x] `MEMBERLIST`, `MEMBERLIST_TO_LDAPCSV`
  - [x] `WITHDRAWAL`, `WITHDRAWALLIST`
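
A name-keyed filter registry is one way the two filters and `runfilter()` could fit together. The sketch below assumes each member is a dict with hypothetical `crewname` and `active` keys and that self-matching is case-insensitive; the real row shape and comparison rules may differ.

```python
from typing import Any, Callable, Optional

Member = dict[str, Any]
MemberFilter = Callable[[list[Member], str], list[Member]]


def memberlist_active_only(members: list[Member], requester: str) -> list[Member]:
    """Keep only members flagged as active (the 'active' key is an assumption)."""
    return [member for member in members if member.get("active")]


def memberlist_self_only(members: list[Member], requester: str) -> list[Member]:
    """Keep only the row that belongs to the requesting user."""
    wanted = requester.lower()
    return [m for m in members if str(m.get("crewname", "")).lower() == wanted]


FILTERS: dict[str, MemberFilter] = {
    "MEMBERLIST_ACTIVE_ONLY": memberlist_active_only,
    "MEMBERLIST_SELF_ONLY": memberlist_self_only,
}


def runfilter(name: Optional[str], members: list[Member], requester: str) -> list[Member]:
    """Apply the configured filter, or pass the rows through when none is set."""
    if name is None:
        return list(members)
    return FILTERS[name](members, requester)
```
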
---
## Phase 6: API Routes ✅ DONE
All 11 endpoints implemented with full auth → query → filter → map → render pipeline:
- [x] `GET /legacy/monitor`
- [x] `GET /legacy/memberlist-oldformat` (CSV)
- [x] `GET /legacy/stats/members`, `/contracts`, `/genders`, `/ages`
- [x] `GET /legacy/member/<crewname>` (single member, or the member list when `<crewname>` is `''` or `'*'`)
- [x] `GET /legacy/member/<crewname>/raw`
- [x] `GET /legacy/member/<crewname>/memo` (board-only)
- [x] `GET /legacy/member/<crewname>/contributions` (board-only)
- [x] `GET /legacy/member/<crewname>/<contract|debit|withdrawal|payment>/[<id>]/raw/`
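
The auth → query → filter → map → render order used by these endpoints can be sketched without any web framework. The `Endpoint` dataclass, the `handle` function, and the column names in the usage note are hypothetical; the real handlers in `views.py` are Flask routes and may be wired differently.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Optional

Rows = list[dict[str, Any]]


@dataclass
class Endpoint:
    """One route described as its four data stages (hypothetical structure)."""
    query: Callable[[str], Rows]
    row_filter: Callable[[Rows, str], Rows]
    mapping: Callable[[dict[str, Any]], dict[str, Any]]
    render: Callable[[Any], str]


def handle(endpoint: Endpoint, username: Optional[str], crewname: str) -> tuple[int, str]:
    """Run the stages in pipeline order and return (status, body)."""
    if username is None:  # authentication failed upstream
        return 401, ""
    rows = endpoint.query(crewname)
    visible = endpoint.row_filter(rows, username)
    mapped = [endpoint.mapping(row) for row in visible]
    return 200, endpoint.render(mapped)
```

For example, an endpoint built with `query=lambda name: [{"NAME": name}]`, a pass-through filter, `mapping=lambda row: {"crewname": row["NAME"]}`, and `render=json.dumps` returns `(200, '[{"crewname": "alice"}]')` for an authenticated request for `alice`.
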
---
## Phase 9: Tests ✅ 103 passing
- [x] Config tests (4) — loading, defaults, missing file, invalid JSON
- [x] Database tests (16) — init, connected, health check, query execution, member lookup
- [x] Memberdata tests (20) — realstatus, datum, patenarray, cleanpaten
- [x] Auth tests (21) — check_password, basic auth parsing, bot/LDAP auth, pipeline
- [x] Permissions tests (16) — flag resolution, self-detection, impersonation gating
- [x] Mappings tests (19) — all 12 mappers with realistic data shapes
- [x] Views integration tests (10) — monitor, stats, member, memo, contributions, detail raw
---
## Phase 7: Response Rendering
- [x] **Port `renderers.js`** → `renderers.py`:
  - [x] `JSON_OUTPUT` — JSON with 2-decimal float formatting + JSONP callback support
  - [x] `CSV_OUTPUT` — semicolon-delimited CSV
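
Both renderers can be approximated with the standard library. In this sketch, `json_output` and `csv_output` are hypothetical names, floats are rounded to two decimals rather than zero-padded (the legacy formatting may differ), and the callback-name check is an added safety assumption rather than documented legacy behavior.

```python
import csv
import io
import json
import re
from typing import Any, Iterable, Optional, Sequence

# Conservative whitelist for JSONP callback names (assumption, not legacy behavior).
_CALLBACK_NAME = re.compile(r"^[A-Za-z_$][\w$.]*$")


def round_floats(value: Any) -> Any:
    """Recursively round every float in a JSON-like structure to 2 decimals."""
    if isinstance(value, float):
        return round(value, 2)
    if isinstance(value, dict):
        return {key: round_floats(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [round_floats(item) for item in value]
    return value


def json_output(data: Any, callback: Optional[str] = None) -> str:
    """Serialize to JSON, wrapped in callback(...) when a JSONP callback is given."""
    body = json.dumps(round_floats(data))
    if callback is None:
        return body
    if not _CALLBACK_NAME.match(callback):
        raise ValueError("invalid JSONP callback name")
    return f"{callback}({body});"


def csv_output(rows: Iterable[Sequence[Any]]) -> str:
    """Serialize rows as semicolon-delimited CSV."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=";").writerows(rows)
    return buffer.getvalue()
```
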
---
## Phase 8: Middleware
- [x] **Port request middleware** (partially done in `app.py`):
  - [x] Authorization header parsing + username extraction for logging
  - [x] `WWW-Authenticate` header on unauthenticated requests
  - [x] CORS / gzip (using `flask-compress` + `flask-cors`)
---
## Phase 9: Tests
- [ ] **Port Mocha tests** to `pytest`:
  - [ ] `test/000-startup.js` → app startup + logging test
  - [ ] `test/authprovider-*.js` → auth unit tests (6 files)
  - [x] `test/memberdata_*.js` → memberdata unit tests (4 files merged into `test_memberdata.py`)
  - [ ] `test/legacy_monitor.js` → health check integration test
  - Use pytest fixtures for DB mocking, `responses` or `requests-mock` for HTTP
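
One way to mock the database without a live SQL Server is a `unittest.mock.MagicMock` that stands in for a DB-API connection; in a pytest suite the factory below would typically be wrapped in a fixture. Everything here is illustrative: the function names and the SQL text are hypothetical, and only the health rule (at least 7 members, no duplicate crewnames) comes from the Phase 2 notes.

```python
from unittest.mock import MagicMock


def fetch_crewnames(connection) -> list[str]:
    """Read all crewnames through a DB-API style connection."""
    cursor = connection.cursor()
    cursor.execute("SELECT crewname FROM members")  # hypothetical query text
    return [row[0] for row in cursor.fetchall()]


def backend_okay(crewnames: list[str], minimum: int = 7) -> bool:
    """Health rule: enough members and no duplicate crewnames."""
    return len(crewnames) >= minimum and len(set(crewnames)) == len(crewnames)


def make_fake_connection(rows: list[tuple]) -> MagicMock:
    """Build a stand-in connection whose cursor returns the given rows."""
    connection = MagicMock()
    connection.cursor.return_value.fetchall.return_value = rows
    return connection
```

A test would then build `make_fake_connection([("crew0",), ("crew1",)])`, call `fetch_crewnames` on it, and assert on `backend_okay` for both the healthy and the unhealthy case.
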
---
## Phase 10: Validation & Cutover
- [ ] **API parity testing**: Hit every endpoint on both old and new with identical credentials; diff JSON responses byte-for-byte
- [ ] **Deployment**: Update `podman-compose.yml` to point to new Python service, test in staging, cutover
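
For the parity check, a byte-for-byte comparison says whether two responses differ but not where. The sketch below adds a structural diff that reports the JSON paths that disagree; `diff_json` and `compare_bodies` are hypothetical helpers, and fetching the two response bodies from the old and new services is left out.

```python
import json
from typing import Any, Iterator


def diff_json(old: Any, new: Any, path: str = "$") -> Iterator[str]:
    """Yield one message per location where two parsed JSON values differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            if key not in old:
                yield f"{path}.{key}: missing in old"
            elif key not in new:
                yield f"{path}.{key}: missing in new"
            else:
                yield from diff_json(old[key], new[key], f"{path}.{key}")
    elif isinstance(old, list) and isinstance(new, list):
        if len(old) != len(new):
            yield f"{path}: length {len(old)} != {len(new)}"
        for index, (left, right) in enumerate(zip(old, new)):
            yield from diff_json(left, right, f"{path}[{index}]")
    elif old != new or type(old) is not type(new):
        # The type check catches 1 vs 1.0 and True vs 1, which compare equal.
        yield f"{path}: {old!r} != {new!r}"


def compare_bodies(old_body: str, new_body: str) -> list[str]:
    """Return [] for identical bodies, otherwise a list of differences."""
    if old_body == new_body:
        return []
    differences = list(diff_json(json.loads(old_body), json.loads(new_body)))
    return differences or ["$: bodies differ only in formatting or key order"]
```
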
---
## Key Migration Notes
| Concern | Details |
|---|---|
| **RTF parsing** | `unrtf` (JS) → need Python equivalent. `rtfparse` or `extract-msg` may work. This is the riskiest conversion. |
| **LDAP** | `ldapauth-fork` → `ldap3`. `ldap3` is a widely used pure-Python LDAP client. |
| **Password hashing** | `apache-md5` → `passlib` for `apr1` MD5 crypt. |
| **Connection pooling** | Use `DBUtils.PooledDB` with `pyodbc` to match the `mssql` pool behavior. |
| **JSONP** | The callback parameter for JSONP is legacy but must be preserved. |
| **Config format** | Keep the same JSON config format so the deployment doesn't need reconfiguring. |
---
## Estimated Effort
| Phase | Complexity | Status |
|---|---|---|
| 0. Scaffolding | Trivial | ✅ Done |
| 1. Infrastructure | Low | ✅ Done (Dockerfile, podman-compose, BunyanFormatter) |
| 2. Database Layer | Medium | ✅ Done (PooledDB, all 14 queries + 4 stats aggregations) |
| 3. Data Utilities | Low | ✅ Done |
| 4. Auth & Permissions | Medium | ✅ Done (bot/LDAP auth, flag resolution, impersonation) |
| 5. Filters & Mappings | High (big file) | ✅ Done (all 12 mappers + 2 filters) |
| 6. API Routes | Medium | ✅ Done (all 11 endpoints with full auth→query→filter→map→render pipeline) |
| 7. Response Rendering | Low | ✅ Done |
| 8. Middleware | Low | ✅ Done (BunyanFormatter, WWW-Authenticate, CORS, gzip) |
| 9. Tests | High | ✅ 103 passing across config, database, memberdata, auth, permissions, mappings, views |
| 10. Validation | Medium | ⬜ Pending |