diff --git a/Json.md b/Json.md
new file mode 100644
index 0000000..d3003b1
--- /dev/null
+++ b/Json.md
@@ -0,0 +1,40 @@
+# JSON Engine Comparison
+
+## Legend
+
+- **SAX**: Event/callback-based parsing.
+- **DOM**: Full in-memory JSON representation.
+- **Stream / incremental**: Can parse from a stream or in chunks without requiring the complete JSON input as one string.
+- **JSON Schema**: Native JSON Schema validation support in the library itself.
+- **Header-only**: Can be integrated mainly by adding headers, useful for git submodules and avoiding external link dependencies.
+- **Not found**: No native support found in the official documentation/repository checked.
+
+## Parser Libraries
+
+| Library | GitHub | SAX / Event API | DOM API | Stream / Incremental API | JSON Schema Validation | Integration Type | Notes |
+|---|---|---:|---:|---:|---:|---|---|
+| RapidJSON | https://github.com/Tencent/rapidjson | Yes | Yes | Yes | Yes | Header-only | Supports both SAX and DOM. SAX `Reader` parses from a stream and publishes events to a handler. JSON Schema validation exists and can also work in SAX style while parsing. |
+| nlohmann/json | https://github.com/nlohmann/json | Yes | Yes | Limited / input-stream based | Not found natively | Header-only / single-header | Has `json::sax_parse(...)` and a `json_sax` interface. Important: `sax_parse()` returns only `bool` and does not return a JSON value; the user must handle events manually. DOM parsing via `json::parse(...)` is the normal high-level use case. |
+| YAJL | https://github.com/lloyd/yajl | Yes | Limited tree interface | Yes | No JSON Schema; validating generator only | Compiled C library | Event-driven SAX-style parser written in ANSI C. Supports stream/incremental parsing and generation. This is the existing external dependency style we may want to avoid. |
+| json-c | https://github.com/json-c/json-c | No dedicated SAX found | Yes | Yes / tokener-based chunk parsing | Not found natively | Compiled C library | Provides a reference-counted object model. `json_tokener_parse_ex()` can parse buffers with explicit length and tokener state. |
+| Jansson | https://github.com/akheron/jansson | No dedicated SAX found | Yes | Yes / callback input loading | Not found natively | Compiled C library | C library for encoding, decoding, and manipulating JSON. Has `json_load_callback()` to read JSON input repeatedly via callback, but the exposed model is still DOM-like `json_t`. |
+| cJSON | https://github.com/DaveGamble/cJSON | No dedicated SAX found | Yes | No dedicated streaming parser found | Not found natively | Single `.c` + `.h` C library | Very small ANSI C parser. Easy to vendor, but not header-only and not SAX/streaming focused. |
+| JsonCpp | https://github.com/open-source-parsers/jsoncpp | No dedicated SAX found | Yes | No dedicated streaming parser found | Not found natively | Compiled C++ library | C++ library for manipulating JSON values, including serialization/deserialization. Useful for DOM-style tests/config, less suitable for SAX/streaming requirements. |
+| jsoncons | https://github.com/danielaparker/jsoncons | Yes | Yes | Yes / streaming-style APIs | Yes | Header-only | Feature-rich C++ library. Supports JSON-like data formats, DOM-style `basic_json`, streaming/event-style processing, and JSON Schema. |
+| simdjson | https://github.com/simdjson/simdjson | No classical SAX API | Yes | Yes / On-Demand and parse-many APIs | Not found natively | Compiled library; an amalgamated `simdjson.h` + `simdjson.cpp` distribution also exists | Very high-performance parser. Has DOM and On-Demand APIs; On-Demand is lazy / just-in-time, not classical SAX callbacks. `parse_many` supports streams containing multiple JSON documents. |
+| yyjson | https://github.com/ibireme/yyjson | No classical SAX API found | Yes | No dedicated streaming parser found | Not found natively | C library (`.h` + `.c`) | High-performance ANSI C library. Reading returns immutable documents/values; writing uses mutable documents/values. A SAX-like API was requested in an issue, which indicates it is not the normal API model. |
+| Glaze | https://github.com/stephenberry/glaze | No classical SAX API found | Object / in-memory oriented | Not primary / not classical SAX streaming | Not found natively | Header-only C++ library | Modern C++ JSON/reflection library. Reads/writes from object memory. Good for typed C++ object serialization, but not primarily a SAX parser. Check project C++ version requirements before adopting. |
+
+## Validator Libraries
+
+| Library | GitHub | SAX / Event API | DOM API | Stream / Incremental API | JSON Schema Validation | Integration Type | Notes |
+|---|---|---:|---:|---:|---:|---|---|
+| Blaze | https://github.com/sourcemeta/blaze | No | No | Not a parser | Yes | Compiled C++ library | Dedicated high-performance JSON Schema validator. It should be considered a validation component, not a parser replacement. Use it after parsing JSON with another library. |
+
+## Architecture Notes
+
+- Blaze is not a JSON parser.
+- If Blaze is used, the architecture should be:
+
+```text
+[ JSON Parser ] -> [ JSON Schema Validator ]
diff --git a/Json2.md b/Json2.md
new file mode 100644
index 0000000..b354d76
--- /dev/null
+++ b/Json2.md
@@ -0,0 +1,210 @@
+# ModSecurity JSON/XML Processing – Security & Architecture Summary
+
+## 🧠 Overall Conclusion
+
+- XML and JSON are handled very differently in ModSecurity.
+- This has a **direct impact on memory usage and security**.
+- A **JSON library alone is not sufficient for security**.
+- A **dedicated control layer is required** to enforce limits and guarantee safe behavior.
+
+---
+
+## 🔍 1. XML vs JSON Processing
+
+### XML
+
+- Uses **libxml2**
+- Combines:
+  - **DOM (`xmlDoc`)** → full tree in memory
+  - **SAX** → for ARGS extraction
+
+**Issues:**
+- ❌ No clearly enforced depth/node limits visible in analyzed code
+- ❌ Potentially high memory usage due to DOM
+
+**Risk:**
+> Full tree construction may lead to memory exhaustion or DoS scenarios.
+
+---
+
+### JSON
+
+- Uses **YAJL (event-based / streaming parser)**
+- No DOM construction in ModSecurity code
+- BUT:
+  - Entire request body is buffered first
+
+**Resulting model:**
+
+Full request body in memory → streaming parsing on top
+
+---
+
+## 🛡️ 2. Existing Limits
+
+### JSON – Implemented
+
+- ✅ `SecRequestBodyJsonDepthLimit` → limits nesting depth
+- ✅ `SecRequestBodyLimit` → limits total body size
+- ✅ `SecArgumentsLimit` → limits extracted parameters
+
+### JSON – Not Verified
+
+- ❌ Maximum array size
+- ❌ Maximum number of keys
+- ❌ Maximum string length
+- ❌ JSON-specific memory limits
+
+---
+
+### XML – More Problematic
+
+- ❌ No explicit depth limit visible
+- ❌ DOM tree is constructed
+- ⚠️ Limits may apply too late
+
+---
+
+## ⚠️ 3. Key Insight
+
+> The most critical factor is **when limits are enforced**.
+
+GOOD: during parsing → early abort; BAD: after parsing → too late
+
+---
+
+## 🔄 4. Replacing YAJL (JSON Library Change)
+
+### Risks
+
+- Switching to DOM/tree-based parser:
+  - ❌ higher memory usage
+  - ❌ limits applied too late
+  - ❌ weaker DoS protection
+
+### Requirements for Replacement
+
+A new JSON library must:
+- support **streaming/event-based parsing**
+- allow **early abort during parsing**
+- avoid building full JSON trees by default
+
+---
+
+## 📊 5. JSON Library Evaluation
+
+### Recommended
+
+- **RapidJSON**
+  - SAX + optional DOM
+  - streaming support
+  - header-only
+
+- **jsoncons**
+  - feature-rich
+  - streaming + schema support
+  - more complex
+
+---
+
+### Use with Caution
+
+- **nlohmann/json**
+  - easy to use
+  - ⚠️ default usage is DOM-based
+
+- **simdjson**
+  - very fast
+  - different (lazy/on-demand) model
+  - not a direct SAX replacement
+
+---
+
+### Less Suitable
+
+- json-c
+- Jansson
+- cJSON
+- JsonCpp
+- yyjson
+- Glaze
+
+→ primarily DOM/object-based
+
+---
+
+## 🧩 6. Blaze (JSON Schema Validator)
+
+> Blaze is **not a parser**.
+
+### Suitable for:
+- JSON Schema validation
+- enforcing structure rules (types, required fields, etc.)
+
+### Not suitable for:
+- streaming parsing
+- early-abort enforcement
+- replacing YAJL
+
+**Correct usage:**
+
+Parser → Control Layer → (optional) Schema Validator (Blaze)
+
+---
+
+## 🏗️ 7. Required Architecture
+
+[ HTTP Body ] → [ Streaming JSON Parser ] → [ Control Layer (custom logic) ] → [ ModSecurity (ARGS, rules) ] → [ optional: Schema Validator ]
+
+---
+
+## 🔥 8. Responsibilities of the Control Layer
+
+### Must enforce:
+
+- Maximum depth
+- Number of keys / elements
+- Array sizes
+- String lengths
+- Memory constraints
+
+---
+
+### Critical requirement: Early Abort
+
+Parser → Callback → Control Logic → STOP immediately
+
+> Without early abort, protection is significantly weakened.
+
+---
+
+### Additional tasks:
+
+- Controlled JSON → ARGS mapping
+- Prevent parameter explosion
+- Maintain consistent WAF behavior
+
+---
+
+## ❌ 9. What to Avoid
+
+- Replacing parser without architecture changes
+- Using DOM-based parsing without strict limits
+- Applying limits only after full parsing
+- Assuming the library handles security
+
+---
+
+## 📌 Final Conclusion
+
+- XML is currently **riskier** due to DOM usage and missing limits.
+- JSON is **better protected**, but still incomplete.
+- **Security does not come from the library.**
+- **Security comes from a control layer enforcing limits during parsing.**
+
+---
+
+## 🧠 Key Takeaway
+
+> A JSON library is just a parser.
+> **Security only exists if a control layer enforces limits and guarantees early abort behavior.**
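
The control layer from sections 7 and 8 of `Json2.md` could look roughly like the following. This is a minimal sketch, not ModSecurity code. It assumes a SAX-style parser whose callbacks can cancel parsing through their return value (YAJL callbacks cancel on a zero return; RapidJSON and nlohmann/json SAX handlers stop on `false`). `JsonLimits`, `LimitGuard`, `feed_nested_arrays`, and all limit values are hypothetical names and placeholders chosen for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical limits; the names and defaults are illustrative only.
struct JsonLimits {
    std::size_t max_depth = 32;
    std::size_t max_container_items = 1000;  // keys per object / elements per array
    std::size_t max_string_length = 4096;
    std::size_t max_total_values = 10000;    // rough guard against ARGS explosion
};

// Control layer: a parser adapter forwards every event to this class and
// cancels parsing as soon as one of the methods returns false.
class LimitGuard {
public:
    explicit LimitGuard(const JsonLimits& limits) : limits_(limits) {}

    // Called for '{' and '['. The container counts as one item of its parent.
    bool on_start_container() {
        if (!count_value()) return false;
        if (open_containers_.size() >= limits_.max_depth)
            return fail("depth limit exceeded");
        open_containers_.push_back(0);
        return true;
    }

    // Called for '}' and ']'.
    bool on_end_container() {
        if (open_containers_.empty()) return fail("unbalanced container end");
        open_containers_.pop_back();
        return true;
    }

    bool on_key(const std::string& key) { return check_length(key); }

    bool on_string(const std::string& value) {
        return count_value() && check_length(value);
    }

    bool on_scalar() { return count_value(); }  // numbers, booleans, null

    const std::string& error() const { return error_; }

private:
    // Counts one value globally and within the innermost open container.
    bool count_value() {
        if (++total_values_ > limits_.max_total_values)
            return fail("too many values");
        if (!open_containers_.empty() &&
            ++open_containers_.back() > limits_.max_container_items)
            return fail("too many items in one container");
        return true;
    }

    bool check_length(const std::string& s) {
        if (s.size() > limits_.max_string_length)
            return fail("string too long");
        return true;
    }

    bool fail(const char* reason) {
        error_ = reason;
        return false;
    }

    JsonLimits limits_;
    std::vector<std::size_t> open_containers_;  // item count per open container
    std::size_t total_values_ = 0;
    std::string error_;
};

// Stand-in for a parser: feeds `depth` nested array starts and returns how
// many were accepted before the guard aborted.
std::size_t feed_nested_arrays(LimitGuard& guard, std::size_t depth) {
    std::size_t accepted = 0;
    while (accepted < depth && guard.on_start_container()) ++accepted;
    return accepted;
}
```

In a real integration, the adapter for the chosen parser would call these methods from its own callbacks and pass a `false` result on as that parser's cancel signal, so a deeply nested body is rejected after a handful of events instead of after full parsing. Memory constraints are not covered here and would need their own accounting.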