40 changes: 40 additions & 0 deletions Json.md
@@ -0,0 +1,40 @@
# JSON Engine Comparison

## Legend

- **SAX**: Event/callback-based parsing.
- **DOM**: Full in-memory JSON representation.
- **Stream / incremental**: Can parse from a stream or in chunks without requiring the complete JSON input as one string.
- **JSON Schema**: Native JSON Schema validation support in the library itself.
- **Header-only**: Can be integrated mainly by adding headers, useful for git submodules and avoiding external link dependencies.
- **Not found**: No native support found in the official documentation/repository checked.

## Parser Libraries

| Library | GitHub | SAX / Event API | DOM API | Stream / Incremental API | JSON Schema Validation | Integration Type | Notes |
|---|---|---:|---:|---:|---:|---|---|
| RapidJSON | https://github.com/Tencent/rapidjson | Yes | Yes | Yes | Yes | Header-only | Supports both SAX and DOM. SAX `Reader` parses from a stream and publishes events to a handler. JSON Schema validation exists and can also work in SAX style while parsing. |
| nlohmann/json | https://github.com/nlohmann/json | Yes | Yes | Limited / input-stream based | Not found natively | Header-only / single-header | Has `json::sax_parse(...)` and a `json_sax` interface. Important: `sax_parse()` returns only `bool` and does not return a JSON value; the user must handle events manually. DOM parsing via `json::parse(...)` is the normal high-level use case. |
| YAJL | https://github.com/lloyd/yajl | Yes | Limited tree interface | Yes | No JSON Schema; validating generator only | Compiled C library | Event-driven SAX-style parser written in ANSI C. Supports stream/incremental parsing and generation. This is the existing external dependency style we may want to avoid. |
| json-c | https://github.com/json-c/json-c | No dedicated SAX found | Yes | Yes / tokener-based chunk parsing | Not found natively | Compiled C library | Provides a reference-counted object model. `json_tokener_parse_ex()` can parse buffers with explicit length and tokener state. |
| Jansson | https://github.com/akheron/jansson | No dedicated SAX found | Yes | Yes / callback input loading | Not found natively | Compiled C library | C library for encoding, decoding, and manipulating JSON. Has `json_load_callback()` to read JSON input repeatedly via callback, but the exposed model is still DOM-like `json_t`. |
| cJSON | https://github.com/DaveGamble/cJSON | No dedicated SAX found | Yes | No dedicated streaming parser found | Not found natively | Single `.c` + `.h` C library | Very small ANSI C parser. Easy to vendor, but not header-only and not SAX/streaming focused. |
| JsonCpp | https://github.com/open-source-parsers/jsoncpp | No dedicated SAX found | Yes | No dedicated streaming parser found | Not found natively | Compiled C++ library | C++ library for manipulating JSON values, including serialization/deserialization. Useful for DOM-style tests/config, less suitable for SAX/streaming requirements. |
| jsoncons | https://github.com/danielaparker/jsoncons | Yes | Yes | Yes / streaming-style APIs | Yes | Header-only | Feature-rich C++ library. Supports JSON-like data formats, DOM-style `basic_json`, streaming/event-style processing, and JSON Schema. |
| simdjson | https://github.com/simdjson/simdjson | No classical SAX API | Yes | Yes / On-Demand and multi-document APIs | Not found natively | Compiled library, also single-header distribution exists | Very high-performance parser. Has DOM and On-Demand APIs; On-Demand is lazy / just-in-time, not classical SAX callbacks. `parse_many` (DOM) and `iterate_many` (On-Demand) support inputs containing multiple JSON documents; each individual document still has to be available in memory. |
| yyjson | https://github.com/ibireme/yyjson | No classical SAX API found | Yes | No dedicated streaming parser found | Not found natively | C library (`.h` + `.c`) | High-performance ANSI C library. Reading returns immutable documents/values; writing uses mutable documents/values. A SAX-like API was requested in an issue, which indicates it is not the normal API model. |
| Glaze | https://github.com/stephenberry/glaze | No classical SAX API found | Object / in-memory oriented | Not primary / not classical SAX streaming | Not found natively | Header-only C++ library | Modern C++ JSON/reflection library. Reads/writes from object memory. Good for typed C++ object serialization, but not primarily a SAX parser. Check project C++ version requirements before adopting. |

## Validator Libraries

| Library | GitHub | SAX / Event API | DOM API | Stream / Incremental API | JSON Schema Validation | Integration Type | Notes |
|---|---|---:|---:|---:|---:|---|---|
| Blaze | https://github.com/sourcemeta/blaze | No | No | Not a parser | Yes | Compiled C++ library | Dedicated high-performance JSON Schema validator. It should be considered a validation component, not a parser replacement. Use it after parsing JSON with another library. |

## Architecture Notes

- Blaze is not a JSON parser.
- If Blaze is used, the architecture should be:

```text
[ JSON Parser ] -> [ JSON Schema Validator ]
```
210 changes: 210 additions & 0 deletions Json2.md
@@ -0,0 +1,210 @@
# ModSecurity JSON/XML Processing – Security & Architecture Summary

## 🧠 Overall Conclusion

- XML and JSON are handled very differently in ModSecurity.
- This has **direct impact on memory usage and security**.
- A **JSON library alone is not sufficient for security**.
- A **dedicated control layer is required** to enforce limits and guarantee safe behavior.

---

## 🔍 1. XML vs JSON Processing

### XML

- Uses **libxml2**
- Combines:
  - **DOM (`xmlDoc`)** → full tree in memory
  - **SAX** → for ARGS extraction

**Issues:**
- ❌ No clearly enforced depth/node limits visible in analyzed code
- ❌ Potentially high memory usage due to DOM

**Risk:**
> Full tree construction may lead to memory exhaustion or DoS scenarios.

---

### JSON

- Uses **YAJL (event-based / streaming parser)**
- No DOM construction in ModSecurity code
- However, the entire request body is buffered before parsing starts

**Resulting model:**

Full request body in memory → streaming parsing on top
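
To illustrate why buffering is not inherent to streaming parsing, here is a minimal sketch of chunk-wise processing. It is written in Python for brevity and is not a JSON parser: the hypothetical `DepthScanner` class only tracks nesting depth and string state across chunk boundaries, which is the kind of state a real incremental parser carries between calls. All names are assumptions made for this example.

```python
class DepthScanner:
    """Tracks JSON nesting depth across chunks without storing the body."""

    def __init__(self, max_depth: int) -> None:
        self.max_depth = max_depth
        self.depth = 0
        self.in_string = False  # currently inside a "..." literal
        self.escaped = False    # previous byte was a backslash inside a string
        self.consumed = 0       # bytes inspected so far

    def feed(self, chunk: bytes) -> bool:
        """Process one chunk; return False as soon as the depth limit is exceeded."""
        for byte in chunk:
            self.consumed += 1
            ch = chr(byte)
            if self.in_string:
                if self.escaped:
                    self.escaped = False
                elif ch == "\\":
                    self.escaped = True
                elif ch == '"':
                    self.in_string = False
            elif ch == '"':
                self.in_string = True
            elif ch in "[{":
                self.depth += 1
                if self.depth > self.max_depth:
                    return False
            elif ch in "]}":
                self.depth -= 1
        return True


# The body arrives in two pieces, split in the middle of a string value.
scanner = DepthScanner(max_depth=3)
for piece in (b'{"a": [{"b": "]', b']"}]}'):
    if not scanner.feed(piece):
        break
```

Only a few fields of state survive between `feed()` calls, so memory use does not grow with the body size. Brackets inside string values are ignored, which is why the string state has to be carried over from one chunk to the next.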

---

## 🛡️ 2. Existing Limits

### JSON – Implemented

- ✅ `SecRequestBodyJsonDepthLimit` → limits nesting depth
- ✅ `SecRequestBodyLimit` → limits total body size
- ✅ `SecArgumentsLimit` → limits extracted parameters

### JSON – Not Verified

- ❌ Maximum array size
- ❌ Maximum number of keys
- ❌ Maximum string length
- ❌ JSON-specific memory limits

---

### XML – More Problematic

- ❌ No explicit depth limit visible
- ❌ DOM tree is constructed
- ⚠️ Limits may apply too late

---

## ⚠️ 3. Key Insight

> The most critical factor is **when limits are enforced**.

- GOOD: during parsing → early abort
- BAD: after parsing → too late
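
The difference can be made concrete with a small simulation. The sketch below is in Python and uses a made-up event stream instead of a real parser; `nested_arrays`, `check_during`, and `check_after` are hypothetical names. It compares how much input each strategy has to hold before it can reject a deeply nested document.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[str, object]  # e.g. ("start_array", None)


def nested_arrays(levels: int) -> Iterator[Event]:
    """Simulated parser output for `levels` nested arrays: [[[ ... ]]]."""
    for _ in range(levels):
        yield ("start_array", None)
    for _ in range(levels):
        yield ("end_array", None)


def check_during(events: Iterable[Event], max_depth: int) -> Tuple[bool, int]:
    """Check depth per event; return (accepted, events consumed)."""
    depth = consumed = 0
    for kind, _ in events:
        consumed += 1
        if kind.startswith("start_"):
            depth += 1
            if depth > max_depth:
                return False, consumed  # early abort
        elif kind.startswith("end_"):
            depth -= 1
    return True, consumed


def check_after(events: Iterable[Event], max_depth: int) -> Tuple[bool, int]:
    """Materialize every event first, then check; return (accepted, events held)."""
    held: List[Event] = list(events)  # stands in for a fully built tree
    depth = deepest = 0
    for kind, _ in held:
        if kind.startswith("start_"):
            depth += 1
            deepest = max(deepest, depth)
        elif kind.startswith("end_"):
            depth -= 1
    return deepest <= max_depth, len(held)


print(check_during(nested_arrays(10_000), max_depth=32))  # (False, 33)
print(check_after(nested_arrays(10_000), max_depth=32))   # (False, 20000)
```

Both variants reject the input, but the first stops after 33 events while the second has already stored all 20,000. With a DOM parser the stored items would be tree nodes, so the cost of the late check grows with the size of the attacker's payload.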

---

## 🔄 4. Replacing YAJL (JSON Library Change)

### Risks

- Switching to DOM/tree-based parser:
  - ❌ higher memory usage
  - ❌ limits applied too late
  - ❌ weaker DoS protection

### Requirements for Replacement

A new JSON library must:
- support **streaming/event-based parsing**
- allow **early abort during parsing**
- avoid building full JSON trees by default
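
These requirements amount to a callback contract. YAJL callbacks return an `int` where `0` cancels parsing, and RapidJSON handler methods return `bool` where `false` terminates the parse. The Python sketch below models that contract in a library-neutral way; `EventHandler`, `EventBudget`, and `drive` are hypothetical names, and `drive` merely stands in for the parser's internal loop.

```python
from typing import Iterable, Protocol, Tuple


class EventHandler(Protocol):
    """Callback a streaming parser would invoke; returning False requests an abort."""

    def on_event(self, kind: str, value: object) -> bool: ...


class EventBudget:
    """Example handler: refuses to process more than `limit` events."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.seen = 0

    def on_event(self, kind: str, value: object) -> bool:
        self.seen += 1
        return self.seen <= self.limit


def drive(events: Iterable[Tuple[str, object]], handler: EventHandler) -> bool:
    """Stand-in for the parser loop: stop at the first callback that returns False."""
    for kind, value in events:
        if not handler.on_event(kind, value):
            return False  # nothing after this point is read or stored
    return True


ok = drive([("start_array", None), ("number", 1), ("end_array", None)], EventBudget(10))
```

A candidate library fits this model if its callbacks can stop the parse and if it pulls input lazily, so that an abort also stops further reading. A library that only reports errors after building a complete value does not.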

---

## 📊 5. JSON Library Evaluation

### Recommended

- **RapidJSON**
  - SAX + optional DOM
  - streaming support
  - header-only

- **jsoncons**
  - feature-rich
  - streaming + schema support
  - more complex

---

### Use with Caution

- **nlohmann/json**
  - easy to use
  - ⚠️ default usage is DOM-based

- **simdjson**
  - very fast
  - different (lazy/on-demand) model
  - not a direct SAX replacement

---

### Less Suitable

- json-c
- Jansson
- cJSON
- JsonCpp
- yyjson
- Glaze

→ primarily DOM/object-based

---

## 🧩 6. Blaze (JSON Schema Validator)

> Blaze is **not a parser**.

### Suitable for:
- JSON Schema validation
- enforcing structure rules (types, required fields, etc.)

### Not suitable for:
- streaming parsing
- early-abort enforcement
- replacing YAJL

**Correct usage:**

Parser → Control Layer → (optional) Schema Validator (Blaze)

---

## 🏗️ 7. Required Architecture

[ HTTP Body ] → [ Streaming JSON Parser ] → [ Control Layer (custom logic) ] → [ ModSecurity (ARGS, rules) ] → [ optional: Schema Validator ]

---

## 🔥 8. Responsibilities of the Control Layer

### Must enforce:

- Maximum depth
- Number of keys / elements
- Array sizes
- String lengths
- Memory constraints

---

### Critical requirement: Early Abort

Parser → Callback → Control Logic → STOP immediately

> Without early abort, protection is significantly weakened.
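
A minimal sketch of such a control layer follows, in Python for brevity. The class names, method names, and default limit values are all assumptions for illustration; none of them come from ModSecurity or any JSON library. Raising `LimitExceeded` stands in for returning a failure code from a parser callback in C or C++.

```python
from dataclasses import dataclass
from typing import List, Optional


class LimitExceeded(Exception):
    """Raised inside a callback to stop parsing immediately."""


@dataclass
class Limits:
    max_depth: int = 32
    max_keys: int = 1000         # total object keys in the document
    max_array_items: int = 1000  # items per array
    max_string_len: int = 8192   # bytes per key or string value
    max_bytes: int = 1 << 20     # rough budget for stored keys and values


class ControlLayer:
    def __init__(self, limits: Limits) -> None:
        self.limits = limits
        self.keys = 0
        self.bytes_used = 0
        # One entry per open container: item count for arrays, None for objects.
        self.stack: List[Optional[int]] = []

    def _count_item(self) -> None:
        # A value directly inside an array counts toward that array's size.
        if self.stack and self.stack[-1] is not None:
            self.stack[-1] += 1
            if self.stack[-1] > self.limits.max_array_items:
                raise LimitExceeded("array too large")

    def _charge(self, text: str) -> None:
        size = len(text.encode("utf-8"))
        if size > self.limits.max_string_len:
            raise LimitExceeded("string too long")
        self.bytes_used += size
        if self.bytes_used > self.limits.max_bytes:
            raise LimitExceeded("memory budget exceeded")

    def start(self, is_array: bool) -> None:
        self._count_item()
        self.stack.append(0 if is_array else None)
        if len(self.stack) > self.limits.max_depth:
            raise LimitExceeded("nesting too deep")

    def end(self) -> None:
        self.stack.pop()

    def key(self, name: str) -> None:
        self.keys += 1
        if self.keys > self.limits.max_keys:
            raise LimitExceeded("too many keys")
        self._charge(name)

    def string(self, text: str) -> None:
        self._count_item()
        self._charge(text)

    def scalar(self) -> None:  # numbers, booleans, null
        self._count_item()


# Events for [{"user": "alice"}] as a parser would deliver them.
guard = ControlLayer(Limits(max_depth=2, max_array_items=3))
guard.start(is_array=True)
guard.start(is_array=False)
guard.key("user")
guard.string("alice")
guard.end()
guard.end()
```

Every check runs inside the callback that observes the offending token, so the parser never reads past the first violation. The byte budget here only counts keys and string values; a real implementation would have to decide what else to charge against it.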

---

### Additional tasks:

- Controlled JSON → ARGS mapping
- Prevent parameter explosion
- Maintain consistent WAF behavior
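
As an illustration of controlled mapping, the Python sketch below turns a stream of parser events into flat argument pairs and stops as soon as a cap is reached. The event names, the `events_to_args` function, and the dotted naming scheme are assumptions for this example and are not necessarily the names ModSecurity generates.

```python
from typing import Iterable, List, Optional, Tuple

Event = Tuple[str, object]


class TooManyArguments(Exception):
    """Raised when the number of extracted arguments would exceed the cap."""


def events_to_args(events: Iterable[Event], max_args: int) -> List[Tuple[str, str]]:
    args: List[Tuple[str, str]] = []
    names: List[str] = ["json"]         # full name of each enclosing container
    index: List[Optional[int]] = []     # next index per open array, None per object
    pending_key: Optional[str] = None   # latest object key awaiting its value

    def child_name() -> str:
        """Name of the value that is about to be consumed."""
        nonlocal pending_key
        if not index:                   # top-level value keeps the bare prefix
            return names[-1]
        if index[-1] is None:           # inside an object: use the pending key
            seg, pending_key = str(pending_key), None
        else:                           # inside an array: use and advance the index
            seg = str(index[-1])
            index[-1] += 1
        return f"{names[-1]}.{seg}"

    for kind, value in events:
        if kind == "key":
            pending_key = str(value)
        elif kind in ("start_object", "start_array"):
            names.append(child_name())
            index.append(0 if kind == "start_array" else None)
        elif kind in ("end_object", "end_array"):
            names.pop()
            index.pop()
        else:  # scalar value; rendered with str() for brevity
            if len(args) >= max_args:
                raise TooManyArguments(f"more than {max_args} arguments")
            args.append((child_name(), str(value)))
    return args


# Events for {"user": {"name": "alice", "roles": ["admin", "dev"]}}
sample = [
    ("start_object", None), ("key", "user"), ("start_object", None),
    ("key", "name"), ("string", "alice"), ("key", "roles"),
    ("start_array", None), ("string", "admin"), ("string", "dev"),
    ("end_array", None), ("end_object", None), ("end_object", None),
]
print(events_to_args(sample, max_args=100))
```

The cap is checked before each argument is stored, so a body with millions of small values is rejected once the limit is hit instead of after the full argument list has been built. Keeping the naming logic in one place also helps keep rule behavior consistent if the underlying parser is replaced.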

---

## ❌ 9. What to Avoid

- Replacing the parser without architecture changes
- Using DOM-based parsing without strict limits
- Applying limits only after full parsing
- Assuming the library handles security

---

## 📌 Final Conclusion

- XML is currently **riskier** due to DOM usage and missing limits.
- JSON is **better protected**, but still incomplete.
- **Security does not come from the library.**
- **Security comes from a control layer enforcing limits during parsing.**

---

## 🧠 Key Takeaway

> A JSON library is just a parser.
> **Security only exists if a control layer enforces limits and guarantees early abort behavior.**