# Description for CVE-2022-44311

html2xhtml v1.3 was discovered to contain an out-of-bounds read in the function elm_close(tree_node_t *nodo) in procesador.c. This vulnerability allows attackers to access sensitive files or cause a Denial of Service (DoS) via a crafted HTML file.

# Reproduction

To reproduce the vulnerability, download a vulnerable version of html2xhtml (v1.3), extract the archive, and compile the project:

tar -xzvf html2xhtml-1.3.tar.gz
cd html2xhtml-1.3
cd src

Once the project has been compiled, run html2xhtml against the proof-of-concept file included in this repository (CVE-2022-44311_crash):

./html2xhtml -t frameset ./CVE-2022-44311_crash

The command crashes with a segmentation fault:

zsh: segmentation fault  ./src/html2xhtml -t frameset ./CVE-2022-44311_crash

Running the program under Valgrind helps us understand what is causing the crash:

$ valgrind ./src/html2xhtml -t frameset ./CVE-2022-44311_crash
==267753== Memcheck, a memory error detector
==267753== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==267753== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==267753== Command: ./src/html2xhtml -t frameset ./CVE-2022-44311_crash
==267753== Invalid read of size 4
==267753==    at 0x11B18A: elm_close (procesador.c:944)
==267753==    by 0x11B18A: err_html_struct (procesador.c:1889)
==267753==    by 0x11BBB5: err_content_invalid (procesador.c:1291)
==267753==    by 0x11BBB5: elm_close.part.0 (procesador.c:959)
==267753==    by 0x11C4C0: elm_close (procesador.c:944)
==267753==    by 0x11C4C0: saxEndDocument (procesador.c:233)
==267753==    by 0x1144AE: main (html2xhtml.c:117)
==267753==  Address 0x3ec404 is not stack'd, malloc'd or (recently) free'd


Valgrind reports an invalid read of size 4 at procesador.c, line 944, from an address that belongs to neither the stack nor any heap allocation. Running the same proof-of-concept file under gdb (with pwndbg) confirms the Valgrind output:

$ gdb src/html2xhtml
pwndbg> r -t frameset ./CVE-2022-44311_crash

LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
────────────────────────────────[ REGISTERS ]────────────────────────────────
 RAX  0xb11ae
 RBX  0x5555555dd344 ◂— 0x3
 RCX  0x5a
 RDX  0x2
 RDI  0x555555573d40 (elm_list) ◂— 0x6c6d7468 /* 'html' */
 RSI  0x555555573160 (elm_buffer) ◂— 0xd9810100028b8101
 R8   0x1
 R9   0x5555555ee520 ◂— 0x5555555ee
 R10  0x0
 R11  0x7ffff7df2800 (iconv_close) ◂— cmp    rdi, -1
 R12  0x5555555dd2d6 ◂— 0x0
 R13  0x7ffffffedc70 ◂— 0x600000001
 R14  0x5555555dd2d6 ◂— 0x0
 R15  0x4
 RBP  0x555555573d40 (elm_list) ◂— 0x6c6d7468 /* 'html' */
 RSP  0x7ffffffedc30 ◂— 0x1
 RIP  0x55555556718a (err_html_struct+474) ◂— cmp    dword ptr [rbp + rax*4 + 0xc], 4
─────────────────────────────────[ DISASM ]──────────────────────────────────
 ► 0x55555556718a <err_html_struct+474>    cmp    dword ptr [rbp + rax*4 + 0xc], 4
   0x55555556718f <err_html_struct+479>    jne    err_html_struct+489                <err_html_struct+489>
   0x555555567199 <err_html_struct+489>    mov    rbx, qword ptr [rbx + 8]
   0x55555556719d <err_html_struct+493>    test   rbx, rbx
   0x5555555671a0 <err_html_struct+496>    jne    err_html_struct+448                <err_html_struct+448>
   0x555555567170 <err_html_struct+448>    cmp    r12, rbx
   0x555555567173 <err_html_struct+451>    je     err_html_struct+498                <err_html_struct+498>
   0x5555555671a2 <err_html_struct+498>    xor    edi, edi
   0x5555555671a4 <err_html_struct+500>    mov    qword ptr [rip + 0x4d6d5], r12 <actual_element>
   0x5555555671ab <err_html_struct+507>    call   new_tree_node                <new_tree_node>
   0x5555555671b0 <err_html_struct+512>    mov    dword ptr [rax + 0x18], 0x59
──────────────────────────────[ SOURCE (CODE) ]──────────────────────────────
In file: /dev/shm/html2xhtml-1.3/src/procesador.c
   939 static void elm_close(tree_node_t *nodo)
   940 {
   941   DEBUG("elm_close()");
   942   EPRINTF1("cerrando elemento %s\n",ELM_PTR(nodo).name);
 ► 944   if (ELM_PTR(nodo).contenttype[doctype]==CONTTYPE_CHILDREN) {
   945     /* si es de tipo child se comprueba su contenido */
   946     int content[16384];
   947     int i, num;
   948     tree_node_t *elm;

gdb confirms that the program attempts to read from an invalid memory address while evaluating line 944 of procesador.c:
 ► 944   if (ELM_PTR(nodo).contenttype[doctype]==CONTTYPE_CHILDREN) {
   945     /* si es de tipo child se comprueba su contenido */
   946     int content[16384];           
   947     int i, num;                   
   948     tree_node_t *elm; 

# References