TY - JOUR
T1 - Detecting command injection vulnerabilities in Linux-based embedded firmware with LLM-based taint analysis of library functions
AU - Ye, Junjian
AU - Fei, Xincheng
AU - de Carnavalet, Xavier de Carné
AU - Zhao, Lianying
AU - Wu, Lifa
AU - Zhang, Mengyuan
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2024/9
Y1 - 2024/9
N2 - With the popularization of IoT devices, embedded firmware security has attracted people's attention. Command injection (CI) is one of the most common types of vulnerabilities in Linux-based embedded firmware. It is caused by user input being propagated to functions responsible for command execution without strict sanitization, which can be detected by static taint analysis. Unfortunately, single-binary taint analysis tools cannot find vulnerabilities caused by custom dynamically linked library functions (DLLFs) that are implemented in external library files, while multi-binary analysis tools are time-consuming. In this paper, we present SLFHunter, an approach that leverages Large Language Model (LLM) to analyze sensitive custom DLLFs separately, and imports their information into single-binary taint analysis tools to overcome this challenge. Our approach follows filtering rules to find out sensitive DLLFs that call common sink functions, and analyzes them with LLMs to find sink library functions (SLFs) where input parameters can be passed to executed command strings. Finally, SLFs are marked as new sinks to help existing tools discover CI vulnerabilities caused by them. We implemented SLFHunter as a ChatGPT-based module for EmTaint and evaluated it with a dataset consisting of 100 Linux-based embedded firmware samples from 13 vendors. The results show that our prompts can guide ChatGPT 4.0 to identify SLFs with 95% accuracy after being improved with a trick we dubbed “double-check”. SLFHunter can help EmTaint find 42 additional CI vulnerabilities with an average time cost increase of 89 s on our dataset, which demonstrates the effectiveness and efficiency of our approach.
AB - With the popularization of IoT devices, embedded firmware security has attracted people's attention. Command injection (CI) is one of the most common types of vulnerabilities in Linux-based embedded firmware. It is caused by user input being propagated to functions responsible for command execution without strict sanitization, which can be detected by static taint analysis. Unfortunately, single-binary taint analysis tools cannot find vulnerabilities caused by custom dynamically linked library functions (DLLFs) that are implemented in external library files, while multi-binary analysis tools are time-consuming. In this paper, we present SLFHunter, an approach that leverages Large Language Model (LLM) to analyze sensitive custom DLLFs separately, and imports their information into single-binary taint analysis tools to overcome this challenge. Our approach follows filtering rules to find out sensitive DLLFs that call common sink functions, and analyzes them with LLMs to find sink library functions (SLFs) where input parameters can be passed to executed command strings. Finally, SLFs are marked as new sinks to help existing tools discover CI vulnerabilities caused by them. We implemented SLFHunter as a ChatGPT-based module for EmTaint and evaluated it with a dataset consisting of 100 Linux-based embedded firmware samples from 13 vendors. The results show that our prompts can guide ChatGPT 4.0 to identify SLFs with 95% accuracy after being improved with a trick we dubbed “double-check”. SLFHunter can help EmTaint find 42 additional CI vulnerabilities with an average time cost increase of 89 s on our dataset, which demonstrates the effectiveness and efficiency of our approach.
KW - Command injection
KW - Embedded firmware security
KW - Large Language Model
KW - Static taint analysis
UR - http://www.scopus.com/inward/record.url?scp=85197596396&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85197596396&partnerID=8YFLogxK
U2 - 10.1016/j.cose.2024.103971
DO - 10.1016/j.cose.2024.103971
M3 - Article
AN - SCOPUS:85197596396
SN - 0167-4048
VL - 144
SP - 1
EP - 12
JO - Computers and Security
JF - Computers and Security
M1 - 103971
ER -