PyTorch suffers supply chain attack

Users who installed nightly builds of PyTorch between Christmas and New Year’s Eve likely received a rogue package as part of the installation, and that package siphoned sensitive data from their systems, the PyTorch maintainers said in a security advisory. The incident was the result of a dependency confusion attack, a technique that continues to affect package managers and development environments where hardening steps have not been taken.

PyTorch is a framework for developing machine learning applications in fields such as computer vision and natural language processing. It is a continuation of the older Torch library, which is no longer maintained. PyTorch was originally developed by Meta AI, the artificial intelligence laboratory of Meta Platforms, Inc., but is now an open-source project maintained by the PyTorch Foundation under the Linux Foundation’s umbrella.

Like most Python packages, PyTorch can be installed via pip, a package management tool and installer that uses the public PyPI (Python Package Index) as its main repository. However, like most package managers, pip allows users to define additional repositories, a feature commonly used by organizations to host internally developed components that their applications depend on but that are not meant for public release.

PyTorch’s dependency chain, the additional packages downloaded during its installation, includes a library called torchtriton that was hosted on PyTorch’s own index for nightly builds. Until December 25, there was no torchtriton package on PyPI, so pip fetched it from PyTorch’s alternate repository.

However, an attacker registered the torchtriton package name on PyPI and uploaded a malicious package under it, which caused the installation routine for PyTorch’s nightly builds to download the rogue version from PyPI instead. Stable builds of PyTorch were not affected.

“Since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository,” the PyTorch maintainers said. “This design enables somebody to register a package by the same name as one that exists in a third-party index, and pip will install their version by default.”
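The mechanism the maintainers describe can be illustrated with a toy resolver. This is a simplified model, not pip’s actual implementation: it assumes every configured index is treated as an equally valid source and that the highest version found across all of them wins, which is broadly how pip treats indexes added with `--extra-index-url`. The index names, the version numbers and both functions are hypothetical. In the toy data, the PyPI copy wins simply because it carries a higher version number. The second function sketches the mitigation idea of tying a package name to the one index that is supposed to serve it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple
    index: str


def parse_version(text: str) -> tuple:
    # Simplified: only handles purely numeric dotted versions such as "2.0.0".
    return tuple(int(part) for part in text.split("."))


def resolve_naive(name: str, indexes: dict) -> Candidate:
    """Merge candidates from every index and pick the highest version.

    The resolver has no notion of which index a name "belongs" to, which is
    the property dependency confusion exploits.
    """
    candidates = [
        Candidate(name, parse_version(version), index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no candidates for {name!r}")
    return max(candidates, key=lambda c: c.version)


def resolve_pinned(name: str, indexes: dict, owner: dict) -> Candidate:
    """Only consult the index that is declared to own `name`, if any."""
    index_name = owner.get(name)
    if index_name is None:
        return resolve_naive(name, indexes)
    return resolve_naive(name, {index_name: indexes[index_name]})


if __name__ == "__main__":
    # Hypothetical index contents and versions, for illustration only.
    indexes = {
        "pytorch-nightly": {"torchtriton": ["2.0.0"]},
        "pypi": {"torchtriton": ["3.0.0"]},
    }
    print(resolve_naive("torchtriton", indexes).index)
    print(resolve_pinned("torchtriton", indexes, {"torchtriton": "pytorch-nightly"}).index)
```

Run as a script, the naive resolver selects the `pypi` candidate while the pinned one stays on `pytorch-nightly`, even though the attacker’s copy advertises a higher version.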

The malicious torchtriton package was designed to collect information about the system, such as the configured nameservers, the hostname, the current username, the working directory and environment variables. It also read the contents of /etc/hosts (internally defined hosts), /etc/passwd (the list of local users), files in the user’s home directory, the .gitconfig file and the .ssh directory, which holds SSH keys. All of this information was then uploaded to a remote server as encrypted data carried in DNS queries, a stealthy way to exfiltrate data.
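Data smuggled out through DNS tends to show up as unusually long, random-looking subdomain labels, so defenders often screen DNS logs for them. The sketch below is a hypothetical heuristic, not something the PyTorch advisory describes. The thresholds are illustrative only, and it assumes the last two labels of a hostname form the registered domain, which does not hold for every suffix (for example co.uk).

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character in `text` (0.0 for an empty string)."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def looks_like_dns_exfil(
    hostname: str,
    max_label_len: int = 30,         # hypothetical threshold
    entropy_threshold: float = 3.5,  # hypothetical threshold, bits per char
    min_len_for_entropy: int = 16,   # shorter labels are too small to judge
) -> bool:
    """Flag hostnames whose subdomain labels look like encoded data."""
    labels = hostname.rstrip(".").split(".")
    # Skip the assumed registered domain and TLD (the last two labels).
    for label in labels[:-2]:
        if len(label) > max_label_len:
            return True
        if len(label) >= min_len_for_entropy and shannon_entropy(label) > entropy_threshold:
            return True
    return False


if __name__ == "__main__":
    for name in [
        "download.pytorch.org",
        "6a1b2c3d4e5f60718293a4b5c6d7e8f90a1b2c3d.attacker.example",
    ]:
        print(name, looks_like_dns_exfil(name))
```

In practice such a filter needs tuning against a baseline of normal traffic, since legitimate services such as CDNs also generate long labels.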

The PyTorch maintainers published a command that admins can use to scan their systems for the malicious torchtriton version. If the rogue package is found, it should be removed immediately, and any credentials or keys that may have been compromised should be rotated.
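As a quick programmatic complement, the sketch below only reports whether a distribution named torchtriton is installed in the current Python environment, and at what version. It is not the maintainers’ command and cannot tell a malicious build apart from the legitimate nightly dependency, so any hit should be followed up using the official advisory. The function names are hypothetical, and the `dists` parameter exists only to make the function testable.

```python
import re
from importlib import metadata
from typing import Iterable, List, Optional, Tuple


def normalize(name: str) -> str:
    """Normalize a distribution name as PEP 503 does (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_installed(name: str, dists: Optional[Iterable] = None) -> List[Tuple[str, str]]:
    """Return (name, version) pairs for installed distributions matching `name`."""
    if dists is None:
        dists = metadata.distributions()
    target = normalize(name)
    hits = []
    for dist in dists:
        meta = dist.metadata
        # Broken installs can lack metadata entirely; treat them as unnamed.
        dist_name = (meta.get("Name") if meta is not None else None) or ""
        if normalize(dist_name) == target:
            hits.append((dist_name, dist.version))
    return hits


if __name__ == "__main__":
    hits = find_installed("torchtriton")
    if hits:
        for dist_name, version in hits:
            print(f"Found {dist_name} {version}: investigate and remove it")
    else:
        print("No torchtriton distribution found in this environment")
```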