Anyscale addresses critical vulnerability in Ray framework — but thousands were still exposed

The open-source Ray framework is ubiquitous — thousands of organizations use it to run complex, intensive workloads. GPT-3 was trained on it, and some say there isn’t a large language model (LLM) that hasn’t been in touch with it somewhere along the line. 

This is what made the recent discovery of the so-called “ShadowRay” vulnerability so concerning: For seven months, the flaw gave attackers access to thousands of companies’ AI production workloads, computing power, credentials, passwords, keys, tokens and other sensitive information.

While the framework’s maintainer Anyscale initially disputed the vulnerability, the company has now issued new tooling to help users determine whether their ports are exposed.

“In light of reports of malicious activity, we have moved quickly to provide tooling to allow users to verify proper configuration of their clusters to avoid accidental exposure,” said an Anyscale spokesperson. 

Anyscale helping determine whether sensitive ports are exposed

The vulnerability CVE-2023-48022 — first identified in November — can expose the Ray Jobs API to remote code execution attacks. This means that anyone with dashboard network access could invoke “arbitrary jobs” without needing permission, according to Oligo Security, which first revealed the vulnerability in a research report late last month. 
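To illustrate the shape of the problem, the minimal sketch below uses Ray’s standard job-submission SDK against a hypothetical, internet-exposed dashboard address (a placeholder, not a real host, and not Oligo’s proof of concept). The point is simply that no credential is requested anywhere in the exchange.

```python
# Illustrative only: why an internet-exposed Ray dashboard amounts to remote
# code execution. The Jobs API accepts submissions from anyone who can reach
# the dashboard port (8265 by default); the address below is a placeholder.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://203.0.113.10:8265")  # hypothetical exposed dashboard

# No credentials are required -- the entrypoint runs as an arbitrary shell
# command with the cluster's permissions.
job_id = client.submit_job(entrypoint="echo 'submitted by anyone with network access'")
print(job_id)
```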

While Anyscale initially disputed the vulnerability, calling it “an expected behavior and a product feature,” the company has now released the Open Ports Checker, which simplifies the process of determining whether or not ports are unexpectedly open. 

By default, the client-side script is configured to reach out to a server Anyscale has set up. Scans return either an “OK” message or a “WARNING” report about open ports.

A warning means the server detected something listening on a port. But this “does not necessarily mean that your port is open to unauthenticated traffic,” Anyscale says, because the script does not try to identify what is running on the open port; it cannot tell whether a connection would fail authentication or whether Ray is even the process listening there.

An “OK” response, meanwhile, means the server could not establish a connection to any ports. However, Anyscale emphasizes that because it doesn’t know how a company’s network is configured, this response “does not guarantee that no ports are open.” False negatives can occur if firewall or NAT rules are used to route ports.
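Anyscale’s actual checker lives in its repo; purely as an illustration of what such a probe boils down to, the sketch below assumes the common Ray defaults (6379 for the GCS, 8265 for the dashboard, 10001 for the Ray client server) and flags any port that accepts a TCP connection from outside the cluster.

```python
# Rough illustration of an external open-port probe; not Anyscale's actual
# Open Ports Checker. Run from a machine *outside* the cluster's network.
import socket

HOST = "203.0.113.10"                     # hypothetical public address of a Ray head node
RAY_DEFAULT_PORTS = [6379, 8265, 10001]   # GCS, dashboard, Ray client server

for port in RAY_DEFAULT_PORTS:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(3)
        # connect_ex returns 0 when the TCP handshake succeeds
        if sock.connect_ex((HOST, port)) == 0:
            print(f"WARNING: port {port} accepted a connection from the internet")
        else:
            print(f"OK: port {port} is not reachable from here")
```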

According to the company, this is a service Anyscale will host for the community to explicitly test these network paths.

The repo is made available under the Apache 2.0 license, and the client can be deployed on any Ray head or worker node. The check works across all versions of Ray and returns all ports Ray is currently using via existing Ray APIs.

The new capability can also be prompted to send a test network call through those ports to a lightweight web server. 

If they prefer, users can configure the script to send to their own servers. Anyscale has also provided the server-side code if organizations want to self-host to test network traffic through their preferred network topology.
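Anyscale’s server-side code is in the same repo; the following is only a guess at the general shape of such a service, not the published implementation: a small HTTP endpoint that takes the port list reported by the client and tries to connect back to the caller on each one. It also shows where the false negative Anyscale describes creeps in: if NAT or a firewall rewrites the caller’s address, the connect-back can fail even though the port is open elsewhere.

```python
# Minimal self-hosted check server (an illustrative assumption, not Anyscale's code).
# A client on a Ray node POSTs {"ports": [...]}; the server tries to connect back
# to the caller's source address on each port and reports what it could reach.
import json
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

class PortCheckHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        ports = json.loads(body).get("ports", [])
        # Note: if NAT/firewall rules rewrite this address, the connect-back can
        # report "OK" even though the port is open elsewhere (a false negative).
        caller_ip = self.client_address[0]

        reachable = []
        for port in ports:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(3)
                if sock.connect_ex((caller_ip, int(port))) == 0:
                    reachable.append(port)

        verdict = {"status": "WARNING" if reachable else "OK", "open_ports": reachable}
        payload = json.dumps(verdict).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), PortCheckHandler).serve_forever()
```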

Exposed workloads, cloud environments, credentials

Because it was previously under dispute, ‘ShadowRay’ was not considered a risk and had no patch — thus it was a “shadow vulnerability,” or one that doesn’t come up in scans. 

According to Oligo, the vulnerability exposed: 

  • AI production workloads
  • Access to cloud environments (AWS, GCP, Azure, Lambda Labs) and sensitive cloud services
  • Kubernetes API access
  • Passwords and OpenAI, Stripe and Slack credentials
  • Production database credentials
  • OpenAI, HuggingFace, Stripe and Slack tokens

As of March 28, attack surface management and threat-hunting company Censys had identified 315 affected hosts globally. More than three-quarters (77%) of these had an exposed login page, while three had exposed file directories.

‘ShadowRay’ is so dangerous because it attacks behind-the-scenes infrastructure, experts point out. 

“So many discussions have popped up discussing theoretical uses for AI in attacks when the reality is that threat actors can gain far more information by attacking the infrastructure,” Nick Hyatt, director of threat intelligence at Blackpoint Cyber, told VentureBeat.

It’s often assumed that this infrastructure exists in secure environments, he pointed out, so there’s not much concern around securing the data LLMs use. Ultimately, this lowers the barrier to entry for attackers, who can gain access to potential treasure troves of data.

“This illustrates how AI is our next ‘shadow IT,’ with researchers and teams moving rapidly and deploying things without security team oversight,” Neil Carpenter, field CTO at Orca Security and a former member of Microsoft’s incident response team, told VentureBeat. 

It can be “deeply problematic” to put an open-source AI project out there because the only security for some critical components “is a note on the last page of the documentation saying you should never expose this outside of a trusted network,” he noted. 

Need for a larger discussion on secure development, data awareness and hygiene

Hyatt noted that ‘ShadowRay’ is part of a larger discussion on secure development principles “that not every company adheres to,” especially with ‘move fast and break things’ attitudes and rapid progress in AI. 

Companies thinking about adopting LLMs — which nearly all are — need to consider data hygiene, he said. 

“You can’t just dump an entire server into an LLM and expect things to go swimmingly — especially if you are handling other companies’ data,” he said. 

Validating datasets, understanding what data is being used and knowing any regulatory requirements around it are critical, particularly when building on-premises LLMs. Similarly, if organizations are using models as part of their regular business flows, are they looking at where the data was sourced from and validating that the model’s answers are correct?

These aren’t issues solvable just by technology, he pointed out. “These are people, process, and technology issues,” particularly when there is overreliance on LLMs.

Ultimately, he predicted, “As the generative AI field continues to advance, we’re going to see more infrastructure attacks rather than an explicit use of gen AI to bolster attacks. After all, if the data is ripe for the taking and exploits are commonly available, why bother using the tool when I can just steal the data used to power it?”