Contributing

Guidelines for developing & contributing to Anchore Open Source projects

Anchore OSS Contribution Guidelines

Each tool has its own slightly different guide, linked below. However, some of the guidelines are common across all tools and are shown in the next section, General Guidelines.

Tool-Specific Guides

User-facing tools

  • Syft - SBOM generation tool and library
  • Grype - Vulnerability scanner
  • Grant - License search

Automation tools

  • SBOM Action - GitHub Action for SBOM generation
  • Scan Action - GitHub Action for vulnerability scanning

Backend tools & libraries

  • Grype-DB - Vulnerability database creation
  • Vunnel - Collect vulnerability data from sources
  • Stereoscope - Container image processing library

General Guidelines

This document is the single source of truth for how to contribute to the code base. We’d love to accept your patches and contributions to this project. There are just a few small guidelines you need to follow.

Sign off your work

The sign-off is a line added at the end of the commit message, certifying that you wrote the change or otherwise have the right to submit it as an open-source patch. By submitting a contribution, you agree to be bound by the terms of the DCO Version 1.1 and the Apache License Version 2.0.

Signing off a commit certifies the below Developer’s Certificate of Origin (DCO):

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the open source license
       indicated in the file; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

All contributions to this project are licensed under the Apache License Version 2.0, January 2004.

When committing your change, you can add the required line manually so that it looks like this:

Signed-off-by: John Doe <john.doe@example.com>

You can also create a signed-off commit automatically with -s or --signoff:

git commit -s -m "this is a commit message"

To double-check that the commit was signed-off, look at the log output:

$ git log -1
commit 37ceh170e4hb283bb73d958f2036ee5k07e7fde7 (HEAD -> issue-35, origin/main, main)
Author: John Doe <john.doe@example.com>
Date:   Mon Aug 1 11:27:13 2020 -0400

    this is a commit message

    Signed-off-by: John Doe <john.doe@example.com>

Test your changes

This project has a Makefile that includes many helpers for running both unit and integration tests. You can run make help to see all the options. Although PRs will have automatic checks for these, it is useful to run them locally and ensure they pass before submitting changes. Ensure you’ve bootstrapped once before running tests:

make bootstrap

You only need to bootstrap once. After the bootstrap process, you can run the tests as many times as needed:

make unit
make integration

You can also run make all to run a more extensive test suite, but there is additional configuration that will be needed for those tests to run correctly. We will not cover the extra steps here.

Pull Request

If you made it this far and all the tests are passing, it’s time to submit a Pull Request (PR) for the project. Submitting a PR can feel like a scary moment because what happens next can be an unknown. The projects strive to be easy to work with, and we appreciate all contributions. Nobody is going to yell at you or try to make you feel bad. We love contributions and know how scary that first PR can be.

PR Title and Description

Just like the commit title and description mentioned above, the PR title and description are very important for letting others know what’s happening. Please include any details you think a reviewer will need to properly review your PR.

A PR that is very large or poorly described has a higher likelihood of being pushed to the end of the list. Reviewers like PRs they can understand and quickly review.

What to expect next

Please be patient with the project. We try to review PRs in a timely manner, but this is highly dependent on all the other tasks we have going on. It’s OK to ask for a status update every week or two; it’s not OK to ask for a status update every day.

It’s very likely the reviewer will have questions and suggestions for changes to your PR. If your changes don’t match the current style and flow of the other code, expect a request to change what you’ve done.

Document your changes

Lastly, when proposed changes modify user-facing functionality or output, the PR is expected to include updates to the documentation as well. Our projects are not heavy on documentation; this mostly means updating the README and the help output for the tool.

If nobody knows new features exist, they can’t use them!

1 - Syft

Developer guidelines when contributing to Syft

We welcome contributions to the project! There are a few useful things to know before diving into the codebase.

Do also take note of the General Guidelines that apply across all Anchore Open Source projects.

Getting started

In order to test and develop in the Syft repo you will need the following dependencies installed:

  • Golang
  • docker
  • make
  • Python (>= 3.9)

Docker settings for getting started

Make sure you’ve updated your docker settings so the default docker socket path is available.

Go to:

docker -> settings -> advanced

Make sure:

Allow the default Docker socket to be used

is checked.

Also double check that the docker context being used is the default context. If it is not, run:

docker context use default

After cloning, the following steps can help you get set up:

  1. run make bootstrap to download go mod dependencies, create the /.tmp dir, and download helper utilities.
  2. run make to view the selection of developer commands in the Makefile
  3. run make build to build the release snapshot binaries and packages
  4. for an even quicker start you can run go run cmd/syft/main.go to print the syft help.
    • this command go run cmd/syft/main.go alpine:latest will compile and run syft against alpine:latest
  5. view the README or syft help output for more output options

The main make tasks for common static analysis and testing are lint, format, lint-fix, unit, integration, and cli.

See make help for all the current make tasks.

Internal Artifactory Settings

Not always applicable

Some companies have Artifactory set up internally as a solution for sourcing secure dependencies. If you’re seeing an issue where the unit tests won’t run because of the below error, then this section might be relevant for your use case.

[ERROR] [ERROR] Some problems were encountered while processing the POMs

If you’re dealing with an issue where the unit tests will not pull/build certain Java fixtures, check some of these settings:

  • a settings.xml file should be available to help you communicate with your internal artifactory deployment
  • this can be moved to syft/pkg/cataloger/java/test-fixtures/java-builds/example-jenkins-plugin/ to help build the unit test-fixtures
  • you’ll also want to modify the build-example-jenkins-plugin.sh to use settings.xml

For more information on this setup and troubleshooting, see issue 1895.

Architecture

At a high level, this is the package structure of syft:

./
├── cmd/syft/
│   ├── cli/
│   │   ├── cli.go          // where all commands are wired up
│   │   ├── commands/       // all command implementations
│   │   ├── options/        // all command flags and configuration options
│   │   └── ui/             // all handlers for events that are shown on the UI
│   └── main.go             // entrypoint for the application
└── syft/                   // the "core" syft library
    ├── format/             // contains code to encode or decode to and from SBOM formats
    ├── pkg/                // contains code to catalog packages from a source
    ├── sbom/               // contains the definition of an SBOM
    └── source/             // contains code to create a source object for some input type (e.g. container image, directory, etc)

Syft’s core library is implemented in the syft package and subpackages, where the major packages are:

  • the syft/source package produces a source.Source object that can be used to catalog a directory, container, and other source types.
  • the syft package contains a single function that can take a source.Source object and catalog it, producing an sbom.SBOM object
  • the syft/format package contains the ability to encode and decode SBOMs to and from different SBOM formats (such as SPDX and CycloneDX)

At the highest level, the cmd package wires up spf13/cobra commands for execution in the main application:

sequenceDiagram
    participant main as cmd/syft/main
    participant cli as cli.New()
    participant root as root.Execute()
    participant cmd as <command>.Execute()

    main->>+cli:

    Note right of cli: wire ALL CLI commands
    Note right of cli: add flags for ALL commands

    cli-->>-main:  root command

    main->>+root:
    root->>+cmd:
    cmd-->>-root: (error)

    root-->>-main: (error)

    Note right of cmd: Execute SINGLE command from USER

The packages command uses the core library to generate an SBOM for the given user input:

sequenceDiagram
    participant source as source.New(ubuntu:latest)
    participant sbom as sbom.SBOM
    participant catalog as syft.CatalogPackages(src)
    participant encoder as syft.Encode(sbom, format)

    Note right of source: use "ubuntu:latest" as SBOM input

    source-->>+sbom: add source to SBOM struct
    source-->>+catalog: pass src to generate catalog
    catalog-->-sbom: add cataloging results onto SBOM
    sbom-->>encoder: pass SBOM and format desired to syft encoder
    encoder-->>source: return bytes that are the SBOM of the original input

    Note right of catalog: cataloger configuration is done based on src

Additionally, here is a gist of using syft as a library to generate an SBOM for a docker image.
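
For orientation, the sketch below strings those pieces together as library code. It follows the names used in the diagrams above (source.New, syft.CatalogPackages, sbom.SBOM, syft.Encode); the exact signatures, arguments, and field names differ between Syft versions, so treat it as an illustration of the flow rather than a drop-in example (the linked gist shows a complete, version-accurate program).

// Illustrative sketch only: names follow the sequence diagrams above, and the
// exact signatures and field names vary between syft versions.
func createSBOM(userInput string, f sbom.Format) ([]byte, error) {
  // resolve the user input (e.g. "ubuntu:latest") into a source.Source
  src, err := source.New(userInput)
  if err != nil {
    return nil, err
  }

  // catalog the source, producing packages, relationships, and distro info
  catalog, relationships, distro, err := syft.CatalogPackages(src)
  if err != nil {
    return nil, err
  }

  // assemble the SBOM object from the cataloging results
  s := sbom.SBOM{
    Artifacts:     sbom.Artifacts{Packages: catalog, LinuxDistribution: distro},
    Relationships: relationships,
  }

  // encode the SBOM into the desired format (e.g. SPDX, CycloneDX, syft-json)
  return syft.Encode(s, f)
}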

pkg.Package object

The pkg.Package object is a core data structure that represents a software package. Fields like name and version probably don’t need a detailed explanation, but some of the other fields are worth a quick overview:

  • FoundBy: the name of the cataloger that discovered this package (e.g. python-pip-cataloger).
  • Locations: the set of paths and layer ids that were parsed to discover this package.
  • Language: the language of the package (e.g. python).
  • Type: this is a high-level categorization of the ecosystem the package resides in. For instance, even if the package is an egg, wheel, or requirements.txt reference, it is still logically a “python” package. Not all package types align with a language (e.g. rpm), but it is common.
  • Metadata: specialized data for the specific location(s) parsed. We should try to raise up as much raw information as seems useful. As a rule of thumb the object here should be as flat as possible and use the raw names and values from the underlying source material parsed.

When pkg.Package is serialized an additional MetadataType is shown. This is a label that helps consumers understand the data shape of the Metadata field.
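
As a concrete illustration of the fields above, a pip cataloger might construct a package roughly like this (the values are made up, and the exact helper and metadata type names differ between syft versions):

// Hypothetical example: the values, the location path, and the metadata struct
// name are illustrative only.
p := pkg.Package{
  Name:      "requests",
  Version:   "2.31.0",
  FoundBy:   "python-pip-cataloger",
  Locations: file.NewLocationSet(file.NewLocation("site-packages/requests-2.31.0.dist-info/METADATA")),
  Language:  pkg.Python,
  Type:      pkg.PythonPkg,
  // a specialized, flat struct mirroring the raw names/values of the parsed source material;
  // its shape is identified by a MetadataType value when serialized (see the naming rules below)
  Metadata:  pythonPackageMetadata,
}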

By convention the MetadataType value should follow these rules of thumb:

  • Only use lowercase letters, numbers, and hyphens. Use hyphens to separate words.
  • Try to anchor the name in the ecosystem, language, or packaging tooling it belongs to. For a package manager in a language ecosystem, the language, framework, or runtime should be used as a prefix. For instance pubspec-lock is an OK name, but dart-pubspec-lock is better. For an OS package manager this is not necessary (e.g. apk-db-entry is a good name, but alpine-apk-db-entry is not, since alpine is redundant with the a in apk).
  • Be as specific as possible to what the data represents. For instance ruby-gem is NOT a good MetadataType value, but ruby-gemspec is. Why? Ruby gem information can come from a gemspec file or a Gemfile.lock, which are very different. The latter name provides more context as to what to expect.
  • Should describe WHAT the data is, NOT HOW it’s used. For instance r-description-installed-file is NOT a good MetadataType value since it’s trying to convey that we use the DESCRIPTION file in the R ecosystem to detect installed packages. Instead simply describe what the DESCRIPTION file is itself without context of how it’s used: r-description.
  • Use the lock suffix to distinguish between manifest files that loosely describe package version requirements vs files that strongly specify one and only one version of a package (“lock” files). This should only be used with respect to package managers that have the guide and lock distinction, and would not be appropriate otherwise (e.g. rpm does not have a guide vs lock, so lock should NOT be used to describe a db entry).
  • Use the archive suffix to indicate a package archive (e.g. rpm file, apk file, etc) that describes the contents of the package. For example an RPM file that was cataloged would have a rpm-archive metadata type (not to be confused with an RPM DB record entry which would be rpm-db-entry).
  • Use the entry suffix to indicate information about a package that was found as a single entry within a file that has multiple package entries. If the entry was found within a DB or a flat-file store for an OS package manager, you should use db-entry.
  • Should NOT contain the phrase package, though exceptions are allowed (say if the canonical name literally has the phrase package in it).
  • Should NOT have a file suffix unless the canonical name has the term “file”, such as a pipfile or gemfile. An example of a bad name under this rule is ruby-gemspec-file; a better name would be ruby-gemspec.
  • Should NOT contain the exact filename and extension. For instance pipfile.lock shouldn’t really be in the name; instead try to describe what the file is: python-pipfile-lock (but shouldn’t this be python-pip-lock, you might ask? No, since the pip package manager is not related to the pipfile project).
  • Should NOT contain the phrase metadata, unless the canonical name has this term.
  • Should represent a single use case. For example, trying to describe Hackage metadata with a single HackageMetadata struct (and thus MetadataType) is not allowed since it represents 3 mutually exclusive use cases: representing a stack.yaml, stack.lock, or cabal.project file. Instead, each of these should have their own struct types and MetadataType values.

There are other cases that are not covered by these rules… and that’s ok! The goal is to provide a consistent naming scheme that is easy to understand and use when it’s applicable. If the rules do not exactly apply in your situation then just use your best judgement (or amend these rules as needed when new common cases come up).

What if the underlying parsed data represents multiple files? There are two approaches to this:

  • use the primary file to represent all the data. For instance, though the dpkg-cataloger looks at multiple files to get all information about a package, it’s the status file that gets represented.
  • nest each individual file’s data under the Metadata field. For instance, the java-archive-cataloger may find information from one or all of the following files: pom.xml, pom.properties, and MANIFEST.MF. However, the metadata is simply java-metadata, with each possibility as a nested optional field (see the sketch after this list).
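
A rough sketch of that second approach (the struct and field names here are illustrative; see the java cataloger’s metadata types in the pkg package for the real definitions):

// Illustrative only: each possible source file is a nested, optional field.
type JavaArchiveMetadata struct {
  Manifest      *JavaManifest  // from META-INF/MANIFEST.MF, if present
  PomProperties *PomProperties // from pom.properties, if present
  PomProject    *PomProject    // from pom.xml, if present
}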

Syft Catalogers

Catalogers are the way in which syft is able to identify and construct packages given a targeted list of files. For example, a cataloger can ask syft for all package-lock.json files in order to parse and raise up javascript packages (see how file globs and file parser functions are used for a quick example).

From a high level catalogers have the following properties:

  • They are independent from one another. The java cataloger has no idea of the processes, assumptions, or results of the python cataloger, for example.

  • They do not know what source is being analyzed. Are we analyzing a local directory? An image? If so, the squashed representation or all layers? The catalogers do not know the answers to these questions; they only know that there is an interface to query for file paths and contents from an underlying “source” being scanned.

  • Packages created by the cataloger should not be mutated after they are created. There is one exception made for adding CPEs to a package after the cataloging phase, but that will most likely be moved back into the cataloger in the future.

Cataloger names should be unique and named with the following rules of thumb in mind:

  • Must end with -cataloger
  • Use lowercase letters, numbers, and hyphens only
  • Use hyphens to separate words
  • Catalogers for language ecosystems should start with the language name (e.g. python- for a cataloger that raises up python packages)
  • Distinguish between when the cataloger is searching for evidence of installed packages vs declared packages. For example, there are currently two different gemspec-based catalogers, the ruby-gemspec-cataloger and ruby-installed-gemspec-cataloger, where the latter requires that the gemspec is found within a specifications directory (which means it was installed, not just at the root of a source repo).

Building a new Cataloger

Catalogers must fulfill the pkg.Cataloger interface in order to add packages to the SBOM, and each new cataloger must be registered in syft’s central list of catalogers so it is wired into the rest of the application.

For reference, catalogers are invoked within syft one after the other, and can be invoked in parallel.
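
The interface itself is small; roughly (the exact signature, e.g. whether a context.Context is passed to Catalog, depends on the syft version):

// Approximation of the pkg.Cataloger interface; consult the pkg package for
// the authoritative definition in your version of syft.
type Cataloger interface {
  // Name returns a unique name for the cataloger (e.g. "python-pip-cataloger")
  Name() string
  // Catalog inspects files via the resolver and returns the discovered packages
  // and the relationships between them
  Catalog(resolver file.Resolver) ([]pkg.Package, []artifact.Relationship, error)
}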

generic.NewCataloger is an abstraction syft uses to make writing common catalogers easier (see the apkdb cataloger for example usage); a sketch follows the list below. It takes the following information as input:

  • A catalogerName to identify the cataloger uniquely among all other catalogers.
  • Pairs of file globs as well as parser functions to parse those files. These parser functions return a slice of pkg.Package as well as a slice of artifact.Relationship to describe how the returned packages are related. See the apkdb cataloger parser function as an example.
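
A minimal sketch of that wiring, using a made-up lockfile format (the cataloger name, glob, and parser below are hypothetical; check the apkdb cataloger referenced above for the exact helper and parser signatures in your syft version):

// Hypothetical cataloger: the name, glob, and parser are illustrative only.
func NewExampleLockCataloger() pkg.Cataloger {
  return generic.NewCataloger("example-lock-cataloger").
    WithParserByGlobs(parseExampleLock, "**/example.lock")
}

func parseExampleLock(resolver file.Resolver, env *generic.Environment, reader file.LocationReadCloser) ([]pkg.Package, []artifact.Relationship, error) {
  var packages []pkg.Package
  var relationships []artifact.Relationship
  // read and unmarshal the reader contents, then construct pkg.Package values
  // (setting Name, Version, Locations, Language, Type, and Metadata) along with
  // any relationships between them
  return packages, relationships, nil
}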

Identified packages share a common pkg.Package struct, so be sure that when the new cataloger constructs a new package it uses the Package struct. If you want to return more information than what is available on the pkg.Package struct, you can do so in the pkg.Package.Metadata section of the struct, which is unique for each pkg.Type. See the pkg package for examples of the different metadata types that are supported today. These are plugged into the MetadataType and Metadata fields in the above struct. MetadataType informs which type is being used; Metadata is an interface converted to that type.

Finally, see the apk cataloger for an example of where the package construction is done.

Interested in building a new cataloger? Checkout the list of issues with the new-cataloger label! If you have questions about implementing a cataloger feel free to file an issue or reach out to us on discourse!

Searching for files

All catalogers are provided an instance of the file.Resolver to interface with the image and search for files. The implementations for these abstractions leverage stereoscope in order to perform searching. Here is a rough outline of how that works (a short sketch follows the list):

  1. a stereoscope file.Index is searched based on the input given (a path, glob, or MIME type). The index is relatively fast to search, but requires results to be filtered down to the files that exist in the specific layer(s) of interest. This is done automatically by the filetree.Searcher abstraction. This abstraction will fall back to searching directly against the raw filetree.FileTree if the index does not contain the file(s) of interest. Note: the filetree.Searcher is used by the file.Resolver abstraction.
  2. Once the set of files is returned from the filetree.Searcher, the results are filtered down further to return the most unique file results. For example, you may have requested files by a glob that returns multiple results. These results are deduplicated by real files, so if a result contains two references to the same file, say one accessed via symlink and one accessed via the real path, then the real path reference is returned and the symlink reference is filtered out. If both were accessed by symlink then the first (by lexical order) is returned. This is done automatically by the file.Resolver abstraction.
  3. By the time results reach the pkg.Cataloger you are guaranteed to have a set of unique files that exist in the layer(s) of interest (relative to what the resolver supports).
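
Inside a parser or cataloger this typically looks something like the following (the method names reflect the file.Resolver abstraction described above, while the glob and error handling are illustrative):

// Illustrative only: the glob is hypothetical and error handling is trimmed.
locations, err := resolver.FilesByGlob("**/example.lock") // index search, deduplicated to the layer(s) of interest
if err != nil {
  return nil, nil, err
}
for _, location := range locations {
  reader, err := resolver.FileContentsByLocation(location) // contents of the unique file result
  if err != nil {
    return nil, nil, err
  }
  // ... parse the reader and raise up packages ...
  reader.Close()
}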

Testing

Testing commands

  • make help shows a list of available commands
  • make unit, make integration, make cli, and make acceptance run those test suites (see below)
  • make test runs all those tests (and is therefore pretty slow)
  • make fixtures clears and re-fetches all test fixtures.
  • go test ./syft/pkg/ for example can test particular packages, assuming fixtures are already made
  • make clean-cache cleans all test cache. Note that subsequent test runs will be slower after this

Levels of testing

  • unit: The default level of testing, distributed throughout the repo, is unit tests. Any _test.go file that does not reside somewhere within the /test directory is a unit test. Other forms of testing should be organized in the /test directory. These tests should focus on correctness of functionality in depth. Test coverage percentage metrics only consider unit tests and no other forms of testing.

  • integration: located within cmd/syft/internal/test/integration, these tests focus on the behavior surfaced by the common library entrypoints from the syft package and make light assertions about the results surfaced. Additionally, these tests tend to make diversity assertions for enum-like objects, ensuring that as enum values are added to a definition that integration tests will automatically fail if no test attempts to use that enum value. For more details see the “Data diversity and freshness assertions” section below.

  • cli: located within test/cli, these tests check the correctness of application behavior from a snapshot build. This should be used in cases where a unit or integration test will not do, or if you are looking for in-depth testing of code in the cmd/ package (such as testing the proper behavior of application configuration, CLI switches, and glue code before syft library calls).

  • acceptance: located within test/compare and test/install, these are smoke-like tests that ensure that application packaging and installation works as expected. For example, during release we provide RPM packages as a download artifact. We also have an accompanying RPM acceptance test that installs the RPM from a snapshot build and ensures the output of a syft invocation matches canned expected output. New acceptance tests should be added for each release artifact and architecture supported (when possible).

Data diversity and freshness assertions

It is important that tests against the codebase are flexible enough to begin failing when they do not cover “enough” of the objects under test. “Cover” in this case does not mean that some percentage of the code has been executed during testing, but instead that there is enough diversity of data input reflected in testing relative to the definitions available.

For instance, consider an enum-like value like so:

type Language string

const (
  Java            Language = "java"
  JavaScript      Language = "javascript"
  Python          Language = "python"
  Ruby            Language = "ruby"
  Go              Language = "go"
)

Say we have a test that exercises all the languages defined today:

func TestCatalogPackages(t *testing.T) {
  testTable := []struct {
    // ... the set of test cases that test all languages
  }{
    // ... the test case values
  }
  for _, test := range testTable {
    t.Run(test.name, func(t *testing.T) {
      // use inputFixturePath and assert that syft.CatalogPackages() returns the set of expected Package objects
      // ...
    })
  }
}

Where each test case has an inputFixturePath that would result in packages from each language. This test is brittle since it does not directly assert that all languages were exercised, and future modifications (such as adding a new language) won’t necessarily be covered by any test case.

To address this the enum-like object should have a definition of all objects that can be used in testing:

type Language string

// const( Java Language = ..., ... )

var AllLanguages = []Language{
  Java,
  JavaScript,
  Python,
  Ruby,
  Go,
  Rust,
}

This allows testing to automatically fail when adding a new language:

func TestCatalogPackages(t *testing.T) {
  testTable := []struct {
    // ... the set of test cases that (hopefully) covers all languages
  }{
    // ... the test case values
  }

  // new stuff...
  observedLanguages := strset.New()

  for _, test := range testTable {
    t.Run(test.name, func(t *testing.T) {
      // use inputFixturePath and assert that syft.CatalogPackages() returns the set of expected Package objects
      // ...

      // new stuff...
      for _, actualPkg := range actual {
        observedLanguages.Add(string(actualPkg.Language))
      }
    })
  }

  // new stuff...
  for _, expectedLanguage := range pkg.AllLanguages {
    if !observedLanguages.Has(string(expectedLanguage)) {
      t.Errorf("failed to test language=%q", expectedLanguage)
    }
  }
}

This is a better test since it will fail when someone adds a new language but fails to write a test case that should exercise that new language. This method is ideal for integration-level testing, where testing correctness in depth is not needed (that is what unit tests are for) but instead testing in breadth to ensure that units are well integrated.

A similar case can be made for data freshness; if the quality of the results will be diminished if the input data is not kept up to date then a test should be written (when possible) to assert any input data is not stale.

An example of this is the static list of licenses that is stored in internal/spdxlicense for use by the SPDX presenters. This list is updated and published periodically by an external group and syft can grab and update this list by running go generate ./... from the root of the repo.

An integration test has been written that grabs the latest license list version externally and compares that version with the version generated in the codebase. If they differ, the test fails, indicating that action is needed to update it.

The key takeaway is to try to write tests that fail when data assumptions change, not just when code changes.

Snapshot tests

The format objects make heavy use of “snapshot” testing, where the expected output bytes from a call are saved in the git repository and, during testing, the actual bytes from the subject under test are compared with the golden copy saved in the repo. The “golden” files are stored in the test-fixtures/snapshot directory relative to the go package under test and should always be updated by invoking go test on the specific test file with a specific CLI update flag provided.

Many of the Format tests make use of this approach, where the raw SBOM report is saved in the repo and the test compares that SBOM with what is generated from the latest presenter code. The following command can be used to update the golden files for the various snapshot tests:

make update-format-golden-files

These flags are defined at the top of the test files that have tests that use the snapshot files.
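
The pattern generally looks like the sketch below (the flag name, helper, and paths are illustrative rather than the exact ones used in the repo):

// Illustrative golden-file test: encodeSBOM and the flag name are made up.
var updateSnapshot = flag.Bool("update", false, "update the golden snapshot files")

func TestFormatSnapshot(t *testing.T) {
  actual := encodeSBOM(t) // bytes produced by the subject under test
  golden := filepath.Join("test-fixtures", "snapshot", "TestFormatSnapshot.golden")

  if *updateSnapshot {
    require.NoError(t, os.WriteFile(golden, actual, 0o600))
  }

  expected, err := os.ReadFile(golden)
  require.NoError(t, err)
  assert.Equal(t, string(expected), string(actual))
}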

Snapshot testing is only as good as the manual verification of the golden snapshot file saved to the repo! Be careful and diligent when updating these files.

2 - Grype

Developer guidelines when contributing to Grype

There are a few useful things to know before diving into the codebase. This project depends on a few things being available like a vulnerability database, which you might want to create manually instead of retrieving a released version.

Do also take note of the General Guidelines that apply across all Anchore Open Source projects.

Getting started

After cloning do the following:

  1. run go build ./cmd/grype to build a grype binary from source (use -o <path/name> to change the output location or name), or optionally go run ./cmd/grype to run from source.

In order to run tests and build all artifacts:

  1. run make bootstrap to download go mod dependencies, create the /.tmp dir, and download helper utilities (this only needs to be done once or when build tools are updated).
  2. run make to run linting, tests, and other verifications to make certain everything is working alright.

The main make tasks for common static analysis and testing are lint, format, lint-fix, unit, and integration.

See make help for all the current make tasks.

Relationship to Syft

Grype uses Syft as a library for all-things related to obtaining and parsing the given scan target (pulling container images, parsing container images, indexing directories, cataloging packages, etc). Releases of Grype should always use released versions of Syft (commits that are tagged and show up in the GitHub releases page). However, continually integrating unreleased Syft changes into Grype incrementally is encouraged (e.g. go get github.com/anchore/syft@main) as long as by the time a release is cut the Syft version is updated to a released version (e.g. go get github.com/anchore/syft@v<semantic-version>).

Inspecting the database

The currently supported database format is SQLite. Install sqlite3 on your system and ensure that the sqlite3 executable is available in your path. Ask grype about the location of the database, which will differ depending on the operating system:

$ go run ./cmd/grype db status
Location:  /Users/alfredo/Library/Caches/grype/db
Built:  2020-07-31 08:18:29 +0000 UTC
Current DB Version:  1
Require DB Version:  1
Status: Valid

The database is located within the XDG_CACHE_HOME path. To verify the database filename, list that path:

# OSX-specific path
$ ls -alh  /Users/alfredo/Library/Caches/grype/db
total 445392
drwxr-xr-x  4 alfredo  staff   128B Jul 31 09:27 .
drwxr-xr-x  3 alfredo  staff    96B Jul 31 09:27 ..
-rw-------  1 alfredo  staff   139B Jul 31 09:27 metadata.json
-rw-r--r--  1 alfredo  staff   217M Jul 31 09:27 vulnerability.db

Next, open the vulnerability.db with sqlite3:

sqlite3 /Users/alfredo/Library/Caches/grype/db/vulnerability.db

To make the reporting from sqlite3 easier to read, enable the following:

sqlite> .mode column
sqlite> .headers on

List the tables:

sqlite> .tables
id                      vulnerability           vulnerability_metadata

In this example you retrieve a specific vulnerability from the nvd namespace:

sqlite> select * from vulnerability where (namespace="nvd" and package_name="libvncserver") limit 1;
id             record_source  package_name  namespace   version_constraint  version_format  cpes                                                         proxy_vulnerabilities
-------------  -------------  ------------  ----------  ------------------  --------------  -----------------------------------------------------------  ---------------------
CVE-2006-2450                 libvncserver  nvd         = 0.7.1             unknown         ["cpe:2.3:a:libvncserver:libvncserver:0.7.1:*:*:*:*:*:*:*"]  []

3 - Grant

Developer guidelines when contributing to Grant

We welcome contributions to the project! There are a few useful things to know before diving into the codebase.

Do also take note of the General Guidelines that apply across all Anchore Open Source projects.

Getting Started

After pulling the repository, you can get started by running the following command to install the necessary dependencies and build grant from source:

make

After building the project, you can run the following command to run the newly built binary:

./snapshot/<os>-build_<os>_<arch>/grant

Keep in mind the build artifacts are placed in the snapshot directory and built for each supported platform, so choose the appropriate binary for your platform.

If you just want to run the project with any local changes you have made, you can run the following command:

go run cmd/grant/main.go

Testing

You can run the tests for the project by running the following command:

make test

Linting

You can run the linter for the project by running the following command:

make static-analysis

Making a PR

Just fork the repository, make your changes on a branch, and submit a PR. We will review your changes and merge them if they are good to go.

When making a PR, please make sure to include a description of the changes you have made and the reasoning behind them. If you are adding a new feature, please include tests for the new feature. If you are fixing a bug, please include a test that reproduces the bug and ensure that the test passes after your changes.

4 - Grype-DB

Developer guidelines when contributing to Grype-DB

We welcome contributions to the project! There are a few useful things to know before diving into the codebase.

Do also take note of the General Guidelines that apply across all Anchore Open Source projects.

Getting started

This codebase is primarily Go; however, there are also Python scripts critical to the daily DB publishing process as well as acceptance testing. You will require the following:

  • Python 3.8+ installed on your system. Consider using pyenv if you do not have a preference for managing python interpreter installations.

  • zstd binary utility if you are packaging v6+ DB schemas

  • (optional) xz binary utility if you have specifically overridden the package command options

  • Poetry installed for dependency and virtualenv management for python dependencies, to install:

    curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/install-poetry.py | python -
    

To download go tooling used for static analysis and dependent go modules run the following:

make bootstrap

Getting an initial vulnerability data cache

In order to build a grype DB you will need a local cache of vulnerability data:

make download-all-provider-cache

This will populate the ./data directory locally with everything needed to run grype-db build (without needing to run grype-db pull).

Running tests

To unit test the Go code and unit test the publisher python scripts:

make unit

To verify that all supported schema versions interop with grype run:

make acceptance
# Note: this may take a while... go make some coffee.

The main make tasks for common static analysis and testing are lint, format, lint-fix, unit, and cli.

See make help for all the current make tasks.

Create a new DB schema

  1. Create a new v# schema package in the grype repo (within pkg/db)
  2. Create a new v# schema package in the grype-db repo (use the bump-schema.py helper script) that uses the new changes from grype
  3. Modify the manager/src/grype_db_manager/data/schema-info.json to pin the last-latest version to a specific version of grype and add the new schema version pinned to the “main” branch of grype (or a development branch)
  4. Update all references in grype to use the new schema
  5. Use the Staging DB Publisher workflow to test your DB changes with grype in a flow similar to the daily DB publisher workflow

Making a staging DB

While developing a new schema version it may be useful to get a DB built for you by the Staging DB Publisher GitHub Actions workflow. This workflow exercises the same code as the Daily DB Publisher, except that only a single schema is built and validated against a given development branch of grype. When these DBs are published you can point grype at the proper listing file like so:

GRYPE_DB_UPDATE_URL=https://toolbox-data.anchore.io/grype/staging-databases/listing.json grype centos:8 ...

Architecture

grype-db is essentially an application that extracts information from upstream vulnerability data providers, transforms it into smaller records targeted for grype consumption, and loads the individual records into a new SQLite DB.

~~~~~ "Pull" ~~~~~      ~~~~~~~~~~~~~~~~~~ "Build" ~~~~~~~~~~~~~~~~     ~~ "Package" ~~

┌─────────────────┐     ┌───────────────────┐     ┌───────────────┐     ┌─────────────┐
│ Pull vuln data  │     │ Transform entries │     │ Load entries  │     │ Package DB  │
│ from upstream   ├────►│                   ├────►│ into new DB   ├────►│             │
└─────────────────┘     └───────────────────┘     └───────────────┘     └─────────────┘

What makes grype-db a little more unique than a typical ETL job is the extra responsibility of needing to transform the most recent vulnerability data shape (defined in the vunnel repo) to all supported DB schema versions. From the perspective of the Daily DB Publisher workflow, (abridged) execution looks something like this:

 ┌─────────────────┐          ┌──────────────┐     ┌────────────────┐
 │ Pull vuln data  ├────┬────►│ Build V1 DB  │────►│ Package V1 DB  │ ...
 └─────────────────┘    │     └──────────────┘     └────────────────┘
                        │     ┌──────────────┐     ┌────────────────┐
                        ├────►│ Build V2 DB  │────►│ Package V2 DB  │ ...
                        │     └──────────────┘     └────────────────┘
                        │     ┌──────────────┐     ┌────────────────┐
                        ├────►│ Build V3 DB  │────►│ Package V3 DB  │ ...
                        │     └──────────────┘     └────────────────┘
                        ...

In order to support multiple DB schemas easily from a code-organization perspective the following abstractions exist:

  • Provider: responsible for providing raw vulnerability data files that are cached locally for later processing.

  • Processor: responsible for unmarshalling any entries given by the Provider, passing them into Transformers, and returning any resulting entries. Note: the object definition is schema-agnostic but instances are schema-specific since Transformers are dependency-injected into this object.

  • Transformer: Takes raw data entries of a specific vunnel-defined schema and transforms the data into schema-specific entries to later be written to the database. Note: the object definition is schema-specific, encapsulating grypeDB/v# specific objects within schema-agnostic Entry objects.

  • Entry: Encapsulates schema-specific database records produced by Processors/Transformers (from the provider data) and accepted by Writers.

  • Writer: Takes Entry objects and writes them to a backing store (today a SQLite database). Note: the object definition is schema-specific and typically references grypeDB/v# schema-specific writers.

All the above abstractions are defined in the pkg/data Go package and are used together commonly in the following flow:

                       ┌────────────────────────────────────────────┐
                cache  │data.Processor                              │
 ┌─────────────┐ file  │ ┌────────────┐       ┌───────────────────┐ │ []data.Entry  ┌───────────┐     ┌───────────────────────┐
 │data.Provider├──────►│ │unmarshaller├──────►│v# data.Transformer│ ├──────────────►│data.Writer├────►│grypeDB/v#/writer.Write│
 └─────────────┘       │ └────────────┘       └───────────────────┘ │               └───────────┘     └───────────────────────┘
                       └────────────────────────────────────────────┘

Where there is a data.Provider for each upstream data source (e.g. canonical, redhat, github, NIST, etc.), a data.Processor for every vunnel-defined data shape (github, os, msrc, nvd, etc… defined in the vunnel repo), a data.Transformer for every processor and DB schema version pairing, and a data.Writer for every DB schema version.
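
Sketched as Go types, these pieces fit together roughly as follows (the authoritative definitions live in pkg/data and differ in detail):

// Illustrative only: see pkg/data for the real definitions.
type Entry struct {
  DBSchemaVersion int
  Data            any // a grypeDB/v# specific record to be written
}

type Processor interface {
  IsSupported(schemaURL string) bool         // does this processor handle the given vunnel data shape?
  Process(reader io.Reader) ([]Entry, error) // unmarshal entries and pass them to the injected Transformer
}

type Writer interface {
  Write(entries ...Entry) error // persist entries via a grypeDB/v# schema-specific store writer
  Close() error
}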

From a Go package organization perspective, the above abstractions are organized as follows:

grype-db/
└── pkg
    ├── data                      # common data structures and objects that define the ETL flow
    ├── process
    │    ├── processors           # common data.Processors to call common unmarshallers and pass entries into data.Transformers
    │    ├── v1
    │    │    ├── processors.go   # wires up all common data.Processors to v1-specific data.Transformers
    │    │    ├── writer.go       # v1-specific store writer
    │    │    └── transformers    # v1-specific transformers
    │    ├── v2
    │    │    ├── processors.go   # wires up all common data.Processors to v2-specific data.Transformers
    │    │    ├── writer.go       # v2-specific store writer
    │    │    └── transformers    # v2-specific transformers
    │    └── ...more schema versions here...
    └── provider                  # common code to pull, unmarshal, and cache upstream vuln data into local files
        └── ...

DB structure and definitions

The definitions of what goes into the database and how to access it (both reads and writes) live in the public grype repo under the db package. Responsibilities of grype (not grype-db) include (but are not limited to):

  • What tables are in the database
  • What columns are in each table
  • How each record should be serialized for writing into the database
  • How records should be read/written from/to the database
  • Providing rich objects for dealing with schema-specific data structures
  • The name of the SQLite DB file within an archive
  • The definition of a listing file and listing file entries

The purpose of grype-db is to use the definitions from grype.db and the upstream vulnerability data to create DB archives and make them publicly available for consumption via grype.

DB listing file

The listing file contains URLs to grype DB archives that are available for download, organized by schema version, and ordered by latest-date-first. The definition of the listing file resides in grype, however, it is the responsibility of the grype-db repo to generate DBs and re-create the listing file daily. As long as grype has been configured to point to the correct listing file, the DBs can be stored separately from the listing file, be replaced with a running service returning the listing file contents, or can be mirrored for systems behind an air gap.

Getting a grype DB out to OSS users (daily)

There are two workflows that drive getting a new grype DB out to OSS users:

  1. The daily data sync workflow, which uses vunnel to pull upstream vulnerability data.
  2. The daily DB publisher workflow, which builds and publishes a grype DB from the data obtained in the daily data sync workflow.

Daily data sync workflow

This workflow takes the upstream vulnerability data (from canonical, redhat, debian, NVD, etc), processes it, and writes the results to the OCI repos.

┌──────────────┐         ┌──────────────────────────────────────────────────────────┐
│ Pull alpine  ├────────►│ Publish to ghcr.io/anchore/grype-db/data/alpine:<date>   │
└──────────────┘         └──────────────────────────────────────────────────────────┘
┌──────────────┐         ┌──────────────────────────────────────────────────────────┐
│ Pull amazon  ├────────►│ Publish to ghcr.io/anchore/grype-db/data/amazon:<date>   │
└──────────────┘         └──────────────────────────────────────────────────────────┘
┌──────────────┐         ┌──────────────────────────────────────────────────────────┐
│ Pull debian  ├────────►│ Publish to ghcr.io/anchore/grype-db/data/debian:<date>   │
└──────────────┘         └──────────────────────────────────────────────────────────┘
┌──────────────┐         ┌──────────────────────────────────────────────────────────┐
│ Pull github  ├────────►│ Publish to ghcr.io/anchore/grype-db/data/github:<date>   │
└──────────────┘         └──────────────────────────────────────────────────────────┘
┌──────────────┐         ┌──────────────────────────────────────────────────────────┐
│ Pull nvd     ├────────►│ Publish to ghcr.io/anchore/grype-db/data/nvd:<date>      │
└──────────────┘         └──────────────────────────────────────────────────────────┘
... repeat for all upstream providers ...

Once all providers have been updated a single vulnerability cache OCI repo is updated with all of the latest vulnerability data at ghcr.io/anchore/grype-db/data:<date>. This repo is what is used downstream by the DB publisher workflow to create grype DBs.

The in-repo .grype-db.yaml and .vunnel.yaml configurations are used to define the upstream data sources, how to obtain them, and where to put the results locally.

Daily DB publishing workflow

This workflow takes the latest vulnerability data cache, builds a grype DB, and publishes it for general consumption.

The manager/ directory contains all code responsible for driving the Daily DB Publisher workflow, generating DBs for all supported schema versions and making them available to the public. The publishing process is made of three steps (depicted and described below):

~~~~~ 1. Pull ~~~~~      ~~~~~~~~~~~~~~~~~~ 2. Generate Databases ~~~~~~~~~~~~~~~~~~~~      ~~ 3. Update Listing ~~

┌─────────────────┐      ┌──────────────┐     ┌───────────────┐     ┌────────────────┐      ┌─────────────────────┐
│ Pull vuln data  ├──┬──►│ Build V1 DB  ├────►│ Package V1 DB ├────►│ Upload Archive ├──┬──►│ Update listing file │
└─────────────────┘  │   └──────────────┘     └───────────────┘     └────────────────┘  │   └─────────────────────┘
  (from the daily    │   ┌──────────────┐     ┌───────────────┐     ┌────────────────┐  │
   sync workflow     ├──►│ Build V2 DB  ├────►│ Package V2 DB ├────►│ Upload Archive ├──┤
   output)           │   └──────────────┘     └───────────────┘     └────────────────┘  │
                     │                                                                  │
                     └──►      ...repeat for as many DB schemas are supported...      ──┘

Note: Running these steps locally may result in publishing a locally generated DB to production, which should never be done.

  1. pull: Download the latest vulnerability data from various upstream data sources into a local directory.

    # from the repo root
    make download-all-provider-cache
    

    The destination for the provider data is in the data/vunnel directory.

  2. generate: Build databases for all supported schema versions based on the latest vulnerability data and upload them to S3.

    # from the repo root
    # must be in a poetry shell
    grype-db-manager db build-and-upload --schema-version <version>
    

    This call needs to be repeated for all schema versions that are supported (see manager/src/grype_db_manager/data/schema-info.json).

    Once built, each DB is smoke tested with grype by comparing the performance of the last OSS DB with the current (local) DB, using the vulnerability-match-label to qualify differences.

    Only DBs that pass validation are uploaded to S3. At this step the DBs can be downloaded from S3 but are NOT yet discoverable via grype db download (that is what the listing file update will do).

  3. update-listing: Generate and upload a new listing file to S3 based on the existing listing file and newly discovered DB archives already uploaded to S3.

    # from the repo root
    # must be in a poetry shell
    grype-db-manager listing update
    

    During this step the locally crafted listing file is tested against installations of grype. The correctness of the reports is NOT verified (since this was done in a previous step); however, in order to pass, the scan must find a non-zero count of matches.

    Once the listing file has been uploaded, user-facing grype installations should pick up that there are new DBs available to download.

5 - SBOM Action

Developer guidelines when contributing to sbom-action

TODO

6 - Scan Action

Developer guidelines when contributing to scan-action

TODO

7 - Vunnel

Developer guidelines when contributing to Vunnel

We welcome contributions to the project! There are a few useful things to know before diving into the codebase.

Do also take note of the General Guidelines that apply across all Anchore Open Source projects.

Getting Started

This project requires:

  • python (>= 3.7)
  • pip (>= 22.2)
  • uv
  • docker
  • go (>= 1.20)
  • posix shell (bash, zsh, etc… needed for the make dev “development shell”)

Once you have python and uv installed, get the project bootstrapped:

# clone grype and grype-db, which is needed for provider development
git clone git@github.com:anchore/grype.git
git clone git@github.com:anchore/grype-db.git
# note: if you already have these repos cloned, you can skip this step. However, if they
# reside in a different directory than where the vunnel repo is, then you will need to
# set the `GRYPE_PATH` and/or `GRYPE_DB_PATH` environment variables for the development
# shell to function. You can add these to a local .env file in the vunnel repo root.

# clone the vunnel repo
git clone git@github.com:anchore/vunnel.git
cd vunnel

# get basic project tooling
make bootstrap

# install project dependencies
uv sync --all-extras --dev

Pre-commit is used to help enforce static analysis checks with git hooks:

uv run pre-commit install --hook-type pre-push

Developing

The easiest way to develop on a provider is to use the development shell, selecting the specific provider(s) you’d like to focus your development workflow on:

# Specify one or more providers you want to develop on.
# Any provider from the output of "vunnel list" is valid.
# Specify multiple as a space-delimited list:
# make dev providers="oracle wolfi nvd"
$ make dev provider="oracle"

Entering vunnel development shell...
• Configuring with providers: oracle ...
• Writing grype config: /Users/wagoodman/code/vunnel/.grype.yaml ...
• Writing grype-db config: /Users/wagoodman/code/vunnel/.grype-db.yaml ...
• Activating virtual env: /Users/wagoodman/code/vunnel/.venv ...
• Installing editable version of vunnel ...
• Building grype ...
• Building grype-db ...

Note: development builds grype and grype-db are now available in your path.
To update these builds run 'make build-grype' and 'make build-grype-db' respectively.
To run your provider and update the grype database run 'make update-db'.
Type 'exit' to exit the development shell.

You can now run the provider you specified in the make dev command, build an isolated grype DB, and import the DB into grype:

$ make update-db
• Updating vunnel providers ...
[0000]  INFO grype-db version: ede464c2def9c085325e18ed319b36424d71180d-adhoc-build
...
[0000]  INFO configured providers parallelism=1 providers=1
[0000] DEBUG   └── oracle
[0000] DEBUG all providers started, waiting for graceful completion...
[0000]  INFO running vulnerability provider provider=oracle
[0000] DEBUG oracle:  2023-03-07 15:44:13 [INFO] running oracle provider
[0000] DEBUG oracle:  2023-03-07 15:44:13 [INFO] downloading ELSA from https://linux.oracle.com/security/oval/com.oracle.elsa-all.xml.bz2
[0019] DEBUG oracle:  2023-03-07 15:44:31 [INFO] wrote 6298 entries
[0019] DEBUG oracle:  2023-03-07 15:44:31 [INFO] recording workspace state
• Building grype-db ...
[0000]  INFO grype-db version: ede464c2def9c085325e18ed319b36424d71180d-adhoc-build
[0000]  INFO reading all provider state
[0000]  INFO building DB build-directory=./build providers=[oracle] schema=5
• Packaging grype-db ...
[0000]  INFO grype-db version: ede464c2def9c085325e18ed319b36424d71180d-adhoc-build
[0000]  INFO packaging DB from="./build" for="https://toolbox-data.anchore.io/grype/databases"
[0000]  INFO created DB archive path=build/vulnerability-db_v5_2023-03-07T20:44:13Z_405ae93d52ac4cde6606.tar.gz
• Importing DB into grype ...
Vulnerability database imported

You can now run grype using the newly created DB:

$ grype oraclelinux:8.4
 ✔ Pulled image
 ✔ Loaded image
 ✔ Parsed image
 ✔ Cataloged packages      [195 packages]
 ✔ Scanning image...       [193 vulnerabilities]
   ├── 0 critical, 25 high, 146 medium, 22 low, 0 negligible
   └── 193 fixed

NAME                        INSTALLED                FIXED-IN                    TYPE  VULNERABILITY   SEVERITY
bind-export-libs            32:9.11.26-4.el8_4       32:9.11.26-6.el8            rpm   ELSA-2021-4384  Medium
bind-export-libs            32:9.11.26-4.el8_4       32:9.11.36-3.el8            rpm   ELSA-2022-2092  Medium
bind-export-libs            32:9.11.26-4.el8_4       32:9.11.36-3.el8_6.1        rpm   ELSA-2022-6778  High
bind-export-libs            32:9.11.26-4.el8_4       32:9.11.36-5.el8            rpm   ELSA-2022-7790  Medium

# note that we're using the database we just built...
$ grype db status
Location:  /Users/wagoodman/code/vunnel/.cache/grype/5  # <--- this is the local DB we just built
...

# also note that we're using a development build of grype
$ which grype
/Users/wagoodman/code/vunnel/bin/grype

The development builds of grype and grype-db provided are derived from ../grype and ../grype-db paths relative to the vunnel project. If you want to use a different path, you can set the GRYPE_PATH and GRYPE_DB_PATH environment variables. This can be persisted by adding a .env file to the root of the vunnel project:

# example .env file in the root of the vunnel repo
GRYPE_PATH=~/somewhere/else/grype
GRYPE_DB_PATH=~/also/somewhere/else/grype-db

To rebuild the grype and grype-db binaries from local source, run:

make build-grype
make build-grype-db

This project uses Make for running common development tasks:


make                  # run static analysis and unit testing
make static-analysis  # run static analysis
make unit             # run unit tests
make format           # format the codebase with black
make lint-fix         # attempt to automatically fix linting errors
...

If you want to see all of the things you can do:

make help

If you want to use a locally-editable copy of vunnel while you develop without the custom development shell:

uv pip uninstall vunnel  #... if you already have vunnel installed in this virtual env
uv pip install -e .

Snapshot Tests

To ensure that the same feed state from providers produces the same set of vulnerabilities, snapshot testing is used.

Snapshot tests are run as part of ordinary unit tests, and will run during make unit.

To update snapshots, run the following pytest command. (Note that this example is for the debian provider, and the test name and path will be different for other providers):

pytest ./tests/unit/providers/debian/test_debian.py -k test_provider_via_snapshot --snapshot-update

Architecture

Vunnel is a CLI tool that downloads and processes vulnerability data from various sources (in the codebase, these are called “providers”).

Vunnel run workflow diagram

Conceptually, one or more invocations of Vunnel will produce a single data directory which Grype-DB uses to create a Grype database:

Vunnel and Grype-DB workflow diagram

Additionally, the Vunnel CLI tool is optimized to run a single provider at a time, not orchestrating multiple providers at once. Grype-db is the tool that collates output from multiple providers and produces a single database, and is ultimately responsible for orchestrating multiple Vunnel calls to prepare the input data:

Grype-DB actions workflow diagram

For more information about how Grype-DB uses Vunnel see the Grype-DB documentation.

Vunnel Providers

A “Provider” is the core abstraction for Vunnel and represents a single source of vulnerability data. Vunnel is a CLI wrapper around multiple vulnerability data providers.

All provider implementations should…

  • live under src/vunnel/providers in their own directory (e.g. the NVD provider code is under src/vunnel/providers/nvd/...)
  • have a class that implements the Provider interface
  • be centrally registered with a unique name under src/vunnel/providers/__init__.py
  • be independent from other vulnerability providers’ data; that is, the debian provider CANNOT reach into the NVD data provider directory to look up information (such as severity)
  • follow the workspace conventions for downloaded provider inputs, produced results, and tracking of metadata

Each provider has a “workspace” directory within the “vunnel root” directory (defaults to ./data) named after the provider.

data/                       # the "vunnel root" directory
└── alpine/                 # the provider workspace directory
    ├── input/              # any file that needs to be downloaded and referenced should be stored here
    ├── results/            # schema-compliant vulnerability results (1 record per file)
    ├── checksums           # listing of result file checksums (xxh64 algorithm)
    └── metadata.json       # metadata about the input and result files

The metadata.json and checksums are written out after all results are written to results/. An example metadata.json:

{
  "provider": "amazon",
  "urls": ["https://alas.aws.amazon.com/AL2022/alas.rss"],
  "listing": {
    "digest": "dd3bb0f6c21f3936",
    "path": "checksums",
    "algorithm": "xxh64"
  },
  "timestamp": "2023-01-01T21:20:57.504194+00:00",
  "schema": {
    "version": "1.0.0",
    "url": "https://raw.githubusercontent.com/anchore/vunnel/main/schema/provider-workspace-state/schema-1.0.0.json"
  }
}

Where:

  • provider: the name of the provider that generated the results
  • urls: the URLs that were referenced to generate the results
  • listing: the path to the checksums listing file that lists all of the results, the checksum of that file, and the algorithm used to checksum the file (and the same algorithm used for all contained checksums)
  • timestamp: the point in time when the results were generated or last updated
  • schema: the data shape that the current file conforms to

All results from a provider are handled by a common base class helper (provider.Provider.results_writer()) and are stored according to the application configuration (e.g. as JSON flat files or in a SQLite database). The data shape of the results is self-describing via an envelope with a schema reference. For example:

{
  "schema": "https://raw.githubusercontent.com/anchore/vunnel/main/schema/vulnerability/os/schema-1.0.0.json",
  "identifier": "3.3/cve-2015-8366",
  "item": {
    "Vulnerability": {
      "Severity": "Unknown",
      "NamespaceName": "alpine:3.3",
      "FixedIn": [
        {
          "VersionFormat": "apk",
          "NamespaceName": "alpine:3.3",
          "Name": "libraw",
          "Version": "0.17.1-r0"
        }
      ],
      "Link": "http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-8366",
      "Description": "",
      "Metadata": {},
      "Name": "CVE-2015-8366",
      "CVSS": []
    }
  }
}

Where:

  • the schema field is a URL to the schema that describes the data shape of the item field
  • the identifier field should have a unique identifier within the context of the provider results
  • the item field is the actual vulnerability data, and the shape of this field is defined by the schema

Note that the identifier is 3.3/cve-2015-8366 and not just cve-2015-8366 in order to uniquely identify cve-2015-8366 as applied to the alpine 3.3 distro version among other records in the results directory.

Only JSON payloads are supported at this time.

The vulnerability schemas supported within the vunnel repo can be found in the schema directory of the repository.

If at any point a breaking change needs to be made to a provider (say, one where the schema remains the same), then you can set the __version__ attribute on the provider class to a new integer value (incrementing from 1 onwards). This indicates that the cached input/results are not compatible with the output of the current version of the provider, in which case the next invocation of the provider will delete the previous input and results before running.
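
For example, a minimal sketch of bumping the version on a hypothetical provider class (the class body here is illustrative; the __version__ attribute is the only point):

class Provider(provider.Provider):
    # bumping this from the default of 1 tells vunnel that previously cached
    # input/results are incompatible and should be removed on the next run
    __version__ = 2

    ...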

Provider configurations

Each provider has a configuration object defined next to the provider class. This object is used in the vunnel application configuration and is passed as input to the provider class. Take the debian provider configuration for example:

from dataclasses import dataclass, field

from vunnel import provider, result

@dataclass
class Config:
    runtime: provider.RuntimeConfig = field(
        default_factory=lambda: provider.RuntimeConfig(
            result_store=result.StoreStrategy.SQLITE,
            existing_results=provider.ResultStatePolicy.DELETE_BEFORE_WRITE,
        ),
    )
    request_timeout: int = 125

Every provider configuration must:

  • be a dataclass
  • have a runtime field that is a provider.RuntimeConfig field

The runtime field is used to configure common behaviors of the provider that are enforced within the vunnel.provider.Provider subclass (a configuration sketch follows this list). Options include:

  • on_error: what to do when the provider fails, sub fields include:

    • action: choose to fail, skip, or retry when the failure occurs
    • retry_count: the number of times to retry the provider before failing (only applicable when action is retry)
    • retry_delay: the number of seconds to wait between retries (only applicable when action is retry)
    • input: what to do about the input data directory on failure (such as keep or delete)
    • results: what to do about the results data directory on failure (such as keep or delete)
  • existing_results: what to do when the provider is run again and the results directory already exists. Options include:

    • delete-before-write: delete the existing results just before writing the first processed (new) result
    • delete: delete existing results before running the provider
    • keep: keep the existing results
  • existing_input: what to do when the provider is run again and the input directory already exists. Options include:

    • delete: delete the existing input before running the provider
    • keep: keep the existing input
  • result_store: where to store the results. Options include:

    • sqlite: store results in key-value form in a SQLite database, where keys are the record identifiers and values are the JSON vulnerability records
    • flat-file: store results in JSON files named after the record identifiers
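
As a rough illustration, a configuration exercising several of these options might look like the following. This is a sketch only: the on_error helper types shown (provider.OnErrorConfig, provider.OnErrorAction, provider.InputStatePolicy) and their field names are assumptions based on the option names above, so check the vunnel.provider module for the exact types:

from dataclasses import dataclass, field

from vunnel import provider, result

@dataclass
class Config:
    runtime: provider.RuntimeConfig = field(
        default_factory=lambda: provider.RuntimeConfig(
            # what to do when the provider fails (type names assumed, see note above)
            on_error=provider.OnErrorConfig(
                action=provider.OnErrorAction.RETRY,
                retry_count=3,
                retry_delay=5,
            ),
            existing_results=provider.ResultStatePolicy.DELETE_BEFORE_WRITE,
            existing_input=provider.InputStatePolicy.KEEP,
            result_store=result.StoreStrategy.SQLITE,
        ),
    )
    request_timeout: int = 125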

Any provider-specific config options can be added to the configuration object as needed (such as request_timeout, which is a common field).

Adding a new provider

“Vulnerability matching” is the process of taking a list of vulnerabilities and matching them against a list of packages. A provider in this repo is responsible for the “vulnerability” side of this process. The “package” side is handled by Syft. A prerequisite for adding a new provider is that Syft can catalog the package types that the provider is feeding vulnerability data for, so Grype can perform the matching from these two sources.

To add a new provider, you will need to create a new provider class under /src/vunnel/providers/<name> that inherits from provider.Provider and implements the following (a minimal sketch follows this list):

  • name(): a unique and semantically-useful name for the provider (same as the name of the directory)
  • update(): downloads and processes the raw data, writing all results with self.results_writer()
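
A condensed sketch of that shape, loosely based on the “example” provider (the schema.OSSchema helper, the constructor signature, and the writer.write() keyword names are assumptions here; see the example provider for the authoritative version):

from vunnel import provider, schema

class Provider(provider.Provider):
    # Config is the provider's configuration dataclass (see “Provider configurations” above)
    def __init__(self, root: str, config: Config):
        super().__init__(root, runtime_cfg=config.runtime)
        self.config = config
        self.schema = schema.OSSchema()  # the schema the written records conform to

    @classmethod
    def name(cls) -> str:
        return "your-provider-name"

    def update(self, last_updated=None):
        with self.results_writer() as writer:
            # download and parse the upstream data, then write each record
            writer.write(
                identifier="3.3/cve-2015-8366",  # unique within this provider's results
                schema=self.schema,
                payload={"Vulnerability": {"Name": "CVE-2015-8366"}},  # shape defined by the schema
            )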

All results must conform to a particular schema; today there are a few kinds:

  • os: a generic operating system vulnerability (e.g. redhat, debian, ubuntu, alpine, wolfi, etc.)
  • nvd: tailored to describe vulnerabilities from the NVD
  • github-security-advisory: tailored to describe vulnerabilities from GitHub
  • osv: tailored to describe vulnerabilities from the aggregated OSV vulnerability database

Once the provider is implemented, you will need to wire it up into the application in a couple of places (sketched after this list):

  • add a new entry under the dispatch table in src/vunnel/providers/__init__.py mapping your provider name to the class
  • add the provider configuration to the application configuration under src/vunnel/cli/config.py (specifically the Providers dataclass)
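
Roughly, that wiring looks something like the following (the names and structure here are a sketch of the pattern, not verbatim source):

# src/vunnel/providers/__init__.py (sketch)
from vunnel.providers import yourprovidername

_providers = {
    # ...existing providers...
    yourprovidername.Provider.name(): yourprovidername.Provider,
}

# src/vunnel/cli/config.py (sketch)
@dataclass
class Providers:
    # ...existing provider configs...
    yourprovidername: yourprovidername.Config = field(default_factory=yourprovidername.Config)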

For a more detailed look at the implementation details of a provider, see the “example” provider.

Validating this provider has different implications depending on what is being added. For example, if the provider is adding a new vulnerability source but is ultimately using an existing schema to express results then there may be very little to do! If you are adding a new schema, then the downstream data pipeline will need to be altered to support reading data in the new schema.

Please feel free to reach out to a maintainer on an incomplete draft PR and we can help you get it over the finish line!

…for an existing schema

1. Fork Vunnel and add the new provider.

Take a look at the example provider in the example directory. You are encouraged to copy example/awesome/* into src/vunnel/providers/YOURPROVIDERNAME/ and modify it to fit the needs of your new provider; however, this is not required:

# from the root of the vunnel repo
cp -a example/awesome src/vunnel/providers/YOURPROVIDERNAME

See the “example” provider README as well as the code comments for steps and considerations to take when implementing a new provider.

Once implemented, you should be able to see the new provider in the vunnel list command and run it with vunnel run <name>. The records written should map to a specific namespace in the downstream DB, as indicated in each record. This namespace is needed when making Grype changes.

While developing the provider, consider using the make dev provider="<your-provider-name>" developer shell to run the provider and manually test the results against grype.

At this point you can optionally open a Vunnel PR with your new provider and a Maintainer can help with the next steps. Or if you’d like to get PR changes merged faster you can continue with the next steps.

2. Fork Grype and map distro type to a specific namespace.

This step might not be needed depending on the provider.

The most common reason for needing Grype changes is to map the new provider’s namespace to a distro type within Grype so that matching can occur against the new data.

If you’re using the developer shell (make dev ...) then you can run make build-grype to get a build of grype with your changes.

3. In Vunnel: add a new test case to tests/quality/config.yaml for the new provider.

The configuration maps each provider under test to specific images to test with, for example:

---
- provider: amazon
  images:
    - docker.io/amazonlinux:2@sha256:1301cc9f889f21dc45733df9e58034ac1c318202b4b0f0a08d88b3fdc03004de
    - docker.io/anchore/test_images:vulnerabilities-amazonlinux-2-5c26ce9@sha256:cf742eca189b02902a0a7926ac3fbb423e799937bf4358b0d2acc6cc36ab82aa

These images are used to test the provider on PRs and nightly builds to verify the specific provider is working. Always use both the image tag and digest for all container image entries. Pick an image that has a good representation of the package types that your new provider is adding vulnerability data for.

4. In Vunnel: swap the tools to your Grype branch in tests/quality/config.yaml.

If you want to see PR quality gate checks pass with your specific Grype changes (if you have any), then you can update the yardstick.tools[*] entries for grype to use a version that points to your fork (e.g. your-fork-username/grype@main). If you don’t have any grype changes needed then you can skip this step.

5. In Vunnel: add new “vulnerability match labels” to annotate True and False positive findings with Grype.

In order to evaluate the quality of the new provider, we need to know what the expected results are. This is done by annotating Grype results with “True Positive” labels (good results) and “False Positive” labels (bad results). We’ll use Yardstick to do this:

$ cd tests/quality

# capture results with the development version of grype (from your fork)
$ make capture provider=<your-provider-name>

# list your results
$ uv run yardstick result list | grep grype

d415064e-2bf3-4a1d-bda6-9c3957f2f71a  docker.io/anc...  grype@v0.58.0             2023-03...
75d1fe75-0890-4d89-a497-b1050826d9f6  docker.io/anc...  grype[custom-db]@bdcefd2  2023-03...

# use the "grype[custom-db]" result UUID and explore the results and add labels to each entry
$ uv run yardstick label explore 75d1fe75-0890-4d89-a497-b1050826d9f6

# You can use the yardstick TUI to label results:
# - use "T" to label a row as a True Positive
# - use "F" to label a row as a False Positive
# - Ctrl-Z to undo a label
# - Ctrl-S to save your labels
# - Ctrl-C to quit when you are done

Later we’ll open a PR in the vulnerability-match-labels repo to persist these labels. In the meantime we can iterate locally with the labels we’ve added.

6. In Vunnel: run the quality gate.

cd tests/quality

# runs your specific provider to gather vulnerability data, builds a DB, and runs grype with the new DB
make capture provider=<your-provider-name>

# evaluate the quality gate
make validate

This uses the latest Grype-DB release to build a DB containing only data from the new provider, then runs the specified Grype version against that DB.

You are looking for a passing run before continuing further.

7. Open a vulnerability-match-labels repo PR to persist the new labels.

Vunnel uses the labels in the vulnerability-match-labels repo via a git submodule. We’ve already added labels locally within this submodule in an earlier step. To persist these labels we need to push them to a fork and open a PR:

# fork the github.com/anchore/vulnerability-match-labels repo, but you do not need to clone it...

# from the Vunnel repo...
$ cd tests/quality/vulnerability-match-labels

$ git remote add fork git@github.com:your-fork-name/vulnerability-match-labels.git
$ git checkout -b 'add-labels-for-<your-provider-name>'
$ git status

# you should see changes from the labels/ directory for your provider that you added

$ git add .
$ git commit -m 'add labels for <your-provider-name>'
$ git push fork add-labels-for-<your-provider-name>

At this point you can open a PR in the vulnerability-match-labels repo.

Note: you will not be able to open a Vunnel PR that passes PR checks until the labels are merged into the vulnerability-match-labels repo.

Once the PR is merged in the vulnerability-match-labels repo you can update the submodule in Vunnel to point to the latest commit in the vulnerability-match-labels repo.

cd tests/quality

git submodule update --remote vulnerability-match-labels

8. In Vunnel: open a PR with your new provider.

The PR will also run all of the same quality gate checks that you ran locally.

If you have Grype changes, you should also create a PR for those as well. The Vunnel PR will not pass PR checks until the Grype PR is merged and the tests/quality/config.yaml file is updated to point back to the latest Grype version.

…for a new schema

This is the same process as listed above with a few additional steps:

  1. You will need to add the new schema to the Vunnel repo in the schema directory.
  2. Grype-DB will need to be updated to support the new schema in the pkg/provider/unmarshal and pkg/process/v* directories.
  3. The Vunnel tests/quality/config.yaml file will need to be updated to use a development grype-db.version that points to your fork.
  4. The final Vunnel PR will not be able to be merged until the Grype-DB PR is merged and the tests/quality/config.yaml file is updated to point back to the latest Grype-DB version.

What might need refactoring?

Looking to help out with improving the code quality of Vunnel, but not sure where to start?

The best way is to look for issues with the refactor label.

A more general way is to use radon to search for complexity and maintainability issues:

$ radon cc src --total-average -nb
src/vunnel/provider.py
    M 115:4 Provider._on_error - B
src/vunnel/providers/alpine/parser.py
    M 73:4 Parser._download - C
    M 178:4 Parser._normalize - C
    M 141:4 Parser._load - B
    C 44:0 Parser - B
src/vunnel/providers/amazon/parser.py
    M 66:4 Parser._parse_rss - C
    C 164:0 JsonifierMixin - C
    M 165:4 JsonifierMixin.json - C
    C 32:0 Parser - B
    M 239:4 PackagesHTMLParser.handle_data - B
...

The output of radon indicates the type (M=method, C=class, F=function), the path/name, and an A-F grade. Anything that’s not an A is worth taking a look at.

Another approach is to use wily:

$ wily build
...
$ wily rank
-----------Rank for Maintainability Index for bdb4983 by Alex Goodman on 2022-12-25.------------
╒═════════════════════════════════════════════════╤═════════════════════════╕
│ File                                            │   Maintainability Index │
╞═════════════════════════════════════════════════╪═════════════════════════╡
│ src/vunnel/providers/rhel/parser.py             │                 21.591  │
├─────────────────────────────────────────────────┼─────────────────────────┤
│ src/vunnel/providers/ubuntu/parser.py           │                 21.6144 │
├─────────────────────────────────────────────────┼─────────────────────────┤
│ tests/unit/providers/github/test_github.py      │                 35.3599 │
├─────────────────────────────────────────────────┼─────────────────────────┤
│ tests/unit/utils/test_oval_v2.py                │                 36.3388 │
├─────────────────────────────────────────────────┼─────────────────────────┤
│ src/vunnel/providers/debian/parser.py           │                 37.3723 │
├─────────────────────────────────────────────────┼─────────────────────────┤
│ tests/unit/utils/test_fdb.py                    │                 38.6926 │
├─────────────────────────────────────────────────┼─────────────────────────┤
│ tests/unit/providers/sles/test_sles.py          │                 41.6602 │
├─────────────────────────────────────────────────┼─────────────────────────┤
│ tests/unit/providers/ubuntu/test_ubuntu.py      │                 43.1323 │
├─────────────────────────────────────────────────┼─────────────────────────┤
...

Ideally we should try to get wily diff output into the CI pipeline and post it as a sticky PR comment to show regressions (and potentially fail the CI run).

Not everything has types

This codebase has been ported from another repo that did not have any type hints. This is OK, though ideally over time this should be corrected as new features are added and bug fixes made.

We use mypy today for static type checking, however, the ported code has been explicitly ignored (see pyproject.toml).

If you want to make enhancements in this area consider using automated tooling such as pytype to generate types via inference into .pyi files and later merge them into the codebase with merge-pyi.

Alternatively, a tool like MonkeyType can be used to generate static types from runtime data and incorporate them into the code.
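
For example, the pytype route might look roughly like this (the module path is illustrative, and the location of the generated stub depends on your pytype configuration):

# infer types for a ported module; pytype writes inferred stubs to .pyi files
pytype src/vunnel/providers/debian/parser.py

# merge the inferred stub back into the source file in place
# (replace the stub path with the location pytype reports)
merge-pyi -i src/vunnel/providers/debian/parser.py path/to/inferred/parser.pyi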

8 - Stereoscope

Developer guidelines when contributing to Stereoscope

We welcome contributions to the project! There are a few useful things to know before diving into the codebase.

Do also take note of the General Guidelines that apply across all Anchore Open Source projects.

Getting started

In order to test and develop in this repo you will need the following dependencies installed:

  • Golang
  • docker
  • make
  • podman (for benchmark and integration tests only)
  • containerd (for integration tests only)
  • skopeo (for integration tests only)

After cloning, the following steps can help you get set up:

  1. run make bootstrap to download go mod dependencies, create the /.tmp dir, and download helper utilities.
  2. run make help to view the selection of developer commands in the Makefile

The main make tasks for common static analysis and testing are lint, format, lint-fix, unit, and integration.

See make help for all the current make tasks.

Background

Stereoscope is a library for reading and manipulating container images. It is capable of parsing multiple image sources, providing a single abstraction for interacting with them. Ultimately this provides a squashfs-like interface for interacting with image layers as well as a content API for accessing files contained within the image.

Overview of objects:

  • image.Image: Once parsed with image.Read() this object represents a container image. Consists of a sequence of image.Layer objects, an image.FileCatalog for accessing files, and a filetree.SearchContext for searching for files from the squashed representation of the image filesystem. Additionally exposes GGCR v1.Image objects for accessing the raw image metadata.
  • image.Layer: represents a single layer of the image. Consists of a filetree.FileTree that represents the raw layer contents, and a filetree.SearchContext for searching for files relative to the raw (single layer) filetree as well as the squashed representation of the layer relative to all layers below this one. Additionally exposes GGCR v1.Layer objects for accessing the raw layer metadata.
  • filetree.FileTree: a tree representing a filesystem. All nodes represent real paths (paths with no link resolution anywhere in the path) and are absolute paths (start with / and contain no relative path elements [e.g. ../ or ./]). This represents the filesystem structure and each node has a reference to the file metadata for that path.
  • file.Reference: a unique file in the filesystem, identified by an absolute, real path as well as an integer ID (file.IDs). These are used to reference concrete nodes in the filetree.FileTree and image.FileCatalog objects.
  • file.Index: stores all known file.Reference and file.Metadata. Entries are indexed in a variety of ways to provide fast access to references and metadata without needing to crawl the tree. This is especially useful for speeding up globbing.
  • image.FileCatalog: an image-aware extension of file.Index that additionally relates image.Layers to file.IDs and provides a content API for any files contained within the image (regardless of which layer or squashed representation it exists in).

Searching for files

Searching for files is exposed to users in three ways:

  • search by file path
  • search by file glob
  • search by file content MIME type

Searching itself is performed two different ways:

  • search the image.FileCatalog on the image by a heuristic
  • search the filetree.FileTree directly

The “best way” to search is automatically determined in the filetree.searchContext object, exposed on image.Image and image.Layer objects as a filetree.Searcher for general use.

File trees

The filetree.FileTree object represents a filesystem and consists of filenode.Node objects. The tree itself leverages tree.Tree as a generic datastructure. What filetree.FileTree adds is the concept of file types, the semantics of each type, the ability to resolve links based on a given strategy, merging of trees with the same semantics of a union filesystem (e.g. whiteout files), and the ability to search for files via direct paths or globs.

The fs.FS abstraction has been implemented on filetree.FileTree to allow for easy integration with the standard library as well as to interop with the doublestar library to facilitate globbing. Using the fs.FS abstraction for filetree operations is faster than OS interactions with the filesystem directly but relatively slower than the indexes provided by image.FileCatalog and file.Index.

filetree.FileTree objects can be created with a corresponding file.Index object by leveraging the filetree.Builder object, which aids in the indexing of files.