Introducing unified glob-syntax in git-pandas

To improve the git-pandas user interface, I'm introducing a new way of specifying which files in a repository you care about. It will become the sole way of specifying this in version 2.0.0.  Currently, for any given function, you can specify a list of extensions to include and a list of directories to exclude.  A toy example would be:

df = repo.some_method(
    extensions=['py', 'md', 'html'],
    ignore_dir=['docs', 'tests']
)

With this, you'd be looking at Python, Markdown and HTML files other than those in the docs or tests directories.

For most use cases, this is a pretty compact way to express what you need.  But edge cases that need a more flexible method come up increasingly often. So now you can alternatively pass a parameter called ignore_globs to any of these methods:

df = repo.some_method(
    ignore_globs=[
        'docs*',
        'tests*'
    ]
)

This will ignore any file that matches at least one of the globs in that list.  For now, this is limiting, because things like matching file extensions for multiple file types are really hard to express in this notation.  To simplify that, the next addition will be a corresponding include_globs parameter.  A full description of that is in the GitHub issue found here: https://github.com/wdm0006/git-pandas/issues/3

Once added, the logic will be that for any given file, at least one pattern in include_globs must match, and no pattern in ignore_globs may match.

In that way, we could replicate the first example above with:

df = repo.some_method(
    ignore_globs=[
        'docs*',
        'tests*'
    ],
    include_globs=[
        '*.py',
        '*.md',
        '*.html'
    ]
)

This is about the same level of verbosity, and far more flexible.
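
To make the matching rule concrete, here is a minimal sketch of how the include/ignore logic could behave, using Python's standard fnmatch module. The helper name is_included is hypothetical and is not part of git-pandas, and treating a missing include_globs as "match everything" is my assumption for the sketch; the actual implementation may differ.

```python
from fnmatch import fnmatchcase


def is_included(path, include_globs=None, ignore_globs=None):
    """Hypothetical helper, not part of git-pandas.

    A path is kept when it matches at least one include glob
    and matches none of the ignore globs.
    """
    # Assumption: with no include_globs given, every file is a candidate.
    include_globs = include_globs or ['*']
    ignore_globs = ignore_globs or []

    matches_include = any(fnmatchcase(path, g) for g in include_globs)
    matches_ignore = any(fnmatchcase(path, g) for g in ignore_globs)
    return matches_include and not matches_ignore


# Made-up file paths, purely for illustration.
files = [
    'gitpandas/repository.py',
    'README.md',
    'docs/index.html',
    'tests/test_repo.py',
    'setup.cfg',
]

kept = [
    f for f in files
    if is_included(
        f,
        include_globs=['*.py', '*.md', '*.html'],
        ignore_globs=['docs*', 'tests*'],
    )
]
print(kept)
```

Note that in fnmatch a '*' also matches path separators, which is why 'docs*' covers everything under the docs directory in this sketch.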

If you're a user of git-pandas, I would love your feedback on this.  If you're a developer, I'd love your help.  Any feedback is always welcome on the issue posted above.

 

Will

Will has a background in Mechanical Engineering from Auburn, but mostly just writes software now. He was the first employee at Predikto, and is currently building out the premier platform for predictive maintenance in heavy industry there as Chief Scientist. When not working on that, he is generally working on something related to Python, data science or cycling.
