How does Large File Storage (LFS) work in GitLab
Add and track a 3.7GB tar file in a repo, and push it:
git lfs track "*.tar"
cp <a folder>/a.tar .
git add a.tar
git commit -m "add a.tar"
git push origin master
Question 1: at the end of this process, has a.tar been uploaded to the GitLab server? It is unclear, because the “add” and “commit” commands took some time (though probably not long enough for 3.7GB to have been uploaded), while the push took almost no time at all (a fraction of a second).
Answer 1: As explained in this video (at 1:27), when you push a file tracked by git lfs it is intercepted and placed on a different server, leaving a pointer in your git repository. As you can see in the pointer file you quote in Question 4, this worked for you.
Question 2: if the file was uploaded to the server, where is it stored? Obviously not in the same place as the repo (that is the point). I ask because my server is being backed up, and I need to know whether using git-lfs requires me to change the backup in any way.
Answer 2: This is a bit trickier. The documentation for git lfs smudge says:
Read a Git LFS pointer file from standard input and write the contents of the corresponding large file to standard output. If needed, download the file’s contents from the Git LFS endpoint. The argument, if provided, is only used for a progress bar.
The git lfs endpoint can be found from the output of git lfs env. My “endpoint” is a folder under (but not in) my repository, which makes me think that GitLab creates a git repository on the server in our account space to store binary files.
That said, I don’t know how you’d go about backing this up. GitHub provides a git lfs server that’s “not in a production ready state,” so it’d require some work on your part to set it up such that your binary files are uploaded to a server you administer. If backing up these files is a priority and you don’t want to use one of the implementations (Amazon S3, etc), you might try another binary file storage system that works with git, such as git-media, git-annex, git-fat or git-bigstore. I haven’t looked into these options in depth, so I can’t make a recommendation.
Question 3: if the file was not uploaded, does this mean other users of the repo will get a link to the file on the original machine on which the file was added? Is there a way to change this to a location on the server? (back to question 2)
Answer 3: If the file had not been uploaded using git lfs, it would have been pushed using git and you’d have a binary file in your git repository. But yours was uploaded using git lfs, as you say in Question 4.
Question 4: after cloning the repo, the full 3.7GB file is indeed not there, “just” a text file with the content:
version https://git-lfs.github.com/spec/v1
oid sha256:4bd049d85f06029d28bd94eae6da2b6eb69c0b2d25bac8c30ac1b156672c4082
size 3771098624
Answer 4: Other users of your repository, after having installed git lfs on their local machines, can simply type git lfs pull to bring in the binary file(s) that you pushed using git lfs.
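The pointer file quoted in Question 4 is plain text with one "key value" pair per line, so it can be inspected with a few lines of code. The following is a minimal sketch, not part of Git LFS itself: parse_lfs_pointer is a hypothetical helper name, and the sample text is the pointer from the question.

```python
def parse_lfs_pointer(text):
    """Parse the text of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),
    }


# Pointer content quoted in Question 4.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4bd049d85f06029d28bd94eae6da2b6eb69c0b2d25bac8c30ac1b156672c4082
size 3771098624
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algorithm"], info["oid"][:12], f'{info["size"] / 1e9:.2f} GB')
# prints: sha256 4bd049d85f06 3.77 GB
```

The size field confirms that the real file is about 3.77 GB, even though the cloned working copy only holds this small text file until git lfs pull is run.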
SOURCE: https://stackoverflow.com/questions/34181356/git-lfs-where-are-the-file-stored-how-to-get-them
It happens with the best of intentions: your design team adds their large graphic files to your project repository – and you see it grow and grow until it’s a multi-gigabyte clump…
Working with large binary files in Git can indeed be tricky. Every time a tiny change in a 100 MB Photoshop file is committed, your repository grows by another 100 MB. This quickly adds up and makes your repository almost unusable due to its enormous size.
But of course, not using version control for your design / concept / movie / audio / executables / <other-large-file-use-case> work cannot be the solution. The general benefits of version control still apply and should be reaped in all kinds of projects.
Luckily, there’s a Git extension that makes working with large files a lot more efficient: say hello to “Large File Storage” (or simply “LFS” if you prefer nicknames).
Without LFS: Bloated Repositories
Before we look at how exactly LFS works its wonders, we’ll take a closer look at the actual problem. Let’s consider a simple website project as an example:
Nothing special: some HTML, CSS, and JS files and a couple of small image assets. However, until now, we haven’t included our design assets (Photoshop, Sketch, etc.). It makes a lot of sense to put your design assets under version control, too.
However, here’s the catch: each time our designer makes a change (no matter how small) to this new Photoshop file, she will commit another 100 MB to the repository. Very quickly, the repository will weigh tons of megabytes and soon gigabytes – which makes cloning and managing it very tedious.
Although I only talked about “design” files, this is really a problem with all “large” files: movies, audio recordings, datasets, etc.
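To put rough numbers on the growth described above, here is a small back-of-the-envelope sketch. It assumes, as the text does, that every committed revision of the binary file costs about the full file size because such files tend not to compress or delta well; real repositories may do somewhat better. The function name is made up for illustration.

```python
def history_size_mb(file_mb, revisions):
    """Approximate repository growth if every revision is stored in full."""
    return file_mb * revisions


# A 100 MB Photoshop file after 1, 10 and 50 committed changes.
for revisions in (1, 10, 50):
    print(revisions, "revisions ->", history_size_mb(100, revisions), "MB")
# prints:
# 1 revisions -> 100 MB
# 10 revisions -> 1000 MB
# 50 revisions -> 5000 MB
```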
With LFS: Efficient Large File Handling
Of course, LFS cannot simply “magic away” all that large data: it accrues with every change and has to be saved. However, it shifts that burden to the remote server – allowing the local repository to stay relatively lean!
To make this possible, LFS uses a simple trick: it does not keep all of a file’s versions in the local repository. Instead, it provides only the files that are necessary in the checked out revision, on demand.
But this poses an interesting question: if those huge files themselves are not present in your local repository… what is present instead? LFS saves lightweight pointers in place of real file data. When you check out a revision with such a pointer, LFS simply looks up the original file (possibly on the server if it’s not in its own, special cache) and downloads it for you.
Thereby, you end up with only the files you really want – not a whole bunch of superfluous data that you might never need.
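The pointer idea can be illustrated with a short sketch that builds a pointer for some content, following the three-line pointer format (version, oid, size). Two things here are assumptions on my part rather than something the text states: the helper names are invented, and the local cache layout used in local_cache_path (.git/lfs/objects/<first two hex chars>/<next two>/<oid>) is my understanding of how the Git LFS client organises its cache.

```python
import hashlib


def make_pointer(data: bytes) -> str:
    """Build Git LFS pointer text for the given file content."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


def local_cache_path(oid: str) -> str:
    """Assumed location of an object in the client's local LFS cache."""
    return f".git/lfs/objects/{oid[:2]}/{oid[2:4]}/{oid}"


big_file = b"\x00" * (5 * 1024 * 1024)  # stand-in for a large binary file
pointer = make_pointer(big_file)
oid = pointer.splitlines()[1].split(":", 1)[1]

print(pointer, end="")
print(len(big_file), "bytes of content ->", len(pointer), "bytes of pointer")
print(local_cache_path(oid))
```

Whatever the size of the real file, the pointer stays at roughly 130 bytes, which is why the Git repository itself remains small.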
Installing LFS
LFS is not (yet) part of the core Git binary, but it’s available as an extension. This means that, before we can work with LFS, we need to make sure it’s installed.
Server
Not all code hosting services support LFS already. As a GitLab user, however, there’s not much to worry about: if you’re using GitLab.com or a halfway recent version of GitLab CE or EE, support for LFS is already baked in! Your administrator only needs to enable the LFS option.
Local Machine
Your local Git installation also needs to support LFS. If you’re using Tower, a Git desktop client, you don’t have to install anything: Tower supports Git Large File Storage out of the box.
If you’re using Git on the command line, there are different installation options available to you:
- Binary Packages: Up-to-date binary packages are available for Windows, Mac, Linux, and FreeBSD.
- Linux: Packages for Debian and RPM are available from PackageCloud.
- macOS: You can use Homebrew via “brew install git-lfs” or MacPorts via “port install git-lfs”.
- Windows: You can use the Chocolatey package manager via “choco install git-lfs”.
After your package manager has finished its work, you need to complete the installation with the “lfs install” command:
git lfs install
Tracking Files with LFS
Without further instructions, LFS won’t take care of your large file problems. We’ll have to tell LFS explicitly which files it should handle!
So let’s return to our “big Photoshop file” example. We can instruct LFS to take care of the “design.psd” file using the “lfs track” command:
git lfs track "design-resources/design.psd"
At first glance, the command didn’t seem to have much effect. However, you’ll notice that a new file in the project’s root folder has been created (or changed, if it already existed): .gitattributes collects all file patterns that we choose to track via LFS. Let’s take a look at its contents:
cat .gitattributes
design-resources/design.psd filter=lfs diff=lfs merge=lfs -text
Perfect! From now on, LFS will handle this file. We can now go ahead and add it to the repository in the way we’re used to. Notice that any changes to .gitattributes also have to be committed to the repository, just like other modifications:
git add .gitattributes
git add design-resources/design.psd
git commit -m "Add design file"
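Since .gitattributes is just a text file of "pattern attributes..." lines, the set of LFS-tracked patterns can be read back out of it. The sketch below is a simplified, hypothetical reader (lfs_patterns is not a Git or Git LFS function): it assumes patterns contain no spaces and only looks for the filter=lfs attribute shown above.

```python
def lfs_patterns(gitattributes_text):
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.append(pattern)
    return patterns


SAMPLE = """\
# design files go through LFS
design-resources/design.psd filter=lfs diff=lfs merge=lfs -text
*.txt text
"""

print(lfs_patterns(SAMPLE))
# prints: ['design-resources/design.psd']
```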
Tracking File Patterns
Adding a specific, single file like this is all well and good… but what if you want to track, for example, every .indd file in our project? Please relax: you don’t have to add each file manually! LFS allows you to define file patterns, much like when ignoring files. The following command, for example, will instruct LFS to track all InDesign files – existing ones and future ones:
git lfs track "*.indd"
You could also tell LFS to track the contents of a whole directory:
git lfs track "design-assets/*"
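To get a feel for which paths such patterns cover, here is a deliberately simplified matcher. It is only an approximation of the gitattributes matching rules as I understand them: a pattern without a slash is compared with the file name at any depth, a pattern with a slash is compared segment by segment so that * never crosses a directory boundary, and ** and escapes are not handled at all. is_tracked is a hypothetical name.

```python
from fnmatch import fnmatchcase


def is_tracked(pattern, path):
    """Simplified gitattributes-style match (no '**', no escapes)."""
    if "/" not in pattern:
        # No slash: compare against the file name, at any depth.
        return fnmatchcase(path.rsplit("/", 1)[-1], pattern)
    # With a slash: compare segment by segment so '*' never crosses a '/'.
    pattern_parts = pattern.lstrip("/").split("/")
    path_parts = path.split("/")
    return len(pattern_parts) == len(path_parts) and all(
        fnmatchcase(part, pat) for pat, part in zip(pattern_parts, path_parts)
    )


print(is_tracked("*.indd", "brochure/cover.indd"))                 # True
print(is_tracked("design-assets/*", "design-assets/logo.psd"))      # True
print(is_tracked("design-assets/*", "design-assets/icons/set.psd")) # False
```

If this reading of the rules is right, "design-assets/*" only covers files directly inside that folder; for nested folders a pattern such as "design-assets/**" would probably be needed, so it is worth checking the result with git lfs ls-files.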
Getting an Overview of Tracked Files
At some point, you might want to know which files exactly are tracked by LFS at the moment. You could simply take a look at the .gitattributes file. However, these are not actual files, but only rules and therefore highly “theoretical”: individual files might have slipped through, e.g. due to typos or overly restrictive rules.
To see a list of the actual files that you’re currently tracking, simply use the git lfs ls-files command:
git lfs ls-files
194dcdb603 * design-resources/design.psd
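The ls-files output shown above can be post-processed in scripts. The sketch below parses lines of that shape; parse_ls_files is a made-up helper, and the interpretation of the middle column (an asterisk for an object whose content is present locally, a dash for a pointer only) is my understanding of the git lfs ls-files documentation rather than something stated in this text.

```python
def parse_ls_files(output):
    """Parse 'git lfs ls-files' style lines into dictionaries."""
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Format: "<short oid> <marker> <path>"; the path may contain spaces.
        oid_prefix, marker, path = line.split(" ", 2)
        entries.append(
            {"oid_prefix": oid_prefix, "downloaded": marker == "*", "path": path}
        )
    return entries


SAMPLE_OUTPUT = "194dcdb603 * design-resources/design.psd\n"
for entry in parse_ls_files(SAMPLE_OUTPUT):
    print(entry["path"], entry["oid_prefix"], entry["downloaded"])
# prints: design-resources/design.psd 194dcdb603 True
```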
Track as Early as Possible
Remember that LFS does not change the laws of nature: things that were committed to the repository are there to stay. It’s very hard (and dangerous) to change a project’s commit history.
This means that you should tell LFS to track a file before it’s committed to the repository.
Otherwise, it has become part of your project’s history – including all of its megabytes and gigabytes…
The ideal moment to configure which file patterns you want to track is right when initializing a repository (just like with ignoring files).
Using LFS in a GUI
Although LFS is not difficult to use, there are still commands to remember and things to mess up. If you want to be more productive with Git (and LFS), have a look at Tower, a Git desktop client for Mac and Windows. Since Tower comes with built-in support for Git LFS, there is nothing to install. The app has been around for several years and is trusted by over 80,000 users all over the world.
Additionally, Tower provides a direct integration with GitLab! After connecting your GitLab account in Tower, you can clone and create repositories with just a single click.
Working with Git
A great aspect of LFS is that you can maintain your normal Git workflow: staging, committing, pushing, pulling and everything else works just like before. Apart from the commands we’ve discussed, there’s nothing to watch out for.
LFS will provide the files you need, when you need them.
In case you’re looking for more information about LFS, have a look at this free online book. For general insights about Git, take a look at the Git Tips & Tricks blog post and Tower’s video series.
SOURCE: https://about.gitlab.com/2017/01/30/getting-started-with-git-lfs-tutorial/
Git LFS
Managing large files such as audio, video and graphics files has always been one of the shortcomings of Git. The general recommendation is to not have Git repositories larger than 1GB to preserve performance.
How it works
The Git LFS client talks to the GitLab server over HTTPS. It uses HTTP Basic Authentication to authorize client requests. Once a request is authorized, the Git LFS client receives instructions on where to fetch the large file from or where to push it to.
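The exchange described above can be sketched by building (but not sending) the first request the client makes. This is based on my reading of the Git LFS batch API: a POST to <lfs url>/objects/batch with a JSON body listing the objects wanted. The user name, password and URL below are placeholders, and build_batch_request is a hypothetical helper, not GitLab or Git LFS code.

```python
import base64
import json


def build_batch_request(lfs_url, oid, size, user, password, operation="download"):
    """Return (url, headers, body) for a Git LFS batch API request."""
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Accept": "application/vnd.git-lfs+json",
        "Content-Type": "application/vnd.git-lfs+json",
        "Authorization": f"Basic {credentials}",  # HTTP Basic Authentication
    }
    body = {
        "operation": operation,  # "download" or "upload"
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size}],
    }
    return lfs_url.rstrip("/") + "/objects/batch", headers, json.dumps(body)


url, headers, body = build_batch_request(
    "https://gitlab.example.com/group/project.git/info/lfs",  # placeholder URL
    "4bd049d85f06029d28bd94eae6da2b6eb69c0b2d25bac8c30ac1b156672c4082",
    3771098624,
    "alice",   # placeholder user
    "secret",  # placeholder password
)
print(url)
print(body)
```

The server's reply to such a request is what tells the client where to fetch or push each object.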
GitLab server configuration
Documentation for GitLab instance administrators is under LFS administration doc.
Requirements
- Git LFS is supported in GitLab starting with version 8.2
- Git LFS must be enabled under project settings
- Git LFS client version 1.0.1 and up
Known limitations
- Git LFS v1 original API is not supported since it was deprecated early in LFS development
- When SSH is set as a remote, Git LFS objects still go through HTTPS
- Any Git LFS request will ask for HTTPS credentials to be provided so a good Git credentials store is recommended
- Git LFS always assumes HTTPS so if you have GitLab server on HTTP you will have to add the URL to Git config manually (see troubleshooting)
Note: With 8.12 GitLab added LFS support to SSH. The Git LFS communication still goes over HTTP, but now the SSH client passes the correct credentials to the Git LFS client, so no action is required by the user.
Using Git LFS
Let’s take a look at the workflow when you need to check large files into your Git repository with Git LFS. For example, if you want to upload a very large file and check it into your Git repository:
git clone git@gitlab.example.com:group/project.git
cd project               # enter the cloned repository
git lfs install          # initialize the Git LFS project
git lfs track "*.iso"    # select the file extensions that you want to treat as large files
Once a certain file extension is marked for tracking as an LFS object, you can use Git as usual without having to redo the command to track a file with the same extension:
cp ~/tmp/debian.iso ./               # copy a large file into the current directory
git add .                            # add the large file to the project
git commit -am "Added Debian iso"    # commit the file meta data
git push origin master               # sync the git repo and large file to the GitLab server
Note: Make sure that .gitattributes is tracked by git. Otherwise Git LFS will not be working properly for people cloning the project.
git add .gitattributes
Cloning the repository works the same as before. Git automatically detects the LFS-tracked files and clones them via HTTP. If you performed the git clone command with an SSH URL, you have to enter your GitLab credentials for HTTP authentication.
git clone git@gitlab.example.com:group/project.git
If you already cloned the repository and you want to get the latest LFS objects that are on the remote repository, e.g. from branch master:
git lfs fetch origin master
Troubleshooting
error: Repository or object not found
There are a couple of reasons why this error can occur:
- You don’t have permission to access a certain LFS object
Check if you have permission to push to the project or fetch from the project.
- The project is not allowed to access the LFS object
The LFS object you are trying to push to the project or fetch from the project is no longer available to the project. The object was probably removed from the server.
- Local git repository is using deprecated LFS API
Invalid status for <url> : 501
Git LFS logs failures to a log file. To view the most recent log, run the following from the project directory:
git lfs logs last
If the status error 501 is shown, it is because:
- Git LFS is not enabled in project settings. Check your project settings and enable Git LFS.
- Git LFS support is not enabled on the GitLab server. Check with your GitLab administrator why Git LFS is not enabled on the server. See LFS administration documentation for instructions on how to enable LFS support.
- Git LFS client version is not supported by GitLab server. Check your Git LFS version with git lfs version. Check the Git config of the project for traces of the deprecated API with git config -l. If batch = false is set in the config, remove the line and try to update your Git LFS client. Only version 1.0.1 and newer are supported.
getsockopt: connection refused
If you push an LFS object to a project and you receive an error similar to: Post <URL>/info/lfs/objects/batch: dial tcp IP: getsockopt: connection refused, the LFS client is trying to reach GitLab through HTTPS. However, your GitLab instance is being served on HTTP.
This behaviour is caused by Git LFS using HTTPS connections by default when lfs.url is not set in the Git config.
To prevent this from happening, set lfs.url in the project’s Git config:
git config --add lfs.url "http://gitlab.example.com/group/project.git/info/lfs"
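The value passed to lfs.url above follows a simple convention: the remote URL, with .git appended if missing, followed by /info/lfs. The sketch below derives it from a remote URL; default_lfs_url is a hypothetical helper, the scheme parameter is my own addition to cover the HTTP-only case described here, and the rules are my understanding of how the Git LFS client discovers its endpoint.

```python
import re
from urllib.parse import urlsplit


def default_lfs_url(remote, scheme="https"):
    """Derive the conventional LFS endpoint from a Git remote URL."""
    # scp-like SSH syntax: [user@]host:path (but not scheme://...)
    match = re.match(r"^(?:[^@/]+@)?([^:/]+):(?!//)(.+)$", remote)
    if match:
        host, path = match.group(1), match.group(2)
    else:
        parts = urlsplit(remote)
        host = parts.netloc.rsplit("@", 1)[-1]  # drop any "user@" prefix
        path = parts.path.lstrip("/")
    if not path.endswith(".git"):
        path += ".git"
    return f"{scheme}://{host}/{path}/info/lfs"


print(default_lfs_url("git@gitlab.example.com:group/project.git"))
# prints: https://gitlab.example.com/group/project.git/info/lfs
print(default_lfs_url("git@gitlab.example.com:group/project.git", scheme="http"))
# prints: http://gitlab.example.com/group/project.git/info/lfs
```

The second call reproduces the URL used in the git config command above for a GitLab instance served over plain HTTP.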
Credentials are always required when pushing an object
Note: With 8.12 GitLab added LFS support to SSH. The Git LFS communication still goes over HTTP, but now the SSH client passes the correct credentials to the Git LFS client, so no action is required by the user.
Given that Git LFS uses HTTP Basic Authentication to authenticate the user pushing the LFS object on every push for every object, user HTTPS credentials are required.
By default, Git has support for remembering the credentials for each repository you use. This is described in Git credentials man pages.
For example, you can tell Git to remember the password for a period of time in which you expect to push the objects:
git config --global credential.helper 'cache --timeout=3600'
This will remember the credentials for an hour after which Git operations will require re-authentication.
If you are using OS X you can use osxkeychain to store and encrypt your credentials. For Windows, you can use wincred or Microsoft’s Git Credential Manager for Windows.
More details about various methods of storing the user credentials can be found on Git Credential Storage documentation.
SOURCE: https://docs.gitlab.com/ee/workflow/lfs/manage_large_binaries_with_git_lfs.html
GitLab Git LFS Administration
Documentation on how to use Git LFS is under the Managing large binary files with Git LFS doc.
Requirements
- Git LFS is supported in GitLab starting with version 8.2.
- Users need to install Git LFS client version 1.0.1 and up.
Configuration
Git LFS objects can be large in size. By default, they are stored on the server GitLab is installed on.
There are two configuration options to help GitLab server administrators:
- Enabling/disabling Git LFS support
- Changing the location of LFS object storage
Omnibus packages
In /etc/gitlab/gitlab.rb:
gitlab_rails['lfs_enabled'] = false

# Optionally, change the storage path location. Defaults to
# `#{gitlab_rails['shared_path']}/lfs-objects`. Which evaluates to
# `/var/opt/gitlab/gitlab-rails/shared/lfs-objects` by default.
gitlab_rails['lfs_storage_path'] = "/mnt/storage/lfs-objects"
Installations from source
In config/gitlab.yml:
lfs:
  enabled: false
  storage_path: /mnt/storage/lfs-objects
Storage statistics
You can see the total storage used for LFS objects on groups and projects in the administration area, as well as through the groups and projects APIs.
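As a rough cross-check of those figures, or when planning backups of the LFS storage path, an administrator could simply sum the file sizes on disk. The sketch below is a generic directory walk, not a GitLab tool; directory_size_bytes is a made-up name, and the demo runs on a temporary directory so that it works anywhere. On a real server the argument would be the configured storage path.

```python
import os
import tempfile


def directory_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


# Demo on a throwaway directory with arbitrary nested folder names. On an
# Omnibus install the path would be the lfs_storage_path setting, which
# defaults to /var/opt/gitlab/gitlab-rails/shared/lfs-objects.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "aa", "bb"))
    with open(os.path.join(demo_root, "aa", "bb", "object"), "wb") as handle:
        handle.write(b"\0" * 2048)
    print(directory_size_bytes(demo_root), "bytes")
# prints: 2048 bytes
```

A walk like this counts each stored file once, so its total may differ from statistics that count an object once per project linking to it.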
Known limitations
- Currently, storing GitLab Git LFS objects on non-local storage (like S3 buckets) is not supported.
- Support for removing unreferenced LFS objects was added in GitLab 8.14.
- LFS authentication via SSH was added in GitLab 8.12.
- Only compatible with Git LFS client versions 1.1.0 and up, or 1.0.2.
- The storage statistics currently count each LFS object multiple times, once for every project linking to it.
SOURCE: https://docs.gitlab.com/ee/workflow/lfs/lfs_administration.html