@kdubovikov
kdubovikov / README_hfd.md
Last active March 11, 2026 05:54 — forked from padeoe/README_hfd.md
CLI tool for downloading Hugging Face models and datasets with aria2/wget: hfd

🤗 Hugging Face Downloader

Note

(2026-03-10) Ported CAMD reliability updates: retry/backoff controls, network timeout controls, and proxy-prefix fallback support.
(2025-01-08) Added revision selection via --revision, contributed by @Bamboo-D.
(2024-12-17) Switched to the metadata/file-list flow for faster startup and resume without git clone.

This script downloads models and datasets from Hugging Face with curl plus either aria2c or wget. It keeps the repo's fast metadata-based startup and resume behavior, and now also includes the CAMD script's retry/backoff and timeout controls.
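The retry/backoff control mentioned above can be pictured with a minimal sketch. This is not hfd's code (hfd itself drives curl, aria2c, and wget); the function name `retry_with_backoff` and its parameters `max_retries`, `base_delay`, and `sleep` are hypothetical and chosen only for illustration.

```python
import time


def retry_with_backoff(action, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call `action` until it succeeds or `max_retries` retries are used up.

    The wait doubles after each failed attempt: base_delay, 2 * base_delay, ...
    All names here are hypothetical; this is not hfd's actual implementation.
    """
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            if attempt >= max_retries:
                raise  # out of retries: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))  # exponential backoff
            attempt += 1
```

Passing `sleep` as a parameter is a design choice that lets the waiting behavior be observed or replaced without actually pausing.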

Features

apiVersion: v1
kind: Service
metadata:
  labels:
    app: chopstick-classifier
  name: chopstick-classifier
spec:
  type: LoadBalancer # we need a load balancer to distribute our requests between all replicas in the Deployment
  ports:
  - port: 8500 # run this
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chopstick-classifier
spec:
  replicas: 3 # Tell Kubernetes to keep 3 replicas up, which helps with reliability and scalability. The value can be changed online and Kubernetes will adjust the number of replicas to the required quantity
  selector:
    matchLabels: # all pods labeled "app: chopstick-classifier" will be in scope of this Deployment
      app: chopstick-classifier
  template: # this template will be applied to each replica in the set
from __future__ import absolute_import, division, print_function
import argparse
import tensorflow as tf
from colorama import Back, Fore, Style, init
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2
from __future__ import absolute_import, division, print_function
import argparse
import tensorflow as tf
from colorama import Back, Fore, Style, init
from grpc.beta import implementations
from tensorflow_serving.apis import classification_pb2, prediction_service_pb2
"""Train chopstick efficiency classifier and export it as Servable"""
import argparse
import numpy as np
import pandas as pd
import tensorflow as tf
def preprocess_data(filename, val_rows_count=20):
    """Load data and split it into train and validation sets"""
@kdubovikov
kdubovikov / chopstick_classifier_estimator_api.py
Created February 24, 2018 17:20
Chopstick Classifier using TensorFlow Estimator API
"""Train chopstick efficiency classifier and export it as Servable"""
import argparse
import pandas as pd
import tensorflow as tf
def preprocess_data(filename, val_rows_count=20):
    """Load data and split it into train and validation sets"""
@kdubovikov
kdubovikov / plot_sampling_comparison.py
Created October 8, 2017 13:00
Compare sampling algorithms - plot distributions
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(figsize=(12, 8), sharey=True)
sns.despine(left=True)
sns.distplot(choice_num, ax=ax)
sns.distplot(fast_num, ax=ax)
ax.set_title("Collision count distributions")
@kdubovikov
kdubovikov / random_samplig_t_test.py
Created October 8, 2017 12:52
Compare sampling algorithms with t-test
from scipy.stats import ttest_rel
# We must use a dependent (paired) t-test since all statistics were collected
# from the same array
ch_fast_test = ttest_rel(choice_num, fast_num)
print(f"choice sampling vs fast sampling significant: {ch_fast_test.pvalue < 0.05}, p = {ch_fast_test.pvalue}")
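`ttest_rel` performs a paired t-test, whose statistic is the mean of the pairwise differences divided by its standard error. The sketch below computes that statistic by hand with the standard library only, to show what is being tested. The helper name `paired_t_statistic` and the two short lists are made up for illustration; they are not the `choice_num` and `fast_num` data from the gist.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for two paired, equal-length samples."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up collision counts, for illustration only.
choice_example = [12, 15, 11, 14, 13]
fast_example = [10, 14, 10, 11, 12]
t_stat, dof = paired_t_statistic(choice_example, fast_example)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The p-value reported by `ttest_rel` comes from comparing this statistic against a Student's t distribution with `n - 1` degrees of freedom, a step the sketch leaves out.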
@kdubovikov
kdubovikov / collect_collisions.py
Last active October 8, 2017 12:58
Compare sampling algorithms
def collect_collisions(arr, sample_size, n_samples,
                       fast=False):
    """Collect total number of collisions made for each sample of arr.

    Parameters
    ----------
    arr: np.array
        array to sample from.
    sample_size: int
        sample size.