Kirill Dubovikov kdubovikov

## README_hfd.md

      
CLI tool for downloading Hugging Face models and datasets with aria2/wget: hfd. Forked from padeoe/README_hfd.md; last active March 11, 2026 05:54.

🤗 Hugging Face Downloader

Note
(2026-03-10) Ported CAMD reliability updates: retry/backoff controls, network timeout controls, and proxy-prefix fallback support.
(2025-01-08) Added revision selection via --revision, contributed by @Bamboo-D.
(2024-12-17) Switched to the metadata/file-list flow for faster startup and resume without git clone.
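The metadata/file-list flow from the 2024-12-17 entry can be illustrated with a short Python sketch. It is not the script's own code, which is shell. It assumes the repo metadata JSON, as served by `https://huggingface.co/api/models/<repo_id>`, lists files under `siblings[].rfilename`, and that each file downloads from a `resolve/<revision>/<path>` URL. The helper names are hypothetical. The output is an aria2c input file: one URL per line, followed by an indented `out=` option line.

```python
import json
from urllib.parse import quote

HF_ENDPOINT = "https://huggingface.co"  # assumption: the public Hub, no mirror


def resolve_url(repo_id, filename, revision="main", repo_type="model"):
    """Direct-download URL for one file of a Hub repo (hypothetical helper)."""
    prefix = "datasets/" if repo_type == "dataset" else ""
    # Encode the revision fully (e.g. "refs/pr/1") but keep "/" in file paths.
    return (f"{HF_ENDPOINT}/{prefix}{repo_id}/resolve/"
            f"{quote(revision, safe='')}/{quote(filename)}")


def aria2_input_from_metadata(metadata_json, repo_id, revision="main",
                              repo_type="model"):
    """Turn repo metadata JSON into aria2c input-file text (hypothetical helper)."""
    metadata = json.loads(metadata_json)
    lines = []
    for sibling in metadata.get("siblings", []):
        name = sibling["rfilename"]
        lines.append(resolve_url(repo_id, name, revision, repo_type))
        lines.append(f"  out={name}")  # option lines must start with whitespace
    return "\n".join(lines) + "\n"
```

Saved to a file, the generated text could be passed to `aria2c --input-file=<file>`, so no git clone is needed to enumerate the files.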

This script downloads models and datasets from Hugging Face with curl plus either aria2c or wget. It keeps the repo's fast metadata-based startup and resume behavior, and now also includes the CAMD script's retry/backoff and timeout controls.
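The retry/backoff controls can be pictured as exponential backoff around each network call. In this sketch the parameter names `max_retries`, `base_delay`, and `max_delay` are hypothetical and are not the script's flags; `sleep` is injectable so the delay schedule can be checked without actually waiting.

```python
import time


def retry_with_backoff(operation, max_retries=5, base_delay=1.0,
                       max_delay=30.0, retry_on=(OSError,), sleep=time.sleep):
    """Call operation(); on a retryable error wait base_delay * 2**attempt
    (capped at max_delay) and try again, up to max_retries retries.

    Hypothetical helper: names and defaults are illustrative only.
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

Wrapping a download such as `urllib.request.urlopen(url, timeout=30)` in this helper would retry connection errors and timeouts, since `urllib.error.URLError` and socket timeouts both subclass `OSError`. It would also retry HTTP errors like 404 (`HTTPError` subclasses `URLError`), so a real tool would narrow `retry_on`.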
Features


## chopstick-service.yml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: chopstick-classifier
  name: chopstick-classifier
spec:
  type: LoadBalancer # we need a load balancer to distribute our requests between all replicas in the Deployment
  ports:
  - port: 8500  # run this
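A Service normally finds its backends through a label selector (`spec.selector`), which the preview above does not reach. The Python sketch below models equality-based selection (a pod matches when it carries every key/value pair of the selector, and only ready pods receive traffic) and then spreads requests over the matching pods in round-robin order. It is only an illustration: the pod data is made up, and kube-proxy's actual backend choice depends on its mode (random in iptables mode, a configurable scheduler in IPVS mode).

```python
from itertools import cycle


def matches(selector, labels):
    """Equality-based selector: every key/value in selector must appear in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector, pods):
    """Names of ready pods the Service would route to."""
    return [pod["name"] for pod in pods
            if pod["ready"] and matches(selector, pod["labels"])]


selector = {"app": "chopstick-classifier"}
pods = [  # hypothetical pods: classifier replicas plus an unrelated pod
    {"name": "classifier-a", "labels": {"app": "chopstick-classifier"}, "ready": True},
    {"name": "classifier-b", "labels": {"app": "chopstick-classifier"}, "ready": True},
    {"name": "classifier-c", "labels": {"app": "chopstick-classifier"}, "ready": False},
    {"name": "other", "labels": {"app": "something-else"}, "ready": True},
]

backends = cycle(endpoints(selector, pods))  # round-robin over matching pods
routed = [next(backends) for _ in range(4)]
print(routed)  # ['classifier-a', 'classifier-b', 'classifier-a', 'classifier-b']
```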

## chopstick_deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chopstick-classifier
spec:
  replicas: 3 # Here we tell Kubernetes to keep 3 replicas up, which helps with reliability and scalability. The value can be changed on a running Deployment, and Kubernetes will adjust the number of replicas to match
  selector:
    matchLabels: # all pods labeled "app: chopstick-classifier" will be in the scope of this deployment
      app: chopstick-classifier
  template: # this template will be applied to each replica in the set
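The Deployment preview stops at `template`. In `apps/v1`, the labels in `spec.template.metadata.labels` must satisfy `spec.selector`, otherwise the API server rejects the Deployment. The sketch below builds such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The container name, the `tensorflow/serving` image, the `MODEL_NAME` value, and port 8500 are assumptions, not the author's settings.

```python
import json


def chopstick_deployment(replicas=3, image="tensorflow/serving",
                         model_name="chopstick"):  # image/model name are assumptions
    labels = {"app": "chopstick-classifier"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "chopstick-classifier"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                # Pods created from this template carry the labels the selector expects.
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [{
                        "name": "chopstick-classifier",  # hypothetical name
                        "image": image,
                        # The tensorflow/serving image reads MODEL_NAME to locate
                        # /models/<MODEL_NAME>; 8500 is its gRPC port.
                        "env": [{"name": "MODEL_NAME", "value": model_name}],
                        "ports": [{"containerPort": 8500}],
                    }]
                },
            },
        },
    }


def selector_matches_template(manifest):
    """True if the template labels satisfy the Deployment's matchLabels."""
    spec = manifest["spec"]
    wanted = spec["selector"]["matchLabels"]
    labels = spec["template"]["metadata"]["labels"]
    return all(labels.get(key) == value for key, value in wanted.items())


manifest = chopstick_deployment()
assert selector_matches_template(manifest)
print(json.dumps(manifest, indent=2))
```

Changing `replicas` and re-applying the manifest scales the running Deployment, which is what the `replicas` comment describes.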

## tf_serving_client.py
from __future__ import absolute_import, division, print_function

import argparse

import tensorflow as tf
from colorama import Back, Fore, Style, init
from grpc.beta import implementations

from tensorflow_serving.apis import predict_pb2, prediction_service_pb2
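The client above talks to TensorFlow Serving over gRPC through the older `grpc.beta` interface. As a swapped-in alternative that needs only the standard library, TensorFlow Serving also exposes a REST API (port 8501 by default, which the Service preview above does not expose) where a prediction is a `POST` to `/v1/models/<name>:predict` with an `instances` list. This sketch only builds the request; the host, the model name `chopstick`, and the input values are hypothetical.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501,
                          signature_name="serving_default"):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"signature_name": signature_name,
                       "instances": instances}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


# Hypothetical model name and a single input row.
request = build_predict_request("localhost", "chopstick", [[240.0]])
# Sending it requires a running server:
# with urllib.request.urlopen(request, timeout=10) as resp:
#     print(json.load(resp)["predictions"])
```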

## tf_serving_estimator_client.py
from __future__ import absolute_import, division, print_function

import argparse

import tensorflow as tf
from colorama import Back, Fore, Style, init
from grpc.beta import implementations

from tensorflow_serving.apis import classification_pb2, prediction_service_pb2
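The Estimator client imports `classification_pb2`, so it uses the Classify API rather than Predict. Its REST counterpart is a `POST` to `/v1/models/<name>:classify` with a list of `examples`, each a mapping from feature name to value; per the REST API documentation, the response carries a `result` list with `[label, score]` pairs for each example. The feature name `chopstick_length` and the sample response below are hypothetical.

```python
import json
import urllib.request


def build_classify_request(host, model_name, examples, port=8501):
    """Build (but do not send) a TensorFlow Serving REST classify request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:classify"
    body = json.dumps({"examples": examples}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def top_labels(response_json):
    """Pick the highest-scoring label for each example in a classify response."""
    results = json.loads(response_json)["result"]
    return [max(pairs, key=lambda pair: pair[1])[0] for pairs in results]


# Hypothetical feature name and value.
request = build_classify_request("localhost", "chopstick",
                                 [{"chopstick_length": 240.0}])
# Made-up response, shaped like the documented classify output.
sample = '{"result": [[["0", 0.2], ["1", 0.8]]]}'
print(top_labels(sample))  # ['1']
```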

## chopstick_classifier_tf_api.py
"""Train chopstick efficiency classifier and export it as Servable"""
import argparse

import numpy as np
import pandas as pd
import tensorflow as tf


def preprocess_data(filename, val_rows_count=20):
    """Load data and split it into train and validation sets"""
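The body of `preprocess_data` is not shown. One plausible reading of its signature is that `val_rows_count` rows are held out for validation. The stdlib-only sketch below follows that assumption; the name `split_rows`, the shuffling, and the `seed` parameter are hypothetical, and the original most likely uses pandas instead. Shuffling first matters if the file is sorted, because an unshuffled tail would give a biased validation set.

```python
import csv
import io
import random


def split_rows(csv_text, val_rows_count=20, seed=0):
    """Shuffle CSV rows reproducibly and hold out val_rows_count of them.

    Hypothetical stand-in for preprocess_data; returns (train, validation).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not 0 < val_rows_count < len(rows):
        raise ValueError("val_rows_count must leave rows for both splits")
    random.Random(seed).shuffle(rows)  # avoid a split that follows file order
    return rows[:-val_rows_count], rows[-val_rows_count:]
```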

## chopstick_classifier_estimator_api.py
"""Train chopstick efficiency classifier and export it as Servable"""
import argparse

import pandas as pd
import tensorflow as tf


def preprocess_data(filename, val_rows_count=20):
    """Load data and split it into train and validation sets"""
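Both training scripts export a Servable. TensorFlow Serving expects the model base path to contain numeric version subdirectories (for example `1/` and `2/`, each holding `saved_model.pb` and `variables/`) and, under its default version policy, serves the highest one. Estimator exports already use numeric timestamp directory names. The stdlib sketch below picks the next export directory and the version that policy would load; the helper names are hypothetical, and it does not check that a version directory contains a complete SavedModel.

```python
from pathlib import Path


def version_dirs(base_path):
    """Numeric subdirectories of base_path, i.e. candidate model versions."""
    base = Path(base_path)
    if not base.is_dir():
        return []
    return sorted(int(p.name) for p in base.iterdir()
                  if p.is_dir() and p.name.isdecimal())


def next_export_dir(base_path):
    """Hypothetical helper: one version past the highest existing one."""
    versions = version_dirs(base_path)
    return Path(base_path) / str(versions[-1] + 1 if versions else 1)


def latest_version(base_path):
    """Version a default 'latest' policy would serve, or None if there is none."""
    versions = version_dirs(base_path)
    return versions[-1] if versions else None
```

Versions are compared as integers, so `10` correctly ranks above `2`, which a plain string sort would get wrong.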

## plot_sampling_comparison.py
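import seaborn as sns
import matplotlib.pyplot as plt
# %matplotlib inline  (IPython magic: uncomment only when running inside Jupyter)

# choice_num and fast_num hold the collision counts collected for each sampling method
fig, ax = plt.subplots(figsize=(12, 8))
sns.despine(left=True)
# sns.distplot is deprecated; histplot with a KDE overlay replaces it
sns.histplot(choice_num, ax=ax, kde=True, stat="density", label="choice sampling")
sns.histplot(fast_num, ax=ax, kde=True, stat="density", label="fast sampling")
ax.set_title("Collision count distributions")
ax.legend()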

## random_samplig_t_test.py
from scipy.stats import ttest_rel

# A dependent (paired) t-test is used because both statistics were collected
# from the same array
ch_fast_test = ttest_rel(choice_num, fast_num)
print(f"choice sampling vs fast sampling significant: {ch_fast_test.pvalue < 0.05}, p = {ch_fast_test.pvalue}")
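For reference, the paired statistic that `ttest_rel` computes works on the per-sample differences d_i = choice_i - fast_i: t = mean(d) / (s_d / sqrt(n)) with n - 1 degrees of freedom, where s_d is the sample standard deviation of the differences. The stdlib check below reproduces that statistic on toy data; the function name is hypothetical, and the p-value still needs the t distribution, which is what SciPy adds.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom of a paired (dependent) t-test."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # statistics.stdev uses the n - 1 (sample) denominator, as ttest_rel does.
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


t, df = paired_t_statistic([1, 2, 3, 4, 5], [2, 2, 4, 5, 7])
print(round(t, 4), df)  # -3.1623 4
```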

## collect_collisions.py
def collect_collisions(arr, sample_size, n_samples,
                       fast=False):
    """Collect total number of collisions made for each sample of arr.

    Parameters
    ----------
    arr: np.array
        array to sample from.
    sample_size: int
        sample size.
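The body of `collect_collisions` is not included in the preview. Assuming a "collision" means drawing the same index more than once within one sample (sampling with replacement), and that `fast=True` draws raw random indices while the default path uses `choice`, a sketch could look like the following. The name `collect_collisions_sketch`, the `seed` parameter, and both sampling paths are assumptions, not the author's code; it uses NumPy's `Generator` API rather than the legacy `np.random` functions.

```python
import numpy as np


def collect_collisions_sketch(arr, sample_size, n_samples, fast=False, seed=None):
    """Count repeated draws ("collisions") in each of n_samples samples of arr.

    Hypothetical reconstruction: both paths sample indices with replacement,
    and a collision is any draw of an index already drawn in the same sample.
    """
    rng = np.random.default_rng(seed)
    n = len(arr)
    collisions = np.empty(n_samples, dtype=int)
    for i in range(n_samples):
        if fast:
            idx = rng.integers(0, n, size=sample_size)  # raw random indices
        else:
            idx = rng.choice(n, size=sample_size)  # choice, with replacement
        # The sample itself would be arr[idx]; only index reuse matters here.
        collisions[i] = sample_size - np.unique(idx).size
    return collisions
```

The two resulting arrays would play the roles of `choice_num` and `fast_num` in the plot and the t-test above.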