Explicit count of the number of items.
For lazy or distributed data, will force a computation.
Return the first element.
Convert to Blocks, each representing a subdivision of the larger Images data.
- size : str or tuple of block size per dimension
  String interpreted as memory size (in megabytes, e.g. "64"). Tuple of ints interpreted as "pixels per dimension". Only valid in spark mode.
Converts this Images object to a TimeSeries object.
This method is equivalent to images.toblocks(size).toseries().totimeseries().
- size : string memory size, optional, default = "150M"
  String interpreted as memory size (e.g. "64M").
- units : string, either "pixels" or "splits", optional, default = "pixels"
  What units to use for a tuple size.
Converts this Images object to a Series object.
This method is equivalent to images.toblocks(size).toseries().
- size : string memory size, optional, default = "150M"
  String interpreted as memory size (e.g. "64M").
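A minimal usage sketch, assuming the library is importable as thunder (the import path and the random test data are illustrative assumptions; the method names follow this reference):

```python
import thunder as td  # assumed import path

# small random test volume: 10 images of 50 x 50
images = td.images.fromrandom(shape=(10, 50, 50))

# direct conversion with an explicit memory-based block size...
series = images.toseries(size='64M')

# ...which is documented above as equivalent to chaining through blocks
same = images.toblocks(size='64').toseries()
```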
Convert to local representation.
Convert to spark representation.
Execute a function on each image.
Extract a random sample of images.
- nsamples : int, optional, default = 100
  The number of data points to sample.
- seed : int, optional, default = None
  Random seed.
Map an array -> array function over each image.
Filter images.
Reduce over images.
Compute the mean across images.
Compute the variance across images.
Compute the standard deviation across images.
Compute the sum across images.
Compute the max across images.
Compute the min across images.
Remove single-dimensional axes from images.
Compute maximum projections of images / volumes along the specified dimension.
- axis : int, optional, default = 2
  Which axis to compute projection along.
Compute maximum-minimum projections of images / volumes along the specified dimension. This computes the sum of the maximum and minimum values along the given dimension.
- axis : int, optional, default = 2
  Which axis to compute projection along.
Downsample an image volume by an integer factor.
- sample_factor : positive int or tuple of positive ints
  Stride to use in subsampling. If a single int is passed, each dimension of the image will be downsampled by this same factor. If a tuple is passed, it must have the same dimensionality as the image, and the strides given will be applied to each image dimension.
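A sketch of the stride semantics (shapes are illustrative; import path assumed):

```python
import thunder as td  # assumed import path

images = td.images.fromrandom(shape=(4, 100, 200))  # 4 images of 100 x 200

# a single int downsamples every dimension: (100, 200) -> (50, 100)
halved = images.subsample(2)

# a tuple gives one stride per image dimension: (100, 200) -> (50, 50)
strided = images.subsample((2, 4))
```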
Spatially smooth images with a gaussian filter.
Filtering will be applied to every image in the collection.
- sigma : scalar or sequence of scalars, default = 2
  Size of the filter as standard deviation in pixels. A sequence is interpreted as the standard deviation for each axis. A single scalar is applied equally to all axes.
- order : choice of 0 / 1 / 2 / 3 or sequence from same set, optional, default = 0
  Order of the gaussian kernel; 0 is a gaussian, higher numbers correspond to derivatives of a gaussian.
Spatially filter images using a uniform filter.
Filtering will be applied to every image in the collection.
- size : int, optional, default = 2
  Size of the filter neighborhood in pixels. A sequence is interpreted as the neighborhood size for each axis. A single scalar is applied equally to all axes.
Spatially filter images using a median filter.
Filtering will be applied to every image in the collection.
- size : int, optional, default = 2
  Size of the filter neighborhood in pixels. A sequence is interpreted as the neighborhood size for each axis. A single scalar is applied equally to all axes.
Correlate every pixel to the average of its local neighborhood.
This algorithm computes, for every spatial record, the correlation coefficient between that record's series, and the average series of all records within a local neighborhood with a size defined by the neighborhood parameter. The neighborhood is currently required to be a single integer, which represents the neighborhood size in both x and y.
- neighborhood : int, optional, default = 2
  Size of the correlation neighborhood (in both the x and y directions), in pixels.
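The computation can be sketched in plain numpy for a (time, x, y) array; this is a hypothetical re-implementation for illustration, not the library's internals:

```python
import numpy as np

def local_corr(data, neighborhood=2):
    # data: ndarray of shape (time, x, y)
    t, x, y = data.shape
    n = neighborhood
    out = np.zeros((x, y))
    for i in range(x):
        for j in range(y):
            # average series over the local window, clipped at the borders
            window = data[:, max(i - n, 0):i + n + 1, max(j - n, 0):j + n + 1]
            avg = window.mean(axis=(1, 2))
            # correlation between this pixel's series and the local average
            out[i, j] = np.corrcoef(data[:, i, j], avg)[0, 1]
    return out
```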
Subtract a constant value or an image / volume from all images / volumes in the data set.
- val : int, float, or ndarray
  Value to subtract.
Write 2d or 3d images as PNG files.
Files will be written into a newly-created directory. Three-dimensional data will be treated as RGB channels.
- path : string
  Path to output directory, must be one level below an existing directory.
- prefix : string
  String to prepend to filenames.
- overwrite : bool
  If true, the directory given by path will first be deleted if it exists.
Write 2d or 3d images as TIF files.
Files will be written into a newly-created directory. Three-dimensional data will be treated as RGB channels.
- path : string
  Path to output directory, must be one level below an existing directory.
- prefix : string
  String to prepend to filenames.
- overwrite : bool
  If true, the directory given by path will first be deleted if it exists.
Write out images or volumes as flat binary files.
Files will be written into a newly-created directory.
- path : string
  Path to output directory, must be one level below an existing directory.
- prefix : string
  String to prepend to filenames.
- overwrite : bool
  If true, the directory given by path will first be deleted if it exists.
Efficiently apply a function to each time series.
Applies a function to each time series without transforming all the way to a Series object, but using a Blocks object instead for increased efficiency in the transformation back to Images.
- func : function
  Function to apply to each time series. Should take a one-dimensional ndarray and return the transformed one-dimensional ndarray.
- value_size : int, optional, default = None
  Size of the one-dimensional ndarray resulting from application of func. If not supplied, will be automatically inferred for an extra computational cost.
- block_size : str or tuple of block size per dimension
  String interpreted as memory size (in megabytes, e.g. "64"). Tuple of ints interpreted as "pixels per dimension".
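A usage sketch (import path and data are assumptions):

```python
import thunder as td  # assumed import path

images = td.images.fromrandom(shape=(20, 50, 50))  # 20 timepoints of 50 x 50

# subtract each pixel's temporal mean without a full Series round-trip;
# value_size matches the input length because the output length is unchanged
centered = images.map_as_series(lambda ts: ts - ts.mean(), value_size=20)
```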
Reshape all dimensions but the last into a single dimension.
Explicit count of the number of items.
For lazy or distributed data, will force a computation.
Return the first element.
Convert to local representation.
Convert to spark representation.
Extract random sample of series.
- nsamples : int, optional, default = 100
  The number of data points to sample.
- seed : int, optional, default = None
  Random seed.
Map an array -> array function over each series.
Filter by applying a function to each series.
Reduce over series.
Compute the mean across series.
Compute the variance across series.
Compute the standard deviation across series.
Compute the sum across series.
Compute the max across series.
Compute the min across series.
Select subset of values within the given index range.
Inclusive on the left; exclusive on the right.
- left : int
  Left-most index in the desired range.
- right : int
  Right-most index in the desired range.
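For example (a sketch; import path assumed):

```python
import numpy as np
import thunder as td  # assumed import path

series = td.series.fromarray(np.arange(10).reshape(1, 10))

# left inclusive, right exclusive, mirroring Python slice semantics:
# keeps index positions 2, 3, 4, and 5
subset = series.between(2, 6)
```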
Select subset of values that match a given index criterion.
- crit : function, list, str, or int
  Criterion function to map to indices, a specific index value, or a list of indices.
Center series data by subtracting the mean either within or across records.
- axis : int, optional, default = 0
  Which axis to center along, within (1) or across (0) records.
Standardize series data by dividing by the standard deviation either within or across records.
- axis : int, optional, default = 0
  Which axis to standardize along, within (1) or across (0) records.
Zscore series data by subtracting the mean and dividing by the standard deviation, either within or across records.
- axis : int, optional, default = 0
  Which axis to zscore along, within (1) or across (0) records.
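The axis convention shared by center, standardize, and zscore can be sketched in numpy for a (records, time) array; this is an illustration, not the library's code:

```python
import numpy as np

def zscore(data, axis=0):
    # axis=0: across records (per time point); axis=1: within each record
    mean = data.mean(axis=axis, keepdims=True)
    std = data.std(axis=axis, keepdims=True)
    return (data - mean) / std
```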
Set all records that do not exceed the given threshold to 0.
- threshold : scalar
  Level below which to set records to zero.
Correlate series data against one or many one-dimensional arrays.
- signal : array or str
  Signal(s) to correlate against; can be a numpy array or a MAT file containing the signal as a variable.
Compute the value maximum of each record in a Series.
Compute the value minimum of each record in a Series.
Compute the value sum of each record in a Series.
Compute the value mean of each record in a Series.
Compute the value median of each record in a Series.
Compute the value percentile of each record in a Series.
- q : scalar
  Floating point number between 0 and 100 inclusive, specifying percentile.
Compute the value standard deviation of each record in a Series.
series_stat(stat)
Compute a simple statistic for each record in a Series.
- stat : str
  Which statistic to compute.
Compute many statistics for each record in a Series.
Compute the mean across fixed-size panels of each record.
Splits each record into panels of size length, then computes the mean across panels. The panel length must subdivide each record exactly.
- length : int
  Fixed length with which to subdivide.
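A worked sketch (import path assumed):

```python
import numpy as np
import thunder as td  # assumed import path

# one record: [0, 1, 2, 3, 4, 5]
series = td.series.fromarray(np.arange(6).reshape(1, 6))

# panels [0, 1, 2] and [3, 4, 5] average elementwise to [1.5, 2.5, 3.5];
# the panel length (3) must divide the record length (6) exactly
panels = series.mean_by_panel(3)
```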
Select or filter elements of the Series by index values (across levels, if multi-index).
The index is a property of a Series object that assigns a value to each position within the arrays stored in the records of the Series. This function returns a new Series where, within each record, only the elements indexed by a given value(s) are retained. An index where each value is a list of a fixed length is referred to as a 'multi-index', as it provides multiple labels for each index location. Each of the dimensions in these sublists is a 'level' of the multi-index. If the index of the Series is a multi-index, then the selection can proceed by first selecting one or more levels, and then selecting one or more values at each level.
- val : list of lists
  Specifies the selected index values. List must contain one list for each level of the multi-index used in the selection. For any singleton lists, the list may be replaced with just the integer.
- level : list of ints, optional, default = 0
  Specifies which levels in the multi-index to use when performing selection. If a single level is selected, the list can be replaced with an integer. Must be the same length as val.
- squeeze : bool, optional, default = False
  If True, the multi-index of the resulting Series will drop any levels that contain only a single value because of the selection. Useful if indices are used as unique identifiers.
- filter : bool, optional, default = False
  If True, selection process is reversed and all index values EXCEPT those specified are selected.
- return_mask : bool, optional, default = False
  If True, return the mask used to implement the selection.
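A usage sketch; the (length, levels) multi-index layout passed to fromarray is an assumption for illustration:

```python
import numpy as np
import thunder as td  # assumed import path

# one record of length 4 with a two-level multi-index
index = np.array([[0, 1], [0, 2], [1, 1], [1, 2]])  # assumed layout
series = td.series.fromarray(np.arange(4).reshape(1, 4), index=index)

# keep elements whose level-0 value is 1 (the last two positions)
selected = series.select_by_index(1, level=0)

# reverse the selection, keeping everything EXCEPT level-0 value 1
rest = series.select_by_index(1, level=0, filter=True)
```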
Aggregate data in each record, grouping by index values.
For each unique value of the index, applies a function to the group indexed by that value. Returns a Series indexed by those unique values. For the result to be a valid Series object, the aggregating function should return a simple numeric type. Also allows selection of levels within a multi-index. See select_by_index for more info on indices and multi-indices.
- function : function
  Aggregating function to map to Series values. Should take a list or ndarray as input and return a simple numeric value.
- level : list of ints, optional, default = 0
  Specifies the levels of the multi-index to use when determining unique index values. If only a single level is desired, can be an int.
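A usage sketch (import path assumed):

```python
import numpy as np
import thunder as td  # assumed import path

# index value 0 covers positions 0-1, value 1 covers positions 2-3
series = td.series.fromarray(np.array([[1.0, 2.0, 3.0, 4.0]]), index=[0, 0, 1, 1])

# sum within each group: the record becomes [3.0, 7.0], indexed by [0, 1]
grouped = series.aggregate_by_index(np.sum)
```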
Compute the desired statistic for each unique index value (across levels, if multi-index).
- stat : string
  Statistic to be computed: sum, mean, median, stdev, max, min, count.
- level : list of ints, optional, default = 0
  Specifies the levels of the multi-index to use when determining unique index values. If only a single level is desired, can be an int.
Compute sums for each unique index value (across levels, if multi-index).
Compute means for each unique index value (across levels, if multi-index).
Compute medians for each unique index value (across levels, if multi-index).
Compute standard deviations for each unique index value (across levels, if multi-index).
Compute maximum values for each unique index value (across levels, if multi-index).
Compute minimum values for each unique index value (across levels, if multi-index).
Count the number of elements for each unique index value (across levels, if multi-index).
Compute covariance of a distributed matrix.
- axis : int, optional, default = None
  Axis for performing mean subtraction: None (no subtraction), 0 (rows), or 1 (columns).
Compute gramian of a distributed matrix.
The gramian is defined as the product of the matrix's transpose with the matrix itself, i.e. A^T * A.
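A sketch of the definition (import path assumed; whether the result comes back as a local array or a wrapped object is not specified here):

```python
import numpy as np
import thunder as td  # assumed import path

values = np.random.randn(5, 3)
mat = td.series.fromarray(values)

# the gramian of a 5 x 3 matrix is 3 x 3 and matches A^T A computed locally
g = mat.gramian()
expected = values.T @ values
```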
Multiply a matrix by another one.
Other matrix must be a numpy array, a scalar, or another matrix in local mode.
- other : Matrix, scalar, or numpy array
  A matrix to multiply with.
Convert Series to TimeSeries, a subclass for time series computation.
Convert Series to Images.
Equivalent to calling series.toblocks(size).toimages().
- size : str, optional, default = "150M"
  String interpreted as memory size.
Write data to binary files.
- path : string path or URI to directory to be created
  Output files will be written underneath path. Directory will be created as a result of this call.
- prefix : str, optional, default = 'series'
  String prefix for files.
- overwrite : bool
  If true, path and all its contents will be deleted and recreated as part of this call.
Load Images object from a Spark RDD.
Must be a collection of key-value pairs where keys are singleton tuples indexing images, and values are 2d or 3d ndarrays.
- rdd : SparkRDD
  An RDD containing images.
- dims : tuple or array, optional, default = None
  Image dimensions (if provided will avoid check).
- nrecords : int, optional, default = None
  Number of images (if provided will avoid check).
- dtype : string, default = None
  Data numerical type (if provided will avoid check).
Load Images object from a local array-like.
First dimension will be used to index images, so remaining dimensions after the first should be the dimensions of the images/volumes, e.g. (3, 100, 200) for 3 x (100, 200) images.
- values : array-like
  The array of images.
- npartitions : int, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, default = None
  Computational engine (e.g. a SparkContext for Spark).
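A usage sketch (import path assumed):

```python
import numpy as np
import thunder as td  # assumed import path

# the first axis indexes images: three 100 x 200 images
stack = np.random.randn(3, 100, 200)
images = td.images.fromarray(stack)

# on Spark, partitioning and an engine (a SparkContext, here named sc)
# could be supplied:
# images = td.images.fromarray(stack, npartitions=4, engine=sc)
```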
Load images from a list of items using the given accessor.
- accessor : function
  Apply to each item from the list to yield an image.
- keys : list, optional, default = None
  An optional list of keys.
- dims : tuple, optional, default = None
  Specify a known image dimension to avoid computation.
- npartitions : int
  Number of partitions for computational engine.
frompath(path, accessor=None, ext=None, start=None, stop=None, recursive=False, npartitions=None, dims=None, dtype=None, recount=False, engine=None, credentials=None)
Load images from a path using the given accessor.
Supports both local and remote filesystems.
- accessor : function
  Apply to each item after loading to yield an image.
- ext : str, optional, default = None
  File extension.
- npartitions : int, optional, default = None
  Number of partitions for computational engine, if None will use default for engine.
- dims : tuple, optional, default = None
  Dimensions of images.
- dtype : str, optional, default = None
  Numerical type of images.
- start, stop : nonnegative int, optional, default = None
  Indices of files to load, interpreted using Python slicing conventions.
- recursive : boolean, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- recount : boolean, optional, default = False
  Force subsequent record counting.
frombinary(path, shape=None, dtype=None, ext='bin', start=None, stop=None, recursive=False, nplanes=None, npartitions=None, conf='conf.json', order='C', engine=None, credentials=None)
Load images from flat binary files.
Assumes one image per file, each with the shape and ordering as given by the input arguments.
- path : str
  Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.
- shape : tuple of positive int
  Dimensions of input image data.
- ext : string, optional, default = "bin"
  Extension required on data files to be loaded.
- start, stop : nonnegative int, optional, default = None
  Indices of the first and last-plus-one file to load, relative to the sorted filenames matching path and ext. Interpreted using Python slice indexing conventions.
- recursive : boolean, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- nplanes : positive integer, optional, default = None
  If passed, will cause single files to be subdivided into nplanes separate images. Otherwise, each file is taken to represent one image.
- npartitions : int, optional, default = None
  Number of partitions for computational engine, if None will use default for engine.
fromtif(path, ext='tif', start=None, stop=None, recursive=False, nplanes=None, npartitions=None, engine=None, credentials=None)
Load images from single or multi-page TIF files.
- path : str
  Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.
- ext : string, optional, default = "tif"
  Extension required on data files to be loaded.
- start, stop : nonnegative int, optional, default = None
  Indices of the first and last-plus-one file to load, relative to the sorted filenames matching 'path' and 'ext'. Interpreted using Python slice indexing conventions.
- recursive : boolean, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- nplanes : positive integer, optional, default = None
  If passed, will cause single files to be subdivided into nplanes separate images. Otherwise, each file is taken to represent one image.
- npartitions : int, optional, default = None
  Number of partitions for computational engine, if None will use default for engine.
frompng(path, ext='png', start=None, stop=None, recursive=False, npartitions=None, engine=None, credentials=None)
Load images from PNG files.
- path : str
  Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.
- ext : string, optional, default = "png"
  Extension required on data files to be loaded.
- start, stop : nonnegative int, optional, default = None
  Indices of the first and last-plus-one file to load, relative to the sorted filenames matching path and ext. Interpreted using Python slice indexing conventions.
- recursive : boolean, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- npartitions : int, optional, default = None
  Number of partitions for computational engine, if None will use default for engine.
Generate random image data.
- shape : tuple, optional, default = (10, 50, 50)
  Dimensions of images.
- npartitions : int, optional, default = 1
  Number of partitions.
- seed : int, optional, default = 42
  Random seed.
Load example image data.
Data must be downloaded from S3, so this method requires an internet connection.
- name : str
  Name of dataset; if not specified will print options.
Load Series object from a Spark RDD.
Assumes keys are tuples with increasing and unique indices, and values are 1d ndarrays. Will try to infer properties that are not explicitly provided.
- rdd : SparkRDD
  An RDD containing series data.
- shape : tuple or array, optional, default = None
  Total shape of data (if provided will avoid check).
- nrecords : int, optional, default = None
  Number of records (if provided will avoid check).
- index : array, optional, default = None
  Index for records, if not provided will use (0, 1, ...).
- dtype : string, default = None
  Data numerical type (if provided will avoid check).
Load Series object from a local numpy array.
Assumes that all but the final dimension index the records, and the size of the final dimension is the length of each record, e.g. a (2, 3, 4) array will be treated as 2 x 3 records of size (4,).
- values : array-like
  An array containing the data.
- index : array, optional, default = None
  Index for records, if not provided will use (0, 1, ..., N) where N is the length of each record.
- npartitions : int, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, default = None
  Computational engine (e.g. a SparkContext for Spark).
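A usage sketch (import path assumed):

```python
import numpy as np
import thunder as td  # assumed import path

# a (2, 3, 4) array is treated as 2 x 3 records, each of length 4
data = np.arange(24).reshape(2, 3, 4)
series = td.series.fromarray(data)

# the default index is (0, 1, 2, 3); an explicit one can be supplied
labeled = td.series.fromarray(data, index=[10, 20, 30, 40])
```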
Create a Series object from a list of items and optional accessor function.
Will call accessor function on each item from the list, providing a generic interface for data loading.
- items : list
  A list of items to load.
- accessor : function, optional, default = None
  A function to apply to each item in the list during loading.
- index : array, optional, default = None
  Index for records, if not provided will use (0, 1, ..., N) where N is the length of each record.
- dtype : string, default = None
  Data numerical type (if provided will avoid check).
- npartitions : int, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, default = None
  Computational engine (e.g. a SparkContext for Spark).
fromtext(path, ext='txt', dtype='float64', skip=0, shape=None, index=None, npartitions=None, engine=None, credentials=None)
Load Series data from text files.
Assumes data are formatted as rows, where each record is a row of numbers separated by spaces, e.g. 'v v v v v'. You can optionally specify a fixed number of initial items per row to skip / discard (see the sketch after the parameter list below).
- path : string
  Directory to load from, can be a URI string with scheme (e.g. "file://", "s3n://", or "gs://"), or a single file, or a directory, or a directory with a single wildcard character.
- ext : str, optional, default = 'txt'
  File extension.
- dtype : dtype or dtype specifier, default = 'float64'
  Numerical type to use for data after converting from text.
- skip : int, optional, default = 0
  Number of items in each record to skip.
- shape : tuple or list, optional, default = None
  Shape of data if known, will be inferred otherwise.
- index : array, optional, default = None
  Index for records, if not provided will use (0, 1, ...).
- npartitions : int, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, default = None
  Computational engine (e.g. a SparkContext for Spark).
- credentials : dict, default = None
  Credentials for remote storage (e.g. S3) in the form {access: ***, secret: ***}.
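The sketch referenced above (the file path is hypothetical; import path assumed):

```python
import thunder as td  # assumed import path

# hypothetical file 'data/records.txt', one space-separated record per line:
# 0.1 0.2 0.3 0.4
# 0.5 0.6 0.7 0.8
series = td.series.fromtext('data/records.txt')

# skip=1 would discard the first item of every row, e.g. a leading label
trimmed = td.series.fromtext('data/records.txt', skip=1)
```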
frombinary(path, ext='bin', conf='conf.json', dtype=None, shape=None, skip=0, index=None, engine=None, credentials=None)
Load a Series object from flat binary files.
- path : string URI or local filesystem path
  Directory to load from, can be a URI string with scheme (e.g. "file://", "s3n://", or "gs://"), or a single file, or a directory, or a directory with a single wildcard character.
- ext : str, optional, default = 'bin'
  Optional file extension specifier.
- conf : str, optional, default = 'conf.json'
  Name of conf file with type and size information.
- dtype : dtype or dtype specifier, optional, default = None
  Numerical type of the data; if not provided, will be taken from the conf file.
- shape : tuple or list, optional, default = None
  Shape of data if known, will be inferred otherwise.
- skip : int, optional, default = 0
  Number of items in each record to skip.
- index : array, optional, default = None
  Index for records, if not provided will use (0, 1, ...).
- engine : object, default = None
  Computational engine (e.g. a SparkContext for Spark).
- credentials : dict, default = None
  Credentials for remote storage (e.g. S3) in the form {access: ***, secret: ***}.
Generate gaussian random series data.
- shape : tuple
  Dimensions of data.
- npartitions : int
  Number of partitions with which to distribute data.
- seed : int
  Randomization seed.
Load example series data.
Data must be downloaded from S3, so this method requires an internet connection.
- name : str
  Name of dataset, options include 'iris' | 'mouse' | 'fish'. If not specified will print options.