src.dfencoder package
Submodules
src.dfencoder.autoencoder module
- src.dfencoder.autoencoder.ohe(input_vector, dim, device='cpu')[source]
Does one-hot encoding of input vector.
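As a rough illustration of what one-hot encoding produces, the sketch below encodes a list of integer category indices in plain Python. The helper name `one_hot_rows` is hypothetical. It is not this module's implementation, which takes a `device` argument and so presumably works on tensors.

```python
def one_hot_rows(indices, dim):
    """Return one row per index, with 1.0 at that index and 0.0 elsewhere."""
    rows = []
    for idx in indices:
        if not 0 <= idx < dim:
            raise ValueError(f"index {idx} is outside [0, {dim})")
        row = [0.0] * dim
        row[idx] = 1.0
        rows.append(row)
    return rows


# Three samples drawn from a column with three possible categories.
encoded = one_hot_rows([2, 0, 1], dim=3)
```

Each output row has length `dim` and contains exactly one non-zero entry.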
- src.dfencoder.autoencoder.compute_embedding_size(n_categories)[source]
Applies a standard formula to choose the number of feature embeddings to use in a given embedding layer.
n_categories is the number of unique categories in a column.
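The text above does not state which formula is applied. One widely used rule of thumb for sizing categorical embeddings, popularized by the fastai tabular tooling, is sketched below. Treating it as the formula behind `compute_embedding_size` is an assumption, and the name `embedding_size_heuristic` and the `cap` parameter are hypothetical.

```python
def embedding_size_heuristic(n_categories, cap=600):
    """Rule-of-thumb embedding width: grows sublinearly with cardinality, up to a cap.

    ASSUMPTION: the constants 1.6, 0.56, and 600 come from a common heuristic,
    not from the documentation above.
    """
    return int(min(cap, round(1.6 * n_categories ** 0.56)))


small = embedding_size_heuristic(10)       # low-cardinality column
large = embedding_size_heuristic(10 ** 6)  # very high-cardinality column hits the cap
```

The sublinear exponent keeps embeddings compact for high-cardinality columns while still giving small columns a few dimensions to work with.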
- class src.dfencoder.autoencoder.CompleteLayer(in_dim, out_dim, activation=None, dropout=None, *args, **kwargs)[source]
Bases: torch.nn.modules.module.Module
Implements a layer with a linear transformation and optional activation and dropout.
- forward(x)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- class src.dfencoder.autoencoder.AutoEncoder(encoder_layers=None, decoder_layers=None, encoder_dropout=None, decoder_dropout=None, encoder_activations=None, decoder_activations=None, activation='relu', min_cats=10, swap_p=0.15, lr=0.01, batch_size=256, eval_batch_size=1024, optimizer='adam', amsgrad=False, momentum=0, betas=(0.9, 0.999), dampening=0, weight_decay=0, lr_decay=None, nesterov=False, verbose=False, device=None, logger='basic', logdir='logdir/', project_embeddings=True, run=None, progress_bar=True, n_megabatches=1, scaler='standard', eps=1e-06, *args, **kwargs)[source]
Bases: torch.nn.modules.module.Module
- build_model(df)[source]
Takes a pandas dataframe as input. Builds autoencoder model.
Returns the dataframe after making changes.
- compute_baseline_performance(in_, out_)[source]
Baseline performance is computed by generating a strong prediction for the identity function (predicting input==output) with a swapped (noisy) input, and computing the loss against the unaltered original data.
This should be roughly the loss we expect when the encoder degenerates into the identity function solution.
Returns net loss on baseline performance computation (sum of all losses).
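The idea can be sketched for numeric columns only: corrupt the input with swap noise, treat the corrupted input as the "identity" prediction, and score it against the clean data. The function names below are hypothetical, and using mean squared error as the only loss is an assumption. The actual method returns a sum of losses, which this simplified sketch does not reproduce.

```python
import random


def swap_noise(rows, swap_p, rng):
    """With probability swap_p, replace a cell with the same column's value from a random row."""
    noisy = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j in range(len(row)):
            if rng.random() < swap_p:
                noisy[i][j] = rows[rng.randrange(len(rows))][j]
    return noisy


def mean_squared_error(predicted, target):
    """Average squared difference over every cell."""
    squared = [
        (p - t) ** 2
        for p_row, t_row in zip(predicted, target)
        for p, t in zip(p_row, t_row)
    ]
    return sum(squared) / len(squared)


original = [[0.1, 1.0], [0.5, 2.0], [0.9, 3.0], [0.3, 4.0]]
noisy = swap_noise(original, swap_p=0.15, rng=random.Random(0))

# An identity model would simply echo the noisy input, so score that against the clean data.
baseline_loss = mean_squared_error(noisy, original)
```

A trained model whose loss is no better than `baseline_loss` has likely learned little beyond copying its input.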
- train_megabatch_epoch(n_updates, df)[source]
Run epoch doing ‘megabatch’ updates, preprocessing data in large chunks.
- get_representation(df, layer=0)[source]
Computes the latent feature vector from a hidden layer, given an input dataframe.
The layer argument (int) specifies which layer to get. By default (layer=0), it returns the “encoding” layer.
layer < 0 counts layers back from the encoding layer; layer > 0 counts layers forward from the encoding layer.
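The indexing convention described above can be read as an offset relative to the encoding layer. The sketch below shows that reading on a list of layer names. `select_layer` and the layer names are hypothetical, and the real method returns feature vectors rather than names.

```python
def select_layer(activations, encoding_index, layer=0):
    """Pick the entry `layer` steps away from the encoding layer (negative = earlier)."""
    position = encoding_index + layer
    if not 0 <= position < len(activations):
        raise IndexError(f"layer offset {layer} is outside the network")
    return activations[position]


# Stand-ins for per-layer outputs, listed in forward order.
stack = ["enc_1", "enc_2", "encoding", "dec_1", "dec_2"]

default_pick = select_layer(stack, encoding_index=2)            # the encoding layer
one_back = select_layer(stack, encoding_index=2, layer=-1)      # last layer before it
one_forward = select_layer(stack, encoding_index=2, layer=1)    # first layer after it
```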
- get_deep_stack_features(df)[source]
Records and outputs all internal representations of the input df as row-wise vectors. The output is a 2-d array with len() == len(df).
- get_anomaly_score(df)[source]
Returns a per-row loss of the input dataframe. Does not corrupt inputs.
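A per-row loss can be pictured as one reconstruction error per input row, with larger values marking rows the model reproduces poorly. In the sketch below the reconstructions are hand-written stand-ins for model output, the helper name is hypothetical, and mean squared error is an assumed choice of loss.

```python
def per_row_squared_error(rows, reconstructions):
    """Return one mean squared error per row, in input order."""
    scores = []
    for row, recon in zip(rows, reconstructions):
        scores.append(sum((r - x) ** 2 for x, r in zip(row, recon)) / len(row))
    return scores


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reconstructions = [[1.0, 2.0], [3.0, 5.0], [9.0, 6.0]]  # stand-in for model output

scores = per_row_squared_error(rows, reconstructions)

# The row with the highest score is the one reconstructed worst.
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Because the inputs are not corrupted before scoring, the score reflects only how well each row fits what the model has learned.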
- decode_to_df(x, df=None)[source]
Runs input embeddings through decoder and converts outputs into a dataframe.
- df_predict(df)[source]
Runs the end-to-end model, interprets the output, and returns a dataframe with the same shape as the input, containing the model predictions.
- training: bool
src.dfencoder.dataframe module
src.dfencoder.scalers module
- class src.dfencoder.scalers.StandardScaler[source]
Bases: object
Implements standard (mean/std) scaling.
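Mean/std scaling subtracts the mean and divides by the standard deviation. The sketch below is a minimal standalone version. The class name `MeanStdScaler` and the `fit`/`transform`/`inverse_transform` method names follow the scikit-learn convention and are assumptions, since the listing above does not show this class's methods. Using the population standard deviation is also an assumption.

```python
import statistics


class MeanStdScaler:
    """Minimal mean/std scaler for a single numeric column (illustrative only)."""

    def fit(self, values):
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values)  # population std; an assumed choice
        return self

    def transform(self, values):
        scale = self.std or 1.0  # avoid dividing by zero for constant columns
        return [(v - self.mean) / scale for v in values]

    def inverse_transform(self, values):
        scale = self.std or 1.0
        return [v * scale + self.mean for v in values]


column = [2.0, 4.0, 6.0]
scaler = MeanStdScaler().fit(column)
scaled = scaler.transform(column)
restored = scaler.inverse_transform(scaled)
```

After scaling, the column has zero mean and unit standard deviation, and `inverse_transform` maps model outputs back to the original units.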