vis4d.model.motion.velo_lstm¶
VeloLSTM 3D motion model.
Functions
- Initialize LSTM weights and biases.

Classes
- VeloLSTM: Estimating object location in world coordinates.
- VeloLSTM output.
- class VeloLSTM(num_frames=5, feature_dim=64, hidden_size=128, num_layers=2, loc_dim=7, dropout=0.1, weights=None)[source]¶
Estimating object location in world coordinates.
- Prediction LSTM:
  Input: velocity over the previous 5 frames. Output: location at the next frame.
- Updating LSTM:
  Input: predicted location and observed location. Output: refined location.
Init.
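A minimal construction sketch based on the signature above; the import path mirrors the module name, and all argument values are the documented defaults:

```python
from vis4d.model.motion.velo_lstm import VeloLSTM

# Documented defaults: a 2-layer LSTM consuming 5 frames of motion
# history with a 7-dimensional (loc_dim) location state.
model = VeloLSTM(
    num_frames=5,
    feature_dim=64,
    hidden_size=128,
    num_layers=2,
    loc_dim=7,
    dropout=0.1,
)
```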
Initialize hidden state.
The axes semantics are (num_layers, minibatch_size, hidden_dim).
- Return type:
  tuple[Tensor, Tensor]
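The extracted page drops this method's name, but its documented return is a (hidden, cell) pair with the axis layout above. A hand-rolled equivalent, assuming an illustrative batch of 8 and the default num_layers=2 and hidden_size=128, would be:

```python
import torch

num_layers, num_batch, hidden_size = 2, 8, 128

# Zero-initialized (hidden, cell) pair with axes
# (num_layers, minibatch_size, hidden_dim), i.e. tuple[Tensor, Tensor].
hc_0 = (
    torch.zeros(num_layers, num_batch, hidden_size),
    torch.zeros(num_layers, num_batch, hidden_size),
)
```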
- predict(vel_history, location, hc_0)[source]¶
Predict location at t+1 using updated location at t.
- Return type:
  tuple[Tensor, tuple[Tensor, Tensor]]
- Input:
  - vel_history: (num_seq, num_batch, loc_dim), velocity from the previous num_seq updates
  - location: (num_batch, loc_dim), location from the previous update
  - hc_0: (num_layers, num_batch, hidden_size), tuple of hidden and cell states
- Middle:
  - embed: (num_seq, num_batch, feature_dim), location feature
  - out: (num_seq, num_batch, hidden_size), LSTM output
  - attention_logit: (num_seq, num_batch, loc_dim), the predicted residual
- Output:
  - hc_n: (num_layers, num_batch, hidden_size), tuple of updated hidden and cell states
  - output_pred: (num_batch, loc_dim), predicted location
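A hedged usage sketch for predict(), with dummy zero tensors shaped per the Input block above; the batch size of 8 is illustrative, and num_seq matches the default num_frames=5. The unpacking order follows the documented return type: the predicted location first, then the updated (hidden, cell) tuple.

```python
import torch

from vis4d.model.motion.velo_lstm import VeloLSTM

model = VeloLSTM()  # documented defaults: num_frames=5, num_layers=2, ...

num_seq, num_batch = 5, 8
loc_dim, num_layers, hidden_size = 7, 2, 128

vel_history = torch.zeros(num_seq, num_batch, loc_dim)  # past velocities
location = torch.zeros(num_batch, loc_dim)              # location from previous update
hc_0 = (
    torch.zeros(num_layers, num_batch, hidden_size),
    torch.zeros(num_layers, num_batch, hidden_size),
)

# Predicted next-frame location and updated (hidden, cell) states.
output_pred, hc_n = model.predict(vel_history, location, hc_0)
```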
- refine(location, observation, prev_location, confidence, hc_0)[source]¶
Refine predicted location using single frame estimation at t+1.
- Return type:
  tuple[Tensor, tuple[Tensor, Tensor]]
- Input:
  - location: (num_batch, loc_dim), location from prediction
  - observation: (num_batch, loc_dim), location from single-frame estimation
  - prev_location: (num_batch, loc_dim), refined location from the previous step
  - confidence: (num_batch, 1), depth estimation confidence
  - hc_0: (num_layers, num_batch, hidden_size), tuple of hidden and cell states
- Middle:
  - loc_embed: (1, num_batch, feature_dim), predicted location feature
  - obs_embed: (1, num_batch, feature_dim), single-frame location feature
  - conf_embed: (1, num_batch, feature_dim), depth estimation confidence feature
  - embed: (1, num_batch, 2*feature_dim), location feature
  - out: (1, num_batch, hidden_size), LSTM output
- Output:
  - hc_n: (num_layers, num_batch, hidden_size), tuple of updated hidden and cell states
  - output_pred: (num_batch, loc_dim), predicted location
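Analogously, a hedged sketch for refine(); all tensors are illustrative dummies shaped per the Input block above, and a constant confidence of 1 stands in for a real depth-confidence estimate.

```python
import torch

from vis4d.model.motion.velo_lstm import VeloLSTM

model = VeloLSTM()

num_batch, loc_dim = 8, 7
num_layers, hidden_size = 2, 128

location = torch.zeros(num_batch, loc_dim)       # predicted location at t+1
observation = torch.zeros(num_batch, loc_dim)    # single-frame estimate at t+1
prev_location = torch.zeros(num_batch, loc_dim)  # refined location from t
confidence = torch.ones(num_batch, 1)            # depth estimation confidence
hc_0 = (
    torch.zeros(num_layers, num_batch, hidden_size),
    torch.zeros(num_layers, num_batch, hidden_size),
)

# Refined location and updated (hidden, cell) states.
output_pred, hc_n = model.refine(
    location, observation, prev_location, confidence, hc_0
)
```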