Abstract


In this work, we propose a method that synergistically integrates a multi-plane representation with a coordinate-based network known for its strong bias toward low-frequency signals. The coordinate-based network is responsible for capturing low-frequency components, while the multi-plane representation focuses on capturing fine-grained details. We demonstrate that residual connections between them seamlessly preserve the inherent properties of each. Additionally, the proposed progressive training scheme accelerates the disentanglement of these two features. We empirically show that the proposed method achieves results comparable to explicit encoding with fewer parameters, and in particular, it outperforms others on static and dynamic NeRFs under sparse inputs.

TL;DR:

We incorporate multi-plane representations and coordinate networks to improve NeRFs trained from sparse inputs. This technique proves consistently effective in both static and dynamic NeRF applications, outperforming existing methods.

Video

TBD

Residual Neural Radiance Fields Spanning a Diverse Spectrum


The residual connection improves how efficiently the network responds to its input values and emphasizes the role of the coordinate network. We use ReLU activations to promote a low-frequency spectral bias. The proposed method thus handles low- and high-frequency information in two distinct regimes: when only the coordinate network is engaged, the output is biased toward low frequencies, which aids global reasoning; when all features are engaged, the network produces clear and detailed images.

\[ \begin{aligned} \phi_1(s_k, f_k) &= h\big(W_1^2 \cdot h(W_1^1 \cdot (s_k \oplus f_k) + b_1^1) + b_1^2\big) \\ \phi_2(s_k, f_k, \phi_1) &= h\big(W_2^2 \cdot h(W_2^1 \cdot (s_k \oplus f_k \oplus \phi_1(s_k, f_k)) + b_2^1) + b_2^2\big) \end{aligned} \]

Here \( W_l^i \) and \( b_l^i \) denote the weights and biases of the \( i \)-th layer in the \( l \)-th MLP block. Each subsequent block \( \phi_l(\cdot) \) with \( l > 2 \) contains a single pair of weights and biases. The coordinate value \( s_k \) and the multi-plane features \( f_k \) are residually concatenated into the first two blocks. We employ the ReLU activation \( h \), which leans toward a low-frequency spectral bias, and \( \oplus \) denotes the concatenation operation.
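To make the structure concrete, the following is a minimal PyTorch sketch of the first two blocks; the dimensions (`dim_s`, `dim_f`, `width`) are illustrative placeholders, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class ResidualField(nn.Module):
    """Sketch of the first two residual blocks (illustrative sizes)."""

    def __init__(self, dim_s=3, dim_f=32, width=128):
        super().__init__()
        in_dim = dim_s + dim_f
        # Block 1: two ReLU layers on the concatenated input s_k (+) f_k.
        self.block1 = nn.Sequential(
            nn.Linear(in_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        # Block 2: the input is residually re-concatenated with phi_1.
        self.block2 = nn.Sequential(
            nn.Linear(in_dim + width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )

    def forward(self, s_k, f_k):
        x = torch.cat([s_k, f_k], dim=-1)                  # s_k (+) f_k
        phi1 = self.block1(x)                              # phi_1(s_k, f_k)
        phi2 = self.block2(torch.cat([x, phi1], dim=-1))   # phi_2(s_k, f_k, phi_1)
        return phi2
```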



Curriculum Weighting Strategy

The residual architecture alone encounters challenges in severely ill-conditioned situations, such as heavy occlusion and rapid motion, as seen in the drums scene in static NeRF and the standup scene in dynamic NeRF. To alleviate this issue, we propose a curriculum weighting strategy for multi-plane encoding that modulates the engagement of the multi-plane features at each training step. This approach trains the coordinate-based network first and then progressively brings the multi-plane features into training. In this subsection, we denote the training iteration by \( t \). Technically, we introduce a weighting factor \( \alpha(t) \) that controls the degree of engagement of the multi-plane features along the channel dimension, yielding the per-channel weight \( \gamma_j(t) \) for the \( j \)-th channel:

\[ \gamma_{j}(t) = \begin{cases} 0 & \text{if } \alpha(t) \leq j \\ \frac{1-\cos\big((\alpha(t) - j)\pi\big)}{2} & \text{if } 0 < \alpha(t) - j \leq 1 \\ 1 & \text{otherwise.} \end{cases} \]
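For reference, here is a minimal Python sketch of this weighting; the concrete schedule for \( \alpha(t) \) and the channel count are illustrative assumptions, not the paper's settings.

```python
import math
import torch

def gamma(alpha_t: float, num_channels: int) -> torch.Tensor:
    """Per-channel weights gamma_j(t) for j = 0 .. num_channels - 1."""
    j = torch.arange(num_channels, dtype=torch.float32)
    d = alpha_t - j
    return torch.where(
        d <= 0, torch.zeros_like(d),                                  # alpha(t) <= j
        torch.where(d <= 1,
                    (1 - torch.cos(d * math.pi)) / 2,                 # 0 < alpha(t) - j <= 1
                    torch.ones_like(d)),                              # otherwise
    )

# Example: gradually engage the multi-plane features along channels.
# Here alpha(t) is assumed to ramp from 0 to C over training.
f_k = torch.randn(1024, 32)                        # multi-plane features (B, C)
alpha_t = 12.5                                     # current schedule value
f_weighted = f_k * gamma(alpha_t, f_k.shape[-1])   # broadcast over the batch
```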

Experimental Results: Static NeRF

Our proposed multi-plane encoding exclusively captures fine-grained details while preserving the global shape learned by the coordinate features, leading to more robust novel view synthesis in sparse-input scenarios. We trained all models with 8 views. Due to space limitations, we present the results as low-resolution videos; when viewed at full resolution, ours clearly demonstrates superior quality.

FreeNeRF (CVPR2023)

TensoRF (ECCV2022)

K-Planes (CVPR2023)

Ours

Experimental Results: Dynamic NeRF

The performance improvement is even more evident for dynamic NeRFs. We trained all models with 25 views.

HexPlane (CVPR2023)

Ours

More Results


We have provided additional results as high-resolution videos, which clearly demonstrate the advantages of our method. For static NeRF, please visit the provided link. Results for dynamic NeRF are available at this link.


Acknowledgements

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No. 2022-0-00641, XVoice: Multi-Modal Voice Meta Learning]. A portion of this work was carried out during an internship at NAVER AI Lab. We also extend our gratitude to ACTNOVA for providing the computational resources required.