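Doing a PhD is a long process: 5-6 years for students who enter directly from undergraduate study, and 4-5 years for those who continue after a master's degree. This period often takes up half of one's youth (ages 20-30), the best years of life. So think very carefully before considering this question and making your decision. When deciding, follow your own inner voice and consider the following points:
Choose a path that suits your personality: we need to recognize that everyone has a different personality, and different personalities suit different kinds of work or careers. Not everyone is suited to research, and research is no nobler than any other profession. Choosing work or a career that suits your personality is the foundation of a happy and fulfilling life.
A calm, patient mindset and a rigorous, careful attitude: I think these are the two most important traits for a PhD. Over as long as five years, you will experience: your papers being rejected repeatedly, your ideas being criticized by others, your idea having already been done by peers, classmates repeatedly publishing strong papers and winning scholarships, classmates landing the big-tech internship you dreamed of, classmates finding jobs with an annual salary of 1.5 million (150W)...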
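VSCode-related
Deploy under Settings->Pages. Note that the GitHub account name xxx may need to match xxx.github.io. You can create a separate docs/ directory in the code repository for this, for example the repository for this page.
Linux-related
LaTeX-related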
cite, which includes a BibTeX entry; then simply paste it into the .bib file.
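As a hedged illustration of the step above, the sketch below shows where a pasted BibTeX entry goes and how it is then cited. The file name `refs.bib`, the key `placeholder2024`, and every field of the entry are placeholders invented for this example, not a real reference.

```latex
\documentclass{article}

% Write a tiny .bib file next to this document so the example is self-contained.
% In practice, paste the copied BibTeX entry into your own .bib file instead.
\begin{filecontents}{refs.bib}
@inproceedings{placeholder2024,
  title     = {Placeholder Title},
  author    = {Author, First and Author, Second},
  booktitle = {Placeholder Conference},
  year      = {2024}
}
\end{filecontents}

\begin{document}
% Cite the entry by its key (the first field after the opening brace).
This sentence cites the pasted entry~\cite{placeholder2024}.

\bibliographystyle{plain}
\bibliography{refs} % refers to refs.bib, without the extension
\end{document}
```

With a standard toolchain this is typically compiled by running pdflatex, then bibtex, then pdflatex twice more so that the citation is resolved.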
ctrl c and ctrl v do not actually import the video into the PPT; on another computer the video cannot be found and will not play.
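The abilities a PhD student should have, how to train these abilities, and how to carry out research projects: [Reference 1], [Reference 2].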
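input->A->intermediate output->B->output, i.e., a pure A + B combination (pure concatenation). The combination needs to be an innovative one. Normally, directly concatenating two methods cannot solve the problem anyway; otherwise the problem would pose no real technical challenge.
Paper writing templates: [Reference].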
\section{Abstract}
% Task
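% Technical challenge for previous methods (center the discussion on the technical challenge that we solved)
% One or two sentences introducing the technical contribution that solves the challenge (usually just name the xxx technique without detailing each step. The name must be understandable to readers, without feeling like a jump. This skill is important for writing a good abstract.)
% Introduce the benefits of the technical contribution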
% Experiment
\section{Abstract}
% Task
%% Example 1: In recent years, generative models have undergone significant advancement due to the success of diffusion models.
%% Example 2: This paper addresses the challenge of novel view synthesis for a human performer from a very sparse set of camera views.
% Technical challenge for previous methods (center the discussion on the technical challenge that we solved)
%% Example 1: The success of these models is often attributed to their use of guidance techniques, such as classifier and classifier-free methods, which provides effective mechanisms to tradeoff between fidelity and diversity. However, these methods are not capable of guiding a generated image to be aware of its geometric configuration, e.g., depth, which hinders the application of diffusion models to areas that require a certain level of depth awareness.
%% Example 2: Some recent works have shown that learning implicit neural representations of 3D scenes achieves remarkable view synthesis quality given dense input views. However, the representation learning will be ill-posed if the views are highly sparse.
% One sentence introducing the insight that solves the challenge
%% Example 1: To address this limitation, we propose a novel guidance approach for diffusion models that uses estimated depth information derived from the rich intermediate representations of diffusion models.
%% Example 2: To solve this ill-posed problem, our key idea is to integrate observations over video frames.
% One or two sentences introducing the technical contribution that realizes the insight (usually just name the xxx technique without detailing each step. The name must be understandable to readers, without feeling like a jump. This skill is important for writing a good abstract.)
%% Example 1: To do this, we first present a label-efficient depth estimation framework using the internal representations of diffusion models. At the sampling phase, we utilize two guidance techniques to self-condition the generated image using the estimated depth map, the first of which uses pseudo-labeling, and the subsequent one uses a depth-domain diffusion prior.
%% Example 2: To this end, we propose Neural Body, a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh
% Introduce the benefits of the technical novelty
%% Example 2: so that the observations across frames can be naturally integrated. The deformable mesh also provides geometric guidance for the network to learn 3D representations more efficiently.
% Experiment
% Task
%% This paper introduces a novel contour-based approach named deep snake for real-time instance segmentation.
%% Unlike some recent methods that directly regress the coordinates of the object boundary points from an image
% One sentence introducing a technical contribution and its technical advantage (this skill is important for writing a good abstract.)
%% deep snake uses a neural network to iteratively deform an initial contour to match the object boundary, which implements the classic idea of snake algorithms with a learning-based approach.
% One sentence introducing a technical contribution and its technical advantage
%% For structured feature learning on the contour, we propose to use circular convolution in deep snake, which better exploits the cycle-graph structure of a contour compared against generic graph convolution.
% One sentence introducing a technical contribution and its technical advantage
%% Based on deep snake, we develop a two-stage pipeline for instance segmentation: initial contour proposal and contour deformation, which can handle errors in object localization.
% Experiment
\section{Introduction}
% Task and application
% Technical challenge for previous methods (center the discussion on the technical challenge that we solved. A technical challenge consists of a limitation and its technical reason)
% Introduce our pipeline that solves the challenge
% Experiment
% Contributions
% Introduce the task (can be skipped if the task is well known)
%% Example: Object pose estimation aims to estimate object's orientation and translation relative to a canonical frame from a single image.
[xxx task] aims to recover/reconstruct/estimate [xxx output] from [xxx input].
% Introduce the application
%% Example: Accurate pose estimation is essential for a variety of applications such as augmented reality, autonomous driving and robotic manipulation.
[xxx task] has a variety of applications such as [xxx], [xxx], and [xxx].
% Introduce the application of the general task
%% Example: Accurate pose estimation is essential for a variety of applications such as augmented reality, autonomous driving and robotic manipulation.
[xxx task] has a variety of applications such as [xxx], [xxx], and [xxx].
% Introduce the specific task setting
%% Example: This paper focuses on the specific setting of recovering the 6DoF pose of an object, i.e., rotation and translation in 3D, from a single RGB image of that object.
This paper focuses on the specific setting of recovering/reconstructing/estimating [xxx output] from [xxx input].
% Opening paragraph of the ManhattanSDF introduction
% Opening paragraph of the Deep Snake introduction
% Introduce the application
%% Example 1: Reconstructing 3D scenes from multi-view images is a cornerstone of many applications such as augmented reality, robotics, and autonomous driving.
%% Example 2: Instance segmentation is the cornerstone of many computer vision tasks, such as video analysis, autonomous driving, and robotic grasping, which require both accuracy and efficiency.
% Lead to the technical challenge we want to solve by introducing previous methods
%% Example 1: Given input images, traditional methods [43, 44, 59] generally estimate the depth map for each image based on the multi-view stereo (MVS) algorithms and then fuse estimated depth maps into 3D models. Although these methods achieve successful reconstruction in most cases, they have difficulty in handling low-textured regions, e.g., floors and walls of indoor scenes, due to the unreliable stereo matching in these regions.
%% Example 2: Most of the state-of-the-art instance segmentation methods [18, 27, 5, 19] perform pixel-wise segmentation within a bounding box given by an object detector [36], which may be sensitive to the inaccurate bounding box. Moreover, representing an object shape as dense binary pixels generally results in costly post-processing.
% Discuss the general technical challenges of this task (used to lead to recent methods)
%% Example 1: This problem is quite challenging from many perspectives, including object detection under severe occlusions, variations in lighting and appearance, and cluttered background objects.
%% Example 2: This problem is particularly challenging due to the inherent ambiguity on acquiring human geometry, materials and motions from images.
This problem is particularly challenging due to several factors, including [xxx reason], [xxx reason], and [xxx reason].
% One or two sentences briefly introducing a class of traditional methods, then discuss the technical challenge they face (if traditional methods exist, discuss them to show that we know the field well)
%% Introduce the traditional methods
%% Example: Traditional methods have shown that pose estimation can be achieved by establishing the correspondences between an object image and the object model.
To overcome these challenges, traditional methods [describe how they work], [what results they achieve].
%% Discuss the technical challenge they face
%% Example: They rely on hand-crafted features, which are not robust to image variations and background clutters.
However, they [the technical challenge they face].
% One or two sentences briefly introducing a class of recent methods 1, then discuss the technical challenge they face (optional; describing how they work leads to the technical challenge. If needed, discuss several more recent methods, as long as they help lead to the technical challenge.)
%% Introduce recent methods 1
%% Example: Deep learning based methods train end-to-end neural networks that take an image as input and output its corresponding pose.
Recently, [xxx methods] [describe how they work], [what results they achieve].
%% Discuss the technical challenge they face (state the limitation and its technical reason)
%% Example: However, generalization remains as an issue, as it is unclear that such end-to-end methods learn sufficient feature representations for pose estimation.
However, they [their limitation], because [xxx technical reason].
% One or two sentences briefly discussing a class of recent methods 2, then discuss the technical challenge they face (this must lead to the technical challenge that we solved)
%% Introduce recent methods 2
%% Example: Some recent methods use CNNs to first regress 2D keypoints and then compute 6D pose parameters using the Perspective-n-Point (PnP) algorithm. In other words, the detected keypoints serve as an intermediate representation for pose estimation. Such two-stage approaches achieve state-of-the-art performance, thanks to robust detection of keypoints.
To overcome this challenge, [xxx methods] [describe how they work], [what results they achieve].
%% Discuss the technical challenge they face (state the limitation and its technical reason)
%% Example: However, these methods have difficulty in tackling occluded and truncated objects, since part of their keypoints are invisible. Although CNNs may predict these unseen keypoints by memorizing similar patterns, generalization remains difficult.
However, they [their limitation], because [xxx technical reason].
% Introduce how a class of traditional/recent methods works and discuss the technical challenge they face (to lead to our insight)
%% Introduce how a class of traditional/recent methods works
%% Example 1, deep snake: Most of the state-of-the-art instance segmentation methods perform pixel-wise segmentation within a bounding box given by an object detector.
%% Example 2, ManhattanSDF: Given input images, traditional methods generally estimate the depth map for each image based on the multi-view stereo (MVS) algorithms and then fuse estimated depth maps into 3D models.
Traditional/recent methods [describe how they work], [what results they achieve].
%% Discuss the technical challenge they face (state the limitation and its technical reason)
%% Example 1, deep snake: They may be sensitive to the inaccurate bounding box. Moreover, representing an object shape as dense binary pixels generally results in costly post-processing.
%% Example 2, ManhattanSDF: Although these methods achieve successful reconstruction in most cases, they have difficulty in handling low-textured regions, e.g., floors and walls of indoor scenes, due to the unreliable stereo matching in these regions.
However, they [their limitation], because [xxx technical reason].
% Discuss traditional methods that used our insight (discuss technically similar traditional methods for the same task, implying that our proposed technique is backed by traditional methods)
%% Example 1, deep snake: An alternative shape representation is the object contour, which is a set of vertices along the object silhouette. In contrast to pixel-based representation, a contour is not limited within a bounding box and has fewer parameters. Such a contour-based representation has long been used in image segmentation since the seminal work by Kass et al., which is well known as snakes or active contours.
%% Example 2, ManhattanSDF: To improve the reconstruction of low-textured regions, a typical approach is leveraging the planar prior of man-made scenes, which has long been explored in literature. A renowned example is the Manhattan-world assumption, i.e., the surfaces of man-made scenes should be aligned with three dominant directions.
%% Introduce the insight
To overcome this problem, a typical approach is [xxx insight], which has long been explored in literature.
%% Introduce how a class of traditional methods works
These methods [describe how they work].
%% Discuss the technical challenge they face (state the limitation and its technical reason)
%% Example 1, deep snake: While many variants have been developed in literature, these methods are prone to local optima as the objective functions are handcrafted and typically nonconvex.
%% Example 2, ManhattanSDF: However, all of them focus on optimizing per-view depth maps instead of the full scene models in 3D space. As a result, depth estimation and plane segmentation could still be inconsistent among views, yielding suboptimal reconstruction quality as demonstrated by our experimental results in Section 5.3.
However, they [their limitation], because [xxx technical reason].
% One or two sentences briefly discussing a class of recent methods 2, then discuss the technical challenge they face (this must lead to the technical challenge that we solved)
%% Introduce recent methods 2
%% Example: There is a recent trend to represent 3D scenes as implicit neural representations and learn the representations from images with differentiable renderers. In particular, [49, 54, 55] use a signed distance field (SDF) to represent the scene and render it into images based on the sphere tracing or volume rendering. Thanks to the well-defined surfaces of SDFs, they recover high-quality 3D geometries from images.
To overcome this challenge, [xxx methods] [describe how they work], [what results they achieve].
%% Discuss the technical challenge they face (state the limitation and its technical reason)
%% Example: However, these methods essentially rely on the multi-view photometric consistency to learn the SDFs. So they still suffer from poor performance in low-textured planar regions, as shown in Figure 1, as many plausible solutions may satisfy the photometric constraint in low-textured planar regions.
However, they [their limitation], because [xxx technical reason].
% To achieve goal xx, several requirements must be met (or several challenges are faced).
%% Example: In this work, our goal is to build a model that captures such object intrinsics from a single image. This problem is challenging for three reasons.
% Describe the first point
%% Example: First, we only have a single image. This makes our work fundamentally different from existing works on 3D-aware image generation models [8, 9, 27, 28], which typically require a large dataset of thousands of instances for training. In comparison, the single image contains at most a few dozen instances, making the inference problem highly under-constrained.
% Describe the second point
%% Example: Second, these already limited instances may vary significantly in pixel values. This is because they have different poses and illumination conditions, but neither of these factors are annotated or known. We also cannot resort to existing tools for pose estimation based on structure from motion, such as COLMAP [35], because the appearance variations violate the assumptions of epipolar geometry.
% Describe the third point
%% Example: Finally, the object intrinsics we aim to infer are probabilistic, not deterministic: no two roses in the natural world are identical, and we want to capture a distribution of their geometry, texture, and material to exploit the underlying multi-view information.
% In this paper, we propose a novel framework …
%% Example: In this paper, we introduce a novel implicit neural representation for dynamic humans, named Neural Body, to solve the challenge of novel view synthesis from sparse views.
In this paper, we propose a novel framework/representation, named [method name], for [xxx task].
% Draw a teaser figure to introduce the basic idea
%% Example: The basic idea is illustrated in Figure 2.
The basic idea is illustrated in [xxx Figure].
% One sentence introducing our key novelty/contribution (this skill is important for writing a good introduction: you must be able to explain the key idea clearly in one or two sentences so that readers understand what we are saying.)
%% Example: For the implicit fields at different frames, instead of learning them separately, Neural Body generates them from the same set of latent codes.
Our innovation is in [one-sentence description of our key novelty].
% Describe specifically how it is done
%% Example: Specifically, we anchor a set of latent codes to the vertices of a deformable human model (SMPL in this work), namely that their spatial locations vary with the human pose. To obtain the 3D representation at a frame, we first transform the code locations based on the human pose, which can be reliably estimated from sparse camera views. Then, a network is designed to regress the density and color for any 3D point based on these latent codes. Both the latent codes and the network are jointly learned from images of all video frames during the reconstruction.
Specifically, [describe specifically how it is done].
% Introduce the advantage of our method (what is the fundamental reason it works, and what are its benefits over previous methods.)
%% Example: This model is inspired by the latent variable model in statistics, which enables us to effectively integrate observations at different frames.
In contrast to previous methods, [the advantage of our method].
% Introduce another advantage
%% Example: Another advantage of the proposed method is that the deformable model provides a geometric prior (rough surface location) to enable more efficient learning of implicit fields.
Another advantage of the proposed method is that [our other advantage].
% In this paper, we propose a novel framework …
%% Example: In this paper, we introduce a novel implicit neural representation for dynamic humans, named Neural Body, to solve the challenge of novel view synthesis from sparse views.
In this paper, we propose a novel framework/representation, named [method name], for [xxx task].
% One sentence introducing our key novelty/contribution
%% Example: To that end, we propose techniques to represent a given subject with rare token identifiers and fine-tune a pre-trained, diffusion-based text-to-image framework that operates in two steps; generating a low-resolution image from text and subsequently applying super-resolution (SR) diffusion models.
Our innovation is in [one-sentence description of our key novelty].
% Draw a teaser figure to introduce the basic idea
%% Example: The basic idea is illustrated in Figure 2.
The basic idea is illustrated in [xxx Figure].
% Describe specifically how it is done
%% Example: We first fine-tune the low-resolution text-to-image model with the input images and text prompts containing a unique identifier followed by the class name of the subject (e.g., “A [V] dog”).
Specifically, [describe specifically how it is done].
% Introduce the advantage of our method (what is the fundamental reason it works, and what are its benefits over previous methods.)
%% Example: This model is inspired by the latent variable model in statistics, which enables us to effectively integrate observations at different frames.
In contrast to previous methods, [the advantage of our method].
% Introduce another technical contribution (usually to solve a technical challenge faced by contribution 1; otherwise the two contributions feel loosely connected)
%% Discuss another technical challenge
%% Example: In order to prevent overfitting and language drift [35, 40] that cause the model to associate the class name (e.g., “dog”) with the specific instance
However, [describe another technical challenge].
%% Describe specifically how technical contribution 2 is done
%% Example: we propose an autogenous, class-specific prior preservation loss, which leverages the semantic prior on the class that is embedded in the model, and encourages it to generate diverse instances of the same class as our subject.
Specifically, [describe specifically how it is done].
% In this paper, we propose a learning-based snake algorithm, named deep snake, for real-time instance segmentation.
% Inspired by previous methods [21, 25], deep snake takes an initial contour as input and deforms it by regressing vertex-wise offsets.
% Our innovation is introducing the circular convolution for efficient feature learning on a contour, as illustrated in Figure 1.
% We observe that the contour is a cycle graph that consists of a sequence of vertices connected in a closed cycle. Since every vertex has the same degree equal to two, we can apply the standard 1D convolution on the vertex features.
% Considering that the contour is periodic, deep snake introduces the circular convolution, which indicates that an aperiodic function (1D kernel) is convolved in the standard way with a periodic function (features defined on the contour).
% The kernel of circular convolution encodes not only the feature of each vertex but also the relationship among neighboring vertices. In contrast, the generic GCN performs pooling to aggregate information from neighboring vertices. The kernel function in our circular convolution amounts to a learnable aggregation function, which is more expressive and results in better performance than using a generic GCN, as demonstrated by our experimental results in Section 5.2.
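The circular convolution described in the example above can be written compactly. The following is a sketch under assumed notation, which may differ from the paper's own: $f_i$ is the feature of vertex $i$ on a contour with $N$ vertices, $f_N$ is its periodic extension, and $k$ is a learnable 1D kernel of radius $r$.

```latex
% Sketch only: the symbols f, f_N, k, N, and r are assumptions made for this illustration.
\begin{equation}
  (f_N)_i = f_{\,i \bmod N}, \qquad
  (f_N \ast k)_i = \sum_{j=-r}^{r} (f_N)_{i+j}\, k_j .
\end{equation}
```

Because $f_N$ wraps around, vertices near the start of the sequence also aggregate features from vertices near its end, which matches the closed-cycle structure of the contour.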
% Overview
% One or two sentences introducing the setting
%% Example 1: Given a sparse multi-view video of a performer, our task is to generate a free-viewpoint video of the performer.
%% Example 2: Given an image, the task of pose estimation is to detect objects and estimate their orientations and translations in the 3D space.
% One or two sentences introducing the core contribution of the paper
%% Example 1: We build upon prior work for static scenes [46], to which we add the notion of time, and estimate 3D motion by explicitly modeling forward and backward scene flow as dense 3D vector fields.
%% Example 2: Inspired by [21, 25], we perform object segmentation by deforming an initial contour to match object boundary.
%% Example 3: Inspired by recent methods [29, 30, 36], we estimate the object pose using a two-stage pipeline: we first detect 2D object keypoints using CNNs and then compute 6D pose parameters using the PnP algorithm. Our innovation is in a new representation for 2D object keypoints as well as a modified PnP algorithm for pose estimation.
% If the paper's pipeline/framework is fairly novel, draw a figure to introduce the pipeline/framework
%% Example: The overview of the proposed model is illustrated in Figure 3.
% What Section 3.1 describes
%% Example 1: Neural Body starts from a set of structured latent codes attached to the surface of a deformable human model (Section 3.1).
%% Example 2: In this section, we first describe how to model 3D scenes with MLP maps (Section 3.1).
% What Section 3.2 describes
%% Example 1: The latent code at any location around the surface can be obtained with a code diffusion process (Section 3.2) and then decoded to density and color values by neural networks (Section 3.3).
%% Example 2: Then, Section 3.2 discusses how to represent volumetric videos with dynamic MLP maps.
% What Section 3.3 describes
%% Example 3: Finally, we introduce some strategies to speed up the rendering process (Section 3.3).
% 1. First describe the forward process or module design of the technique (first summarize what we want to do, then write how we do it: given the input, go through xx steps, and obtain the output, i.e., "Given xxx, we first xxx, then xxx, finally xxx")
% 1.1 What we want to do
%% Example: Given the input features defined on a contour, deep snake introduces the circular convolution for the feature learning, as illustrated in Figure 2.
% How we do it
% 1.2 we first do xx.
%% Example: First, construct the circular convolution.
% 1.3 then, we do xx.
%% Example: Similar to the standard convolution, we can construct a network layer based on the circular convolution for feature learning, which is easy to be integrated into a modern network architecture.
% 1.4 finally, we do xx.
%% Example: After the feature learning, deep snake applies three 1×1 convolution layers to the output features for each vertex and predicts vertex-wise offsets between contour points and the target points, which are used to deform the contour.
% 2. Then describe the technical advantage (motivation) of the technique
%% Example: As discussed in the introduction, the proposed circular convolution better exploits the circular structure of the contour than the generic graph convolution. We will show the experimental comparison in Section 5.2. An alternative method is to use standard CNNs to regress a pixel-wise vector field from the input image to guide the evolution of the initial contour [37, 33, 40]. We argue that an important advantage of deep snake over the standard CNNs is the object-level structured prediction, i.e., the offset prediction at a vertex depends on other vertices of the same contour. Therefore, deep snake will predict a more reasonable offset for a vertex located far from the object. Standard CNNs may have difficulty in this case, as the regressed vector field may drive this vertex to another object which is closer.
% 1. First write the motivation (why this technique is proposed)
%% Example: The implicit fields assign the density and color to each point in the 3D space, which requires us to query the latent codes at continuous 3D locations. This can be achieved with the trilinear interpolation. However, since the structured latent codes are relatively sparse in the 3D space, directly interpolating the latent codes leads to zero vectors at most 3D points. To solve this problem, we diffuse the latent codes defined on the surface to nearby 3D space.
% 2. Then describe the forward process or module design of the technique (first summarize what we want to do, then write how we do it: given the input, go through xx steps, and obtain the output, i.e., "Given xxx, we first xxx, then xxx, finally xxx")
% 2.1 What we want to do
%% Example: Inspired by [65, 56, 49], we choose the SparseConvNet [21] to efficiently process the structured latent codes, whose architecture is described in Table 1.
% How we do it
% 2.2 we first do xx.
%% Example: Specifically, based on the SMPL parameters, we compute the 3D bounding box of the human and divide the box into small voxels with voxel size of 5mm × 5mm × 5mm. The latent code of a nonempty voxel is the mean of latent codes of SMPL vertices inside this voxel.
% 2.3 then, we do xx.
%% Example: SparseConvNet utilizes 3D sparse convolutions to process the input volume and output latent code volumes with 2×, 4×, 8×, 16× downsampled sizes. With the convolution and downsampling, the input codes are diffused to nearby space.
% 2.4 finally, we do xx.
%% Example: Following [56], for any point in 3D space, we interpolate the latent codes from multi-scale code volumes of network layers 5, 9, 13, 17, and concatenate them into the final latent code. Since the code diffusion should not be affected by the human position and orientation in the world coordinate system, we transform the code locations to the SMPL coordinate system.
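The voxelization step quoted in step 2.2 above can be made explicit with a short formula. This is a sketch under assumed notation, not the paper's own equations: $\mathbf{x}_i$ and $\mathbf{z}_i$ are the position and latent code of the $i$-th SMPL vertex, $\mathbf{b}_{\min}$ is the minimum corner of the human bounding box, $s$ is the voxel size (5mm in the example), the floor is applied element-wise, and $\mathcal{S}_v$ is the set of vertices that fall into the voxel with integer index $v$.

```latex
% Sketch only: all symbols below are assumptions introduced for this illustration.
\begin{equation}
  \mathcal{S}_v = \left\{\, i \;:\; \left\lfloor \frac{\mathbf{x}_i - \mathbf{b}_{\min}}{s} \right\rfloor = v \right\},
  \qquad
  \mathbf{z}_v = \frac{1}{|\mathcal{S}_v|} \sum_{i \in \mathcal{S}_v} \mathbf{z}_i
  \quad \mathrm{for}\ \mathcal{S}_v \neq \emptyset .
\end{equation}
```

Voxels with an empty $\mathcal{S}_v$ carry no code, which is why the example describes the codes as sparse before they are diffused to nearby space.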
[1] Learning Research by Sida Peng from ZJU