Neural Networks & Deep Learning
2020-Spring, 2020-Fall, 2021-Fall, 2022-Fall
This is Dr. Gan Tian. She is currently an Associate Professor with the School of Computer Science and Technology, Shandong University.
Dr. Gan received her B.Sc. from East China Normal University in Jul. 2010. She then came to the School of Computing, National University of Singapore to pursue her PhD degree supervised by Prof. Mohan Kankanhalli from Aug. 2010 to Aug. 2015. Before coming back to China, She works at REVIVE, Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR) from Aug. 2014 to Sep. 2016.
甘甜,山东大学计算机科学与技术学院副教授、博士生导师,教育部“青年长江学者”。其于2010年和2015年分别从华东师范大学和新加坡国立大学获得学士和博士学位,后任新加坡科技研究局资讯通信研究院(Institute for Infocomm Research, A*STAR)研究科学家(2014–2016)。担任CSIG多媒体专委会副秘书长,IJCAI 2023资深程序委员、ACMMM 2023领域主席、ICIMCS2019和ACMMM Asia 2020出版主席。主持国家自然科学青年基金项目、面上项目、科技创新2030-“新一代人工智能”重大项目子课题等国家级、省部级课题。在多个相关领域的国际顶级学术期刊及会议如 CVPR、ACM MM、ACL、AAAI、IEEE TIP、IEEE TMM等上发表多篇论文,获得2021年山东省科技进步一等奖、2023年山东省技术发明一等奖。
招收多媒体信息处理、计算机视觉和机器学习方向的硕士研究生和博士研究生
2024年秋季入学学术硕士(尚有1个名额)、2025年入学学术硕士(尚有1个名额)、2025年入学博士(尚有1个名额),欢迎联系!(更新,2024年5月21日)
联系:gantian@sdu.edu.cn
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
With rising awareness of environment protection and recycling, second-hand trading platforms have attracted increasing attention in recent years. The interaction data on second-hand trading platforms, consisting of sufficient interactions per user but rare interactions per item, is different from what they are on traditional platforms. Therefore, building successful recommendation systems in the second-hand trading platforms requires balancing modeling items? and users? preference, and mitigating the adverse effects of the sparsity, which makes recommendation especially challenging. Accordingly, we proposed a method to simultaneously learn representations of items and users from coarse-grained and fine-grained features, and a multi-task learning strategy is designed to address the issue of data sparsity. Experiments conducted on a real-world second-hand trading platform dataset demonstrate the effectiveness of our proposed model.
Recommending new items in real-world e-commerce portals is a challenging problem as the cold start phenomenon, i.e., lacks of user-item interactions. To address this problem, we propose a novel recommendation model, i.e., adversarial neural network with multiple generators, to generate users from multiple perspectives of items' attributes. Namely, the generated users are represented by attribute-level features. As both users and items are attribute-level representations, we can implicitly obtain user-item attribute-level interaction information. In light of this, the new item can be recommended to users based on attribute-level similarity. Extensive experimental results on two item cold-start scenarios, movie and goods recommendation, verify the effectiveness of our proposed model as compared to state-of-the-art baselines.
Hashtags, a user provides to a micro-video, are the ones which can well describe the semantics of the micro-video's content in his/her mind. At the same time, hashtags have been widely used to facilitate various micro-video retrieval scenarios (e.g., search, browse, and categorization). Despite their importance, numerous micro-videos lack hashtags or contain inaccurate or incomplete hashtags. In light of this, hashtag recommendation, which suggests a list of hashtags to a user when he/she wants to annotate a post, becomes a crucial research problem. However, little attention has been paid to micro-video hashtag recommendation, mainly due to the following three reasons: 1) lack of benchmark dataset; 2) the temporal and multi-modality characteristics of micro-videos; and 3) hashtag sparsity and long-tail distributions. In this paper, we recommend hashtags for micro-videos by presenting a novel multi-view representation interactive embedding model with graph-based information propagation. It is capable of boosting the performance of micro-videos hashtag recommendation by jointly considering the sequential feature learning, the video-user-hashtag interaction, and the hashtag correlations. Extensive experiments on a constructed dataset demonstrate our proposed method outperforms state-of-the-art baselines. As a side research contribution, we have released our dataset and codes to facilitate the research in this community.
What made you want to wear the clothes you are wearing? Where is the place you want to visit for your next-coming holiday? Why do you like the music you frequently listen to? If you are like most people, you probably made these decisions as a result of watching influencers on social media. Furthermore, influencer marketing is an opportunity for brands to take advantage of social media using a well-defined and well-designed social media marketing strategy. However, choosing the right influencers is not an easy task. With more people gaining an increasing number of followers in social media, finding the right influencer for an E-commerce company becomes paramount. In fact, most marketers cite it as a top challenge for their brands. To address the aforementioned issues, we proposed a data-driven micro-influencer ranking scheme to solve the essential question of finding out the right micro-influencer. Specifically, we represented brands and influencers by fusing their historical posts' visual and textual information. A novel k-buckets sampling strategy with a modified listwise learning to rank model were proposed to learn a brand-micro-influncer scoring function. In addition, we developed a new Instagram brand micro-influencer dataset, consisting of 360 brands and 3,748 micro-influencers, which can benefit future researchers in this area. The extensive evaluations demonstrate the advantage of our proposed method compared with the state-of-the-art methods.
Presentation has been an effective method for delivering information to an audience for many years. Over the past few decades, technological advancements have revolutionized the way humans deliver presentation. Conventionally, the quality of a presentation is usually evaluated through painstaking manual analysis with experts. Although the expert feedback is effective in assisting users to improve their presentation skills, manual evaluation suffers from high cost and is often not available to most individuals. In this work, we propose a novel multi-sensor self-quantification system for presentations, which is designed based on a new proposed assessment rubric. We present our analytics model with conventional ambient sensors (i.e., static cameras and Kinect sensor) and the emerging wearable egocentric sensors (i.e., Google Glass). In addition, we performed a cross-correlation analysis of speaker’s vocal behavior and body language. The proposed framework is evaluated on a new presentation dataset, namely, NUS Multi-Sensor Presentation dataset, which consists of 51 presentations covering a diverse range of topics. To validate the efficacy of the proposed system, we have conducted a series of user studies with the speakers and an interview with an English communication expert, which reveals positive and promising feedback.
GWI survey has highlighted the flourishing use of multiple social networks: the average number of social media accounts per Internet user is 5.54, and among them, 2.82 are being used actively. Indeed, users tend to express their views in more than one social media site. Hence, merging social signals of the same user across different social networks together, if available, can facilitate the downstream analyses. Previous work has paid little attention on modeling the cooperation among the following factors when fusing data from multiple social networks: 1) as data from different sources characterizes the characteristics of the same social user, the source consistency merits our attention; 2) due to their different functional emphases, some aspects of the same user captured by different social networks can be just complementary and results in the source complementarity; and 3) different sources can contribute differently to the user characterization and hence lead to the different source confidence. Toward this end, we propose a novel unified model, which co-regularizes source consistency, complementarity, and confidence to boost the learning performance with multiple social networks. In addition, we derived its theoretical solution and verified the model with the real-world application of user interest inference. Extensive experiments over several state-of-the-art competitors have justified the superiority of our model.
Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite of the significance of deep models, they ignore the fine-grained (matching signals between words and classes) classification clues since their classifications mainly rely on the text-level representations. To address this problem, we introduce the interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, EXplicit interAction Model (dubbed as EXAM), equipped with the interaction mechanism. We justified the proposed approach on several benchmark datasets including both multilabel and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the codes and parameter settings to facilitate other researches.
Social working memory (SWM) plays an important role in navigating social interactions. Inspired by studies in psychology, neuroscience, cognitive science, and machine learning, we propose a probabilistic model of SWM to mimic human social intelligence for personal information retrieval (IR) in social interactions. First, we establish a semantic hierarchy as social long-term memory to encode personal information. Next, we propose a semantic Bayesian network as the SWM, which integrates the cognitive functions of accessibility and self-regulation. One subgraphical model implements the accessibility function to learn the social consensus about IR-based on social information concept, clustering, social context, and similarity between persons. Beyond accessibility, one more layer is added to simulate the function of self-regulation to perform the personal adaptation to the consensus based on human personality. Two learning algorithms are proposed to train the probabilistic SWM model on a raw dataset of high uncertainty and incompleteness. One is an efficient learning algorithm of Newton's method, and the other is a genetic algorithm. Systematic evaluations show that the proposed SWM model is able to learn human social intelligence effectively and outperforms the baseline Bayesian cognitive model. Toward real-world applications, we implement our model on Google Glass as a wearable assistant for social interaction.
hey report it. However, the information from the social sensors (like Facebook, Twitter, Instagram) typically is in the form of multimedia (text, image, video, etc.), thus coping with such information and mining useful knowledge from it will be an increasingly difficult task. In this paper, we first crawl social video data (text and video) from social sensor Twitter for a specific social event. Then the textual, acoustic, and visual features are extracted from these data. At last, we classify the social videos into subjective and objective videos by merging these different modality affective information. Preliminary experiments show that our proposed method is able to accurately classify the social sensor data’s subjectivity.
A wearable system is designed to provide directional information to visually impaired people. It consists of a mobile phone and haptic shoes. The former serves as the perceptual and control unit that generates directional instructions. Upon receiving the instructions, the shoes combine them with the user's walking status to produce unique vibration patterns. To enable effective direction sensing, a few alternative configurations are proposed, whereby the position and strength of vibrations are modulated programmatically. The prototype system is evaluated in a usability test with 60 subjects. By comparing different configurations, it is shown that the system achieves varying levels of perception accuracy under various walking conditions, and that the proposed design is advantageous to the benchmarking configuration. The system has great potential to provide smart and sensible navigation guidance to visually impaired people especially when integrated with visual processing units.
Wearable vision brings about new opportunities for augmenting humans in social interactions. However, along with it comes privacy concerns and possible information overload. We explore users' needs and attitudes toward augmented interaction in face-to-face communications. In particular, we want to find out whether users need additional information when interacting with acquaintances, what information they want to access, and how they use it. Based on observations of user behaviors in interactions assisted by Google Glass, we find that users in general appreciated the usefulness of wearable assistance for social interactions. We highlight a few key issues of how wearable devices affect user experience in social interaction.
Presentations have been an effective means of delivering information to groups for ages. Over the past few decades, technological advancements have revolutionized the way humans deliver presentations. Despite that, the quality of presentations can be varied and affected by a variety of reasons. Conventional presentation evaluation usually requires painstaking manual analysis by experts. Although the expert feedback can definitely assist users in improving their presentation skills, manual evaluation suffers from high cost and is often not accessible to most people. In this work, we propose a novel multi-sensor self-quantification framework for presentations. Utilizing conventional ambient sensors (i.e., static cameras, Kinect sensor) and the emerging wearable egocentric sensors (i.e., Google Glass), we first analyze the efficacy of each type of sensor with various nonverbal assessment rubrics, which is followed by our proposed multi-sensor presentation analytics framework. The proposed framework is evaluated on a new presentation dataset, namely NUS Multi-Sensor Presentation (NUSMSP) dataset, which consists of 51 presentations covering a diverse set of topics. The dataset was recorded with ambient static cameras, Kinect sensor, and Google Glass. In addition to multi-sensor analytics, we have conducted a user study with the speakers to verify the effectiveness of our system generated analytics, which has received positive and promising feedback.
In a typical multi-person social interaction, spatial information plays an important role in analyzing the structure of the social interaction. Previous studies, which analyze spatial structure of the social interaction using one or more third-person view cameras, suffer from the occlusion problem. With the increasing popularity of wearable computing devices, we are now able to obtain natural first-person observations with limited occlusion. However, such observations have a limited field of view, and can only capture a portion of the social interaction. To overcome the aforementioned limitation, we propose a search-based structure recovery method in a small group conversational social interaction scenario to reconstruct the social interaction structure from multiple first-person views, where each of them contributes to the multifaceted understanding of the social interaction. We first map each first-person view to a local coordinate system, then a set of constraints and spatial relationships are extracted from these local coordinate systems. Finally, the human spatial configuration is searched under the constraints to "best" match the extracted relationships. The proposed method is much simpler than full 3D reconstruction, and suffices for capturing the social interaction spatial structure. Experiments for both simulated and real-world data show the efficacy of the proposed method.
In the context of a social gathering, such as a cocktail party, the memorable moments are generally captured by professional photographers or by the participants. The latter case is often undesirable because many participants would rather enjoy the event instead of being occupied by the photo-taking task. Motivated by this scenario, we propose the use of a set of cameras to automatically take photos. Instead of performing dense analysis on all cameras for photo capturing, we first detect the occurrence and location of social interactions via F-formation detection. In the sociology literature, F-formation is a concept used to define social interactions, where each detection only requires the spatial location and orientation of each participant. This information can be robustly obtained with additional Kinect depth sensors. In this paper, we propose an extended F-formation system for robust detection of interactions and interactants. The extended F-formation system employs a heat-map based feature representation for each individual, namely Interaction Space (IS), to model their location, orientation, and temporal information. Using the temporally encoded IS for each detected interactant, we propose a best-view camera selection framework to detect the corresponding best view camera for each detected social interaction. The extended F-formation system is evaluated with synthetic data on multiple scenarios. To demonstrate the effectiveness of the proposed system, we conducted a user study to compare our best view camera ranking with human's ranking using real-world data.
In the context of a social gathering, such as a cocktail party, the memorable moments are often captured by professional photographers or the participants. The latter case is generally undesirable because many participants would rather enjoy the event instead of being occupied by the tedious photo capturing task. Motivated by this scenario, we propose an automated social event photo-capture framework for which, given the multiple sensor data streams and the information from the Web as input, will output the visually appealing photos of the social event. Our proposal consists of three components: (1) social attribute extraction from both the physical space and the cyberspace; (2) social attribute fusion; and (3) active camera control. Current work is presented and we conclude with expected contributions as well as future direction.
I would be happy to talk to you if you need my assistance in your research. My Office is at N3-422-2, No. 72 Binhai Road, Qingdao, Shandong, China 266237. (山东省青岛市即墨滨海路72号山东大学青岛校区,266237)