

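  Abstract

  Reposted from: 郑宇MSRA (Zheng Yu, MSRA)

  The complete slide deck on cross-domain data fusion (300 MB+) is now public, organized by chapter. It pairs algorithms with case studies to help tackle new challenges in data mining and machine learning and to take the lead in big data and artificial intelligence.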

  1. Overview

  Traditional data mining usually deals with data from a single domain. In the big data era, we face a diversity of datasets from different sources in different domains. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. Unlocking the power of knowledge from multiple disparate (but potentially connected) datasets is paramount in big data research, and it is what essentially distinguishes big data from traditional data mining tasks. This calls for advanced techniques that can fuse the knowledge from various datasets organically in a machine learning and data mining task. These methods focus on knowledge fusion rather than schema mapping and data merging, which significantly distinguishes cross-domain data fusion from the traditional data fusion studied in the database community.


  

  Figure 1. The difference between cross-domain data fusion and conventional data fusion

  This tutorial summarizes the data fusion methodologies, classifying them into three categories: stage-based, feature level-based, and semantic meaning-based data fusion methods. The last category of data fusion methods is further divided into four groups: multi-view learning-based, similarity-based, probabilistic dependency-based, and transfer learning-based methods.

  


  Figure 2. Categories of methods for cross-domain data fusion

  This tutorial not only introduces the high-level principles of each category of methods, but also gives examples in which these techniques are used to handle real big data problems. In addition, it positions existing works in a framework, exploring the relationships and differences between data fusion methods. The tutorial is intended to help a wide range of communities find a solution for data fusion in big data projects.


  2. The Stage-Based Data Fusion Methods

  This category of methods uses different datasets at different stages of a data mining task. The datasets are therefore loosely coupled, with no requirement that their modalities be consistent. The stage-based data fusion methods can also serve as a meta-approach used together with other data fusion methods. For example, Yuan et al. [3] first use road network data and taxi trajectories to build a region graph, and then propose a graphical model to fuse the information of POIs and the knowledge of the region graph. In the second stage, a probabilistic-graphical-model-based method is employed within the framework of the stage-based method.


  

  Figure 3. Illustration of the stage-based data fusion

  Examples:

  As illustrated in Fig. 3 A), Zheng et al. first partition a city into regions by major roads using a map segmentation method. The GPS trajectories of taxicabs are then mapped onto the regions to formulate a region graph, as depicted in Fig. 3 B), where a node is a region and an edge denotes the aggregation of commutes (by taxis in this case) between two regions. The region graph actually blends knowledge from the road network and taxi trajectories. By analyzing the region graph, a body of research has been carried out to identify the improper design of a road network, detect and diagnose traffic anomalies, and find urban functional regions.

  

  Figure 4. An example of using the stage-based method for data fusion
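  The region-graph construction described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' implementation: a uniform grid (the hypothetical `region_of` helper) stands in for the road-based map segmentation, each taxi trip is reduced to an origin and a destination point, and the names `build_region_graph` and `cell_size` are invented for this example.

```python
from collections import Counter


def region_of(point, cell_size=1.0):
    """Assign a point to a region.

    Assumption: a uniform grid cell stands in for a region bounded by
    major roads; a real system would use a map segmentation result.
    """
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def build_region_graph(trips, cell_size=1.0):
    """Aggregate trips into a directed, weighted region graph.

    Nodes are regions; the weight of edge (src, dst) is the number of
    trips that start in src and end in dst.
    """
    edges = Counter()
    for origin, destination in trips:
        src = region_of(origin, cell_size)
        dst = region_of(destination, cell_size)
        if src != dst:  # trips inside a single region add no edge
            edges[(src, dst)] += 1
    return edges


# Toy trips as (origin, destination) coordinate pairs; made-up data.
trips = [
    ((0.2, 0.3), (1.5, 0.4)),
    ((0.8, 0.1), (1.1, 0.9)),
    ((1.7, 0.2), (0.4, 0.6)),
    ((0.5, 0.5), (0.6, 0.7)),  # starts and ends in the same region
]

graph = build_region_graph(trips)
for (src, dst), count in sorted(graph.items()):
    print(src, "->", dst, count)
```

  The first stage (partitioning) and the second stage (trajectory mapping and aggregation) touch different datasets and are only connected through the region identifiers, which is the loose coupling the stage-based methods rely on. Later analyses, such as anomaly detection, would take `graph` as their input.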


  Link:

  https://www.microsoft.com/en-us/research/project/cross-domain-data-fusion/

  Original post:

  https://weibo.com/2073091511/EDGDDyWNU?ref=home&rid=7_0_202_2778227504193407143&type=comment
