summaryrefslogtreecommitdiff
path: root/docs/source/auto_examples/plot_otda_semi_supervised.ipynb
blob: e3192da2a91f9a913822d828e47891ae2ba130ce (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# OTDA unsupervised vs semi-supervised setting\n\n\nThis example introduces a semi supervised domain adaptation in a 2D setting.\nIt explicits the problem of semi supervised domain adaptation and introduces\nsome optimal transport approaches to solve it.\n\nQuantities such as optimal couplings, greater coupling coefficients and\ntransported samples are represented in order to give a visual understanding\nof what the transport methods are doing.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Authors: Remi Flamary <remi.flamary@unice.fr>\n#          Stanislas Chambon <stan.chambon@gmail.com>\n#\n# License: MIT License\n\nimport matplotlib.pylab as pl\nimport ot"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Generate data\n-------------\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "n_samples_source = 150\nn_samples_target = 150\n\nXs, ys = ot.datasets.make_data_classif('3gauss', n_samples_source)\nXt, yt = ot.datasets.make_data_classif('3gauss2', n_samples_target)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Transport source samples onto target samples\n--------------------------------------------\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# unsupervised domain adaptation\not_sinkhorn_un = ot.da.SinkhornTransport(reg_e=1e-1)\not_sinkhorn_un.fit(Xs=Xs, Xt=Xt)\ntransp_Xs_sinkhorn_un = ot_sinkhorn_un.transform(Xs=Xs)\n\n# semi-supervised domain adaptation\not_sinkhorn_semi = ot.da.SinkhornTransport(reg_e=1e-1)\not_sinkhorn_semi.fit(Xs=Xs, Xt=Xt, ys=ys, yt=yt)\ntransp_Xs_sinkhorn_semi = ot_sinkhorn_semi.transform(Xs=Xs)\n\n# semi supervised DA uses available labaled target samples to modify the cost\n# matrix involved in the OT problem. The cost of transporting a source sample\n# of class A onto a target sample of class B != A is set to infinite, or a\n# very large value\n\n# note that in the present case we consider that all the target samples are\n# labeled. For daily applications, some target sample might not have labels,\n# in this case the element of yt corresponding to these samples should be\n# filled with -1.\n\n# Warning: we recall that -1 cannot be used as a class label"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Fig 1 : plots source and target samples + matrix of pairwise distance\n---------------------------------------------------------------------\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "pl.figure(1, figsize=(10, 10))\npl.subplot(2, 2, 1)\npl.scatter(Xs[:, 0], Xs[:, 1], c=ys, marker='+', label='Source samples')\npl.xticks([])\npl.yticks([])\npl.legend(loc=0)\npl.title('Source  samples')\n\npl.subplot(2, 2, 2)\npl.scatter(Xt[:, 0], Xt[:, 1], c=yt, marker='o', label='Target samples')\npl.xticks([])\npl.yticks([])\npl.legend(loc=0)\npl.title('Target samples')\n\npl.subplot(2, 2, 3)\npl.imshow(ot_sinkhorn_un.cost_, interpolation='nearest')\npl.xticks([])\npl.yticks([])\npl.title('Cost matrix - unsupervised DA')\n\npl.subplot(2, 2, 4)\npl.imshow(ot_sinkhorn_semi.cost_, interpolation='nearest')\npl.xticks([])\npl.yticks([])\npl.title('Cost matrix - semisupervised DA')\n\npl.tight_layout()\n\n# the optimal coupling in the semi-supervised DA case will exhibit \" shape\n# similar\" to the cost matrix, (block diagonal matrix)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Fig 2 : plots optimal couplings for the different methods\n---------------------------------------------------------\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "pl.figure(2, figsize=(8, 4))\n\npl.subplot(1, 2, 1)\npl.imshow(ot_sinkhorn_un.coupling_, interpolation='nearest')\npl.xticks([])\npl.yticks([])\npl.title('Optimal coupling\\nUnsupervised DA')\n\npl.subplot(1, 2, 2)\npl.imshow(ot_sinkhorn_semi.coupling_, interpolation='nearest')\npl.xticks([])\npl.yticks([])\npl.title('Optimal coupling\\nSemi-supervised DA')\n\npl.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Fig 3 : plot transported samples\n--------------------------------\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# display transported samples\npl.figure(4, figsize=(8, 4))\npl.subplot(1, 2, 1)\npl.scatter(Xt[:, 0], Xt[:, 1], c=yt, marker='o',\n           label='Target samples', alpha=0.5)\npl.scatter(transp_Xs_sinkhorn_un[:, 0], transp_Xs_sinkhorn_un[:, 1], c=ys,\n           marker='+', label='Transp samples', s=30)\npl.title('Transported samples\\nEmdTransport')\npl.legend(loc=0)\npl.xticks([])\npl.yticks([])\n\npl.subplot(1, 2, 2)\npl.scatter(Xt[:, 0], Xt[:, 1], c=yt, marker='o',\n           label='Target samples', alpha=0.5)\npl.scatter(transp_Xs_sinkhorn_semi[:, 0], transp_Xs_sinkhorn_semi[:, 1], c=ys,\n           marker='+', label='Transp samples', s=30)\npl.title('Transported samples\\nSinkhornTransport')\npl.xticks([])\npl.yticks([])\n\npl.tight_layout()\npl.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.5"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}