
OSM user classification: let’s use machine learning!


At Oslandia, we like working with Open Source tools and handling Open (geospatial) Data. In this article series, we will play with the OpenStreetMap (OSM) map and its data. Here comes the seventh article of this series, dedicated to user classification using the power of machine learning algorithms.

1 Develop a Principal Component Analysis (PCA)

From now on, we can try to add some intelligence to the data by using well-known machine learning tools.

Reducing the dimensionality of a problem often appears as an unavoidable prerequisite before undertaking any classification effort. As developed in the previous blog post, we have 40 normalized variables. That seems quite small for implementing a PCA (we could apply a clustering algorithm directly on our normalized data); however, for the sake of clarity regarding result interpretation, we decided to add this step to the analysis.

1.1 PCA design

Summarizing the complete user table with just a few synthetic components is appealing; however, you certainly want to ask “how many components?”! Principal component analysis is a linear projection of individuals onto a smaller-dimension space. It provides uncorrelated components, dropping the redundant information carried by subsets of the initial dataset.

Actually there is no ideal number of components; it can depend on the modeller’s wishes. However, in general this quantity is chosen according to the proportion of explained variance and/or the eigenvalues of the components. There are some rules of thumb for such a situation: we can choose enough components to cover at least 70% of the variance, or keep only the components with an eigenvalue larger than 1.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Eigendecomposition of the covariance matrix of the normalized features X
cov_mat = np.cov(X.T)
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
eig_vals = sorted(eig_vals, reverse=True)
tot = sum(eig_vals)
varexp = [(i/tot)*100 for i in eig_vals]
cumvarexp = np.cumsum(varexp)
varmat = pd.DataFrame({'eig': eig_vals,
                       'varexp': varexp,
                       'cumvar': cumvarexp})[['eig','varexp','cumvar']]
f, ax = plt.subplots(1, 2, figsize=(12,6))
ax[0].bar(range(1,1+len(varmat)), varmat['varexp'].values, alpha=0.25, 
        align='center', label='individual explained variance', color = 'g')
ax[0].step(range(1,1+len(varmat)), varmat['cumvar'].values, where='mid',
         label='cumulative explained variance')
ax[0].axhline(70, color="blue", linestyle="dotted")
ax[0].legend(loc='best')
ax[1].bar(range(1,1+len(varmat)), varmat['eig'].values, alpha=0.25,
          align='center', label='eigenvalues', color='r')
ax[1].axhline(1, color="red", linestyle="dotted")
ax[1].legend(loc="best")

Here the second rule of thumb fails, as we did not use a standard scaling process (i.e. subtracting the mean and dividing by the standard deviation); however, the first one leads us to consider 6 components (which explain around 72% of the total variance). The exact figures can be checked in the varmat data frame:

varmat.head(6)
        eig     varexp     cumvar
0  1.084392  28.527196  28.527196
1  0.551519  14.508857  43.036053
2  0.346005   9.102373  52.138426
3  0.331242   8.714022  60.852448
4  0.261060   6.867738  67.720186
5  0.181339   4.770501  72.490687

1.2 PCA running

The PCA algorithm is loaded from a sklearn module; we just have to run it with the number of components as a parameter, and apply the fit_transform procedure to get the new linear projection. Moreover, the contribution of each feature to the new components is straightforwardly accessible through the sklearn API.

from sklearn.decomposition import PCA
model = PCA(n_components=6)
Xpca = model.fit_transform(X)
pca_cols = ['PC' + str(i+1) for i in range(6)]
pca_ind = pd.DataFrame(Xpca, columns=pca_cols, index=user_md.index)
pca_var = pd.DataFrame(model.components_, index=pca_cols,
                       columns=user_md.columns).T
pca_ind.query("uid == 24664").T
uid     24664
PC1 -0.358475
PC2  1.671157
PC3  0.121600
PC4 -0.139412
PC5 -0.983175
PC6  0.409832

Oh yeah, after running the PCA, the information about the user is summarized by these 6 cryptic values. I’m pretty sure you want to know what these 6 components mean…

1.3 Component interpretation

By taking advantage of seaborn (a Python visualization library based on Matplotlib), we can plot the feature contributions to each component. All these contributions range between -1 (a strong negative contribution) and 1 (a strong positive contribution). Additionally, there is a mathematical link between all the contributions to a given component: the sum of their squares equals 1! As a consequence, the features can be ranked by order of importance in the component definition.

import seaborn as sns

f, ax = plt.subplots(figsize=(12,12))
sns.heatmap(pca_var, annot=True, fmt='.3f', ax=ax)
plt.yticks(rotation=0)
plt.tight_layout()
sns.set_context('paper')
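
As a side check, one can verify the unit sum of squares and rank the features contributing to a given component directly from the pca_var data frame built above; a quick sketch:

# Contributions to a component have a unit sum of squares, so features can be
# ranked by their absolute contribution (here for the first component)
print((pca_var['PC1'] ** 2).sum())                                 # close to 1.0
print(pca_var['PC1'].abs().sort_values(ascending=False).head(10))  # top features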

Here our six components may be described as follows:

  • PC1 (28.5% of total variance) is heavily impacted by relation modifications: this component will be high if the user made a lot of relation improvements (and very few node and way modifications), and if these improvements have since been corrected by other users. It is the sign of a specialization in complex structures. This component also refers to contributions from foreign users (i.e. not from the area of interest, here the Bordeaux area) who are familiar with JOSM.
  • PC2 (14.5% of total variance) characterizes how experienced and versatile users are: this component will be high for users with a high number of activity days, a lot of local as well as total changesets, and high numbers of node, way and relation modifications. This second component highlights JOSM too.
  • PC3 (9.1% of total variance) describes way-focused contributions by old users (but not really productive ones since they signed up). A high value is synonymous with corrected contributions; however, that’s quite mechanical: if you contributed a long time ago, your modifications are probably no longer up-to-date. This component highlights Potlatch and JOSM as the most used editors.
  • PC4 (8.7% of total variance) looks like PC3, in the sense that it is strongly correlated with way modifications. However, it concerns newer users: a more recent registration date, contributions that are less corrected, and more often up-to-date. This component is associated with iD as the preferred editor.
  • PC5 (6.9% of total variance) refers to a node specialization, from very productive users. The associated modifications are overall improvements that are still up-to-date. However, PC5 is linked with users who are not at ease in our area of interest, even if they produced a lot of changesets elsewhere. JOSM is clearly the corresponding editor.
  • PC6 (4.8% of total variance) is strongly impacted by node improvements, as opposed to node creations (a similar behavior tends to emerge for ways). This less important component highlights local specialists: a fairly high quantity of local changesets, but a small total changeset quantity. As for PC4, the editor used for such contributions is iD.

1.4 Describe individuals positioning after dimensionality reduction

As a reminder, we can print the previous user’s characteristics:

pca_ind.query("uid == 24664").T
uid     24664
PC1 -0.358475
PC2  1.671157
PC3  0.121600
PC4 -0.139412
PC5 -0.983175
PC6  0.409832

From the interpretations above, we can conclude that this user is really experienced (high value of PC2), even if this experience tends to be local (high negative value for PC5). The fairly good value for PC6 strengthens this hypothesis.

From the different component values, we can guess that the user is versatile; there is no strong trend characterizing a specialty. The node creation activity seems high, even if the last component tempers this conclusion a bit.

Regarding the editors this contributor used, the answer is quite hard to give from the six components alone! JOSM is favored by PC2 but handicapped by PC1 and PC5; the opposite holds for iD; Potlatch is the best candidate, as it is favored by PC3, PC4 and PC5.

By the way, this interpretation exercise may look quite abstract, but just consider the description at the beginning of the post and compare it with this interpretation… Not so bad, is it?

2 Cluster the users starting from their past activity

At this point, we have a set of active users (those who have contributed to the studied area). We now propose to classify each of them, without any knowledge of their identity or experience with geospatial data or the OSM API, by way of unsupervised learning. Indeed, we will design clusters with the k-means algorithm, and the only inputs we have are the synthetic dimensions given by the previous PCA.

2.1 k-means design: how many clusters may we expect from the OSM metadata?

As for the PCA, the k-means algorithm is characterized by a parameter that we must tune, namely the number of clusters.

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

scores = []
silhouette = []
for i in range(1, 11):
    model = KMeans(n_clusters=i, n_init=100, max_iter=1000)
    Xclust = model.fit_predict(Xpca)
    scores.append(model.inertia_)
    if i == 1:
        continue
    else:
        silhouette.append(silhouette_score(X=Xpca, labels=Xclust))

f, ax = plt.subplots(1, 2, figsize=(12,6))
ax[0].plot(range(1,11), scores, linewidth=3)
ax[0].set_xlabel("Number of clusters")
ax[0].set_ylabel("Unexplained variance")
ax[1].plot(range(2,11), silhouette, linewidth=3, color='g')
ax[1].set_xlabel("Number of clusters")
ax[1].set_ylabel("Silhouette")
ax[1].set_xlim(1, 10)
ax[1].set_ylim(0.2, 0.5)
plt.tight_layout()
sns.set_context('paper')

How many clusters can be identified? We only have access to soft recommendations given by state-of-the-art procedures. As an illustration, we use here the elbow method and the clustering silhouette.

The former represents the intra-cluster variance, i.e. the sparsity of observations within clusters. It obviously decreases when the number of clusters increases. To keep the model simple and avoid overfitting the data, this quantity has to be as small as possible. That’s why we speak of an “elbow”: we are looking for a bending point marking a drop in the marginal gain of explained variance.

The latter is a synthetic metric that indicates how well each individual is represented by its cluster. It ranges between 0 (bad clustering representation) and 1 (perfect clustering).

The first criterion suggests taking either 2 or 6 clusters, whilst the second criterion is highest for 6 or 7 clusters. We therefore decide to take 6 clusters.

2.2 k-means running: OSM contributor classification

We hypothesized that several kinds of users would be highlighted by the clustering process. How do we interpret the six chosen clusters starting from the Bordeaux area dataset?

model = KMeans(n_clusters=6, n_init=100, max_iter=1000)
kmeans_ind = pca_ind.copy()
kmeans_ind['Xclust'] = model.fit_predict(pca_ind.values)
kmeans_centroids = pd.DataFrame(model.cluster_centers_,
                                columns=pca_ind.columns)
kmeans_centroids['n_individuals'] = (kmeans_ind
                                     .groupby('Xclust')
                                     .count())['PC1']
kmeans_centroids
        PC1       PC2       PC3       PC4       PC5       PC6  n_individuals
0 -0.109548  1.321479  0.081620  0.010547  0.117813 -0.024774            317
1  1.509024 -0.137856 -0.142927  0.032830 -0.120925 -0.031677            585
2 -0.451754 -0.681200 -0.269514 -0.763636  0.258083  0.254124            318
3 -0.901269  0.034718  0.594161 -0.395605 -0.323108 -0.167279            272
4 -1.077956  0.027944 -0.595763  0.365220 -0.005816 -0.022345            353
5 -0.345311 -0.618198  0.842705  0.872673  0.180977 -0.004558            228

The k-means algorithm produces six relatively well-balanced groups (group 1 is larger than the others, but the difference is not that large):

  • Group 0 (15.3% of users): this cluster represents the most experienced and versatile users. They can be seen as OSM key contributors.
  • Group 1 (28.2% of users): this group refers to relation specialists, users who are fairly productive on OSM.
  • Group 2 (15.3% of users): this cluster gathers very inexperienced users, who come to OSM just a few times, mostly to modify nodes.
  • Group 3 (13.2% of users): this category refers to old one-shot contributors, mainly interested in way modifications.
  • Group 4 (17.0% of users): this cluster of users is very close to the previous one, the difference being the more recent period during which they have contributed.
  • Group 5 (11.0% of users): this last user cluster contains contributors who are locally inexperienced; they have mainly proposed way modifications.
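
The percentages above are simply each group’s n_individuals count divided by the total number of users:

# Share of users in each cluster, in percent
100 * kmeans_centroids['n_individuals'] / kmeans_centroids['n_individuals'].sum()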

To complete this overview, we can plot individuals according to their group:

SUBPLOT_LAYERS = pd.DataFrame({'x':[0,2,4],
                               'y':[1,3,5]})
f, ax = plt.subplots(1, 3, figsize=(12,4))
for i in range(3):
    ax_ = ax[i]
    comp = SUBPLOT_LAYERS.iloc[i][['x', 'y']]
    x_column = 'PC'+str(1+comp[0])
    y_column = 'PC'+str(1+comp[1])
    for name, group in kmeans_ind.groupby('Xclust'):
        ax_.plot(group[x_column], group[y_column], marker='.',
                 linestyle='', ms=10, label=name)
        if i == 0:
            ax_.legend(loc=0)
    ax_.plot(kmeans_centroids[[x_column]],
             kmeans_centroids[[y_column]],
             'kD', markersize=10)
    for clust_id, point in kmeans_centroids.iterrows():
        ax_.text(point[x_column]-0.2, point[y_column]-0.2,
                 ('C'+str(clust_id)+' (n='
                  +str(int(point['n_individuals']))+')'),
                  weight='bold', fontsize=14)
    ax_.set_xlabel(x_column + ' ({:.2f}%)'.format(varexp[comp[0]]))
    ax_.set_ylabel(y_column + ' ({:.2f}%)'.format(varexp[comp[1]]))
plt.tight_layout()

It appears that the first two components allow us to clearly discriminate C0 and C1. We need the third and fourth components to differentiate C2 and C5 on the one hand, and C3 and C4 on the other hand. The last two components do not provide any additional information.

3 Conclusion

“Voilà”! We have proposed here a user classification, without any preliminary knowledge about who the users are and which skills they have. That’s an illustration of the power of unsupervised learning; we will try to apply this clustering to OSM data quality assessment in the next blog post, dedicated to mapping!


OSM data quality assessment: producing maps to illustrate data quality


At Oslandia, we like working with Open Source tools and handling Open (geospatial) Data. In this article series, we will play with the OpenStreetMap (OSM) map and its data. Here comes the eighth article of this series, dedicated to OSM data quality evaluation through the production of new maps.

1 Description of OSM elements

1.1 Element metadata extraction

As mentioned in a previous article dedicated to metadata extraction, we have to focus on the element metadata itself if we want to produce valuable information about quality. The first questions to answer here are straightforward: what is an OSM element, and how do we extract its associated metadata? This part is quite similar to the job already done with users.

We know from previous analysis that an element is created during a changeset by a given contributor, may be modified several times by anyone, and may be deleted as well. Such an object may be either a “node”, a “way” or a “relation”. We also know that a set of different tags may be associated with the element. Of course, the list of all operations associated with each element is recorded in the OSM data history. Let’s consider data around Bordeaux, as in previous blog posts:

import pandas as pd
elements = pd.read_table('../src/data/output-extracts/bordeaux-metropole/bordeaux-metropole-elements.csv', parse_dates=['ts'], index_col=0, sep=",")
elements.head().T
   elem        id  version  visible         ts    uid  chgset
0  node  21457126        2    False 2008-01-17  24281  653744
1  node  21457126        3    False 2008-01-17  24281  653744
2  node  21457126        4    False 2008-01-17  24281  653744
3  node  21457126        5    False 2008-01-17  24281  653744
4  node  21457126        6    False 2008-01-17  24281  653744

This short description helps us to identify some basic features, which are built in the following snippets. First we recover the temporal features:

elem_md = (elements.groupby(['elem', 'id'])['ts']
            .agg(["min", "max"])
            .reset_index())
elem_md.columns = ['elem', 'id', 'first_at', 'last_at']
elem_md['lifespan'] = (elem_md.last_at - elem_md.first_at)/pd.Timedelta('1D')
extraction_date = elements.ts.max()
elem_md['n_days_since_creation'] = ((extraction_date - elem_md.first_at)
                                  / pd.Timedelta('1d'))
elem_md['n_days_of_activity'] = (elements
                              .groupby(['elem', 'id'])['ts']
                              .nunique()
                              .reset_index())['ts']
elem_md = elem_md.sort_values(by=['first_at'])
elem_md.sample().T
                                    213418
elem                                  node
id                               922827508
first_at               2010-09-23 00:00:00
last_at                2010-09-23 00:00:00
lifespan                                 0
n_days_since_creation                 2341
n_days_of_activity                       1

Then we build the remainder of the variables, e.g. the number of versions, contributors and changesets per element:

elem_md['version'] = (elements.groupby(['elem','id'])['version']
                      .max()
                      .reset_index())['version']
elem_md['n_chgset'] = (elements.groupby(['elem', 'id'])['chgset']
                       .nunique()
                       .reset_index())['chgset']
elem_md['n_user'] = (elements.groupby(['elem', 'id'])['uid']
                     .nunique()
                     .reset_index())['uid']
osmelem_last_user = (elements
                     .groupby(['elem','id'])['uid']
                     .last()
                     .reset_index())
osmelem_last_user = osmelem_last_user.rename(columns={'uid':'last_uid'})
elements = pd.merge(elements, osmelem_last_user,
                    on=['elem', 'id'])
elem_md = pd.merge(elem_md,
                   elements[['elem', 'id', 'version', 'visible', 'last_uid']],
                   on=['elem', 'id', 'version'])
elem_md = elem_md.set_index(['elem', 'id'])
elem_md.sample().T
elem                                  node
id                              1340445266
first_at               2011-06-26 00:00:00
last_at                2011-06-27 00:00:00
lifespan                                 1
n_days_since_creation                 2065
n_days_of_activity                       2
version                                  2
n_chgset                                 2
n_user                                   1
visible                              False
last_uid                            354363

As an illustration, we have above an old node with two versions, which is no longer visible on the OSM website.

1.2 Characterize OSM elements with user classification

This set of features is only descriptive; we have to add more information to be able to characterize OSM data quality. That is the moment to exploit the user classification produced in the last blog post!

As a reminder, we hypothesized that clustering the users allows us to evaluate their trustworthiness as OSM contributors. They are either beginners, intermediate users, or even OSM experts, according to the previous classification.

Each OSM entity may have received one or more contributions from users of each group. Let’s say the entity quality is good if its last contributor is experienced. That leads us to classify the OSM entities themselves in return!

How to include this information into element metadata?

We first need to recover the results of our clustering process.

user_groups = pd.read_hdf("../src/data/output-extracts/bordeaux-metropole/bordeaux-metropole-user-kmeans.h5", "/individuals")
user_groups.head()
           PC1       PC2       PC3       PC4       PC5       PC6  Xclust
uid                                                                     
1626 -0.035154  1.607427  0.399929 -0.808851 -0.152308 -0.753506       2
1399 -0.295486 -0.743364  0.149797 -1.252119  0.128276 -0.292328       0
2488  0.003268  1.073443  0.738236 -0.534716 -0.489454 -0.333533       2
5657 -0.889706  0.986024  0.442302 -1.046582 -0.118883 -0.408223       4
3980 -0.115455 -0.373598  0.906908  0.252670  0.207824 -0.575960       5

As a remark, there were several important results to save after the clustering process; we decided to serialize them into a single binary file. Pandas knows how to manage such files, and it would be a pity not to take advantage of that!
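
For the record, here is a hedged sketch of how such a file could have been written at the end of the previous post (assuming kmeans_ind holds the PCA coordinates plus the Xclust column, as built there):

# Serialize the clustering results into a single HDF5 binary file
kmeans_ind.to_hdf("../src/data/output-extracts/bordeaux-metropole/bordeaux-metropole-user-kmeans.h5",
                  "individuals")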

We recover the individual groups from the eponymous table of the binary file (column Xclust), and only have to join it to the element metadata as follows:

elem_md = elem_md.join(user_groups.Xclust, on='last_uid')
elem_md = elem_md.rename(columns={'Xclust':'last_uid_group'})
elem_md.reset_index().to_csv("../src/data/output-extracts/bordeaux-metropole/bordeaux-metropole-element-metadata.csv")
elem_md.sample().T
elem                                  node
id                              1530907753
first_at               2011-12-04 00:00:00
last_at                2011-12-04 00:00:00
lifespan                                 0
n_days_since_creation                 1904
n_days_of_activity                       1
version                                  1
n_chgset                                 1
n_user                                   1
visible                               True
last_uid                             37548
last_uid_group                           2

From now on, we can use the last contributor’s cluster as additional information to generate maps, so as to study data quality…

Wait… Isn’t there a piece of information missing? Well yes, maybe the most important one when dealing with geospatial data: the location itself!

1.3 Recover the geometry information

Even if the Pyosmium library is able to retrieve OSM element geometries, we ran some tests with another OSM data parser here: osm2pgsql.

We can recover geometries from standard OSM data with this tool, assuming the existence of an osm database owned by user:

osm2pgsql -E 27572 -d osm -U user -p bordeaux_metropole --hstore ../src/data/raw/bordeaux-metropole.osm.pbf

We specify a France-focused SRID (27572), and a prefix for naming the output tables: point, line, polygon and roads.

We can work with the line subset, which contains the physical roads among other structures (it roughly corresponds to OSM ways), and build an enriched version of the element metadata, with geometries.

First we create the table bordeaux_metropole_elements, which will contain our metadata…

DROP TABLE IF EXISTS bordeaux_metropole_elements;
DROP TABLE IF EXISTS bordeaux_metropole_geomelements;
CREATE TABLE bordeaux_metropole_elements(
       id int,
       elem varchar,
       osm_id bigint,
       first_at varchar,
       last_at varchar,
       lifespan float,
       n_days_since_creation float,
       n_days_of_activity float,
       version int,
       n_chgsets int,
       n_users int,
       visible boolean,
       last_uid int,
       last_user_group int
);

…then we populate it with the appropriate .csv file…

COPY bordeaux_metropole_elements
FROM '/home/rde/data/osm-history/output-extracts/bordeaux-metropole/bordeaux-metropole-element-metadata.csv'
WITH(FORMAT CSV, HEADER, QUOTE '"');

…and finally, we merge the metadata with the data gathered by osm2pgsql, which contains the geometries.

SELECT l.osm_id, h.lifespan, h.n_days_since_creation,
h.version, h.visible, h.n_users, h.n_chgsets,
h.last_user_group, l.way AS geom
INTO bordeaux_metropole_geomelements
FROM bordeaux_metropole_elements as h
INNER JOIN bordeaux_metropole_line as l
ON h.osm_id = l.osm_id AND h.version = l.osm_version
WHERE l.highway IS NOT NULL AND h.elem = 'way'
ORDER BY l.osm_id;

Wow, this is wonderful, we have everything we need in order to produce new maps, so let’s do it!

2 Keep it visual, man!

From the last developments and some hypotheses about element quality, we are able to produce some customized maps. If each OSM entity (e.g. each road) can be characterized, then we can draw quality maps by highlighting the most trustworthy entities, as well as those with which we have to stay cautious.

In this post we will continue to focus on roads within the Bordeaux area. The different maps will be produced with the help of QGIS.

2.1 First step: simple metadata plotting

As a first insight into OSM elements, we can plot each OSM way according to simple features like the number of users who have contributed to it, the number of versions, or the element anteriority.

Figure 1: Number of active contributors per OSM way in Bordeaux

 

Figure 2: Number of versions per OSM way in Bordeaux

With the first two maps, we see that the ring around Bordeaux is the most intensively modified part of the road network: more unique contributors are involved in completing these ways, and more versions exist for each element. Some major roads within the city center show the same characteristics.

Figure 3: Anteriority of each OSM way in Bordeaux, in years

If we consider the anteriority of OSM roads, we get a different but interesting insight into the area. The oldest roads are mainly located within the city center, even if there are some exceptions. It is also interesting to notice that some spatial patterns arise with temporality: entire neighborhoods are mapped within the same period.

2.2 More complex: OSM data merging with alternative geospatial representations

To go deeper into the mapping analysis, we can use the INSEE gridded (“carroyed”) data, which divides France into 200-meter square tiles. As a corollary, OSM element statistics may be aggregated within each tile to produce additional maps. Unfortunately some information loss will occur, as such tiles are only defined where people live. However, it can provide an interesting alternative illustration.

To exploit this new dataset, we have to merge the previous table with the corresponding INSEE table. Creating indexes on both tables is of great interest before running such a merge operation:

CREATE INDEX insee_geom_gist
ON open_data.insee_200_carreau USING GIST(wkb_geometry);
CREATE INDEX osm_geom_gist
ON bordeaux_metropole_geomelements USING GIST(geom);

DROP TABLE IF EXISTS bordeaux_metropole_carroyed_ways;
CREATE TABLE bordeaux_metropole_carroyed_ways AS (
SELECT insee.ogc_fid, count(*) AS nb_ways,
avg(bm.version) AS avg_version, avg(bm.lifespan) AS avg_lifespan,
avg(bm.n_days_since_creation) AS avg_anteriority,
avg(bm.n_users) AS avg_n_users, avg(bm.n_chgsets) AS avg_n_chgsets,
insee.wkb_geometry AS geom
FROM open_data.insee_200_carreau AS insee
JOIN bordeaux_metropole_geomelements AS bm
ON ST_Intersects(insee.wkb_geometry, bm.geom)
GROUP BY insee.ogc_fid
);

As a consequence, we get only 5468 individuals (tiles), a quantity that must be compared to the 29427 roads previously handled… This operation will also simplify the map analysis!

We can propose another version of the previous maps by using QGIS; let’s consider the average number of contributors per OSM road, for each tile:

Figure 4: Number of contributors per OSM roads, aggregated by INSEE tile

2.3 The cherry on the cake: representation of OSM elements with respect to quality

Last but not least, the information about the last user’s cluster can shed some light on OSM data quality: by plotting each road according to the last user who has contributed to it, we might identify questionable OSM elements!

We simply have to design a map similar to the previous section’s, with the user classification information:

Figure 5: OSM roads around Bordeaux, according to the last user cluster (1: C1, relation experts; 2: C0, versatile expert contributors; 3: C4, recent one-shot way contributors; 4: C3, old one-shot way contributors; 5: C5, locally-unexperienced way specialists)

According to the clustering done in the previous article (be careful, the legend is not the same here…), we can make some additional hypothesis:

  • Light-blue roads are OK: they correspond to the most trustworthy cluster of contributors (91.4% of roads in this example).
  • There is no group-0 road (group 0 corresponds to cluster C2 in the previous article)… And that’s comforting! It seems that “untrustworthy” users do not contribute to roads or, more probably, that their contributions are quickly amended.
  • Other contributions are made by intermediate users: a finer analysis should be undertaken to decide whether the corresponding elements are valid. For now, we can consider that everything is OK, even if local patterns seem strong. Areas of interest should be verified (they are not necessarily of low quality!).

For sure, it gives a fairly new picture of OSM data quality!

3 Conclusion

In this last article, we have designed new maps on a small area, starting from element metadata. You have seen the conclusion of our analysis: characterizing OSM data quality starting from the users’ contribution history.

Of course some work still has to be done; however, we have detailed a whole methodology to tackle the problem. We hope you will be able to reproduce it, and to design your own maps!

Feel free to contact us if you are interested in this topic!

Pointclouds in PostgreSQL with Foreign Data Wrappers


IGN and Oslandia have been collaborating on a research project named LI3DS. LI3DS stands for “Large Input 3D System”. The project involves acquiring data in the field, such as images and point clouds, and providing tools for storing, processing and analyzing the data. Everything developed as part of the project is open source, and available on GitHub: https://github.com/LI3DS. We will provide more information about the LI3DS project in a future post.

LI³DS logo

This blog post is about fdw-li3ds, a library we’ve been working on as part of LI3DS. FDW stands for Foreign Data Wrapper, which you may know if you’re a PostgreSQL user. PostgreSQL’s FDWs provide a way to access remote data and interact with that data through SQL as if it were stored in local tables. fdw-li3ds provides FDWs for pointcloud data. At the time of this writing fdw-li3ds supports three file formats: SBET, EchoPulse and Rosbag. Other pointcloud file formats will be supported in the future, based on our needs for LI3DS and other projects. Contributions are also very welcome, obviously 🙂

Using fdw-li3ds, this is how you create a “foreign table” linked to an SBET file:

CREATE SERVER sbetserver FOREIGN DATA WRAPPER multicorn
OPTIONS (
    wrapper 'fdwli3ds.Sbet'
);

CREATE FOREIGN TABLE sbet_schema (
    schema text
)
SERVER sbetserver
OPTIONS (
    metadata 'true'
);

INSERT INTO pointcloud_formats (pcid, srid, schema)
SELECT 2, 4326, schema FROM sbet_schema;

CREATE FOREIGN TABLE sbet (
    points pcpatch(2)
)
SERVER sbetserver
OPTIONS (
    sources '/data/sbet/sbet.bin'
    , patch_size '100'
    , pcid '2'
);

Let’s review this step by step:

-- Create server
CREATE SERVER sbetserver FOREIGN DATA WRAPPER multicorn
OPTIONS (
    wrapper 'fdwli3ds.Sbet'
);

Before creating a “foreign table” we need to create a “foreign server”. Here we create a server named sbetserver based on the “multicorn” FDW and the fdwli3ds.Sbet wrapper.

Multicorn is a PostgreSQL extension that makes it possible to define FDWs in Python (one of Oslandia’s favorite languages, among many others…). wrapper 'fdwli3ds.Sbet' in the options specifies that we want to use the fdwli3ds.Sbet Multicorn wrapper, which is the wrapper fdw-li3ds provides for reading SBET files. If fdw-li3ds supported LAS then fdwli3ds.Las would be used here.

-- Create metadata foreign table
CREATE FOREIGN TABLE sbet_schema (
    schema text
)
SERVER sbetserver
OPTIONS (
    metadata 'true'
);

This query creates a “foreign table” named sbet_schema which relies on the “sbetserver” server we created previously. metadata 'true' specifies that this foreign table contains the SBET metadata (as opposed to the SBET data).

-- insert SBET schema into pointcloud_formats
INSERT INTO pointcloud_formats (pcid, srid, schema)
SELECT 2, 4326, schema FROM sbet_schema;

This reads the SBET file’s schema (metadata) from the sbet_schema foreign table created previously, and inserts that schema into the PostgreSQL Pointcloud extension’s pointcloud_formats table. Having a schema is required for creating, and working with, Pointcloud columns.

-- create foreign table linked to SBET file
CREATE FOREIGN TABLE sbet (
    points pcpatch(2)
)
SERVER sbetserver
OPTIONS (
    sources '/data/sbet/sbet.bin'
    , patch_size '100'
    , pcid '2'
);

The last query finishes up the process by creating the actual foreign table bound to the SBET file, /data/sbet/sbet.bin. You can now query that table in the same way you’d query any other Postgres table. For example:

SELECT points FROM sbet;

The points column of the sbet table is of type pcpatch, which is one of the data types defined by the PostgreSQL Pointcloud extension. Since QGIS knows about the pcpatch data type, visualizing the content of the sbet table in QGIS is straightforward.

Also, for better query performance, a materialized view of the sbet table can be created, with, for example, an index on the pcpatch column:

CREATE MATERIALIZED VIEW sbet_view AS SELECT points, row_number() OVER () AS id FROM sbet;
CREATE UNIQUE INDEX ON sbet_view (id);
CREATE INDEX ON sbet_view USING GIST(PC_EnvelopeGeometry(points));

You can now go ahead and create a QGIS layer based on sbet_view. The experience should be much better than relying on the foreign table.

Example of a SBET file displayed in QGIS

As a quick conclusion, we think that using Foreign Data Wrappers referencing pointcloud files is an interesting approach. By keeping the pointcloud data in files, data duplication can be avoided. And using fdw-li3ds you can still provide SQL access to that data, making it possible to query and analyze the data through SQL, with the PostgreSQL and PostGIS arsenal at your disposal!

Feel free to contact us if you have questions, or if you want to know more about what we’re doing with point clouds!

iTowns 2: 3D Geospatial information on the web!


Oslandia, French IGN, the LIRIS CNRS research laboratory and Atol CD announce the release of iTowns 2, the new version of the OpenSource 3D geospatial data visualization framework for the web. Rewritten from scratch, this new major version gives users a solid technological basis and new 3D visualization features. Its capabilities for software integration and interoperability, as well as its feature range, let you build 3D geospatial data visualization applications simply.

New features

This version 2, based on the THREE.JS library, allows for rapid development and integration into existing web applications. iTowns also benefits from the large user and developers community of THREE.JS.

Among the features of version 2.1:

  • Heterogeneous data support (DEM, imagery, point clouds, meshes …)
  • OGC 3D Tiles protocol integration
  • Globe mode or projected mode in your local coordinates system
  • GeoJSON, KML, GPX formats
  • OGC webservices support (WMTS, WFS..)
  • Compatible with Potree format for large-scale point clouds
  • Post-processing capabilities

Open and collaborative project

iTowns, whose version 1.0 was developed by IGN research services, is now an open and collaborative project. Oslandia and AtolCD are the main industrial contributors; they actively collaborate on the framework and are involved in projects integrating iTowns as a 3D visualization solution.

MATIS, LOEMI and COGIT laboratories at IGN contribute to the transfer of research work to iTowns, including new algorithms and methods. The IGN Geoportal team is also active and involved in iTowns development.

The CNRS LIRIS laboratory in Lyon, one of Oslandia’s partners, uses iTowns as a common software basis for its research and development efforts. Using open software allows for faster development, better capitalization of research work and easier collaboration. The Vilo3D platform from LIRIS is an example application for geo-historical data visualization with story-telling capabilities. This platform is based on iTowns and currently under finalization.

The project is hosted on GitHub and open to all contributors wanting to take part in the development.

A mature solution for industrial projects

iTowns is now mature enough to be used for industrial software development. Oslandia is currently building visualization applications for large-scale point cloud data for its clients, as well as virtual visit applications and 3D data catalogs, all based on iTowns for 3D visualization.

The project roadmap aims to provide solutions for new use cases in 3D data exploitation: immersive visualization, large volumes of data, precise measurements, augmented reality and any other feature wanted or funded by its users.

Try it out!

Download iTowns and try the demos on the project’s website: http://www.itowns-project.org

Should you need more information or help, contact infos+itowns@oslandia.com.

Oslandia provides a wide range of services around iTowns, OpenSource GIS and 3D geospatial data. Get in touch!

Oslandia is baking some awesome QGIS 3 new features


QGIS 3.0 is now getting closer and closer; it’s the right moment to write about some major refactoring and new features we have been baking at Oslandia.

A quick word about the release calendar: you probably thought the QGIS 3 freeze was expected for the end of August, didn’t you?

In fact, we have so many new major changes in the queue that the steering committee (PSC), advised by the core developers, decided to push the release date back twice, up to the 27th of October. The release date has not been pushed back again (yet).

At Oslandia we got involved in a dark list of hidden features of QGIS3.

They mostly aren’t easy to advertise visually, but you’ll appreciate them for sure!

  • Add capabilities to store data in the project
    • add a new .qgz zipped file format container
    • have editable joins, with upsert capabilities (Insert Or Update)
    • Transparently store data in an SQLite database and keep it in sync. Now custom labeling is pretty easy!
  • Coordinating work and tests on the new node tool for data editing
  • Improving Z/M handling in edit tools and layer creation dialogs
  • Ticket reviewing and cleaning

Next articles will describe some of those tasks soon.

This work was a great opportunity to ramp up a talented new developer, who now has commit rights on the repository! Welcome and congratulations to Paul, our new core committer!

All this was possible with the support of many actors, but also thanks to funding from QGIS.org via Grant Applications and direct funding of QGIS Server!

One last word: please help us test QGIS 3, it’s the perfect moment to stress it, as the bugfix period is about to start!

 

 

 

Refresh your maps FROM PostgreSQL!


Continuing our love story with PostgreSQL and QGIS, we submitted a grant application to QGIS.org in early spring 2017.

The idea was to take advantage of very advanced PostgreSQL features that had probably never been used in a desktop GIS client before.

Today, let’s see what we can do with the PostgreSQL NOTIFY feature!

Ever dreamt of being able to trigger things from outside QGIS? Ever wanted a magic stick to trigger actions in some clients from a database action?

Meme: “REFRESH QGIS FROM THE DATABASE!!!”

 

NOTIFY is a PostgreSQL-specific feature allowing you to generate notifications on a channel and optionally send a message, a “payload” in PG’s dialect.

In short, from within a transaction, we can raise a signal in a PostgreSQL queue and listen to it from a client.
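
As an illustration, here is a minimal sketch of the sending side from Python with psycopg2 (the connection parameters are hypothetical; only the “qgis” channel name matters):

import psycopg2

conn = psycopg2.connect("dbname=osm user=me")   # hypothetical connection string
conn.autocommit = True                          # NOTIFY is delivered at COMMIT time
with conn.cursor() as cur:
    # the optional payload can be used as a message filter on the QGIS side
    cur.execute("NOTIFY qgis, 'refresh'")
conn.close()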

In action

We hardcoded a channel named “qgis” and made QGIS able to LISTEN to NOTIFY events and transform them into Qt signals. The signals are connected to a layer refresh when you switch on this rendering option.

Optionally, adding a message filter will only redraw the layer for some specific events.

This mechanism is really versatile and we can now imagine many possibilities, such as triggering a notification message to your users from the database, interacting with plugins, or even coding a chat between users of the same database (ok, this is stupid)!

 

More than just refresh layers?

The first implementation we chose was to trigger a layer refresh because we believe this is a good way for users to discover this new feature.

But QGIS rocks hey, doing crazy things for limited uses is not the way.

Thanks to feedback on the Pull Request, we added the possibility to trigger layer actions on notification.

That should be pretty versatile since you can do almost anything with those actions now.

Caveats

QGIS will open a permanent connection to PostgreSQL to watch the notify signals. Please keep that in mind if you have several clients and a limited number of connections.

Notify signals are only transmitted with the transaction, i.e. when the COMMIT is raised. So be aware that this might not help you if users are inside an edit session.

QGIS has a lot of different caches, for the attribute table for instance. We currently have no specific way to invalidate a specific cache and then order QGIS to refresh its attribute table.

There is no way in PG to list all channels of a database session; that’s why we couldn’t propose a combobox listing the available channels in the renderer option dialog. Anyway, to avoid too many issues, we decided to hardcode the channel name in QGIS with the name “qgis”. If this is somehow not enough for your needs, please contact us!

Conclusion

The GitHub pull request is here: https://github.com/qgis/QGIS/pull/5179

We are convinced this will be really useful for real-time applications; let us know if that rings some bells on your side!

More to come soon, stay tuned!

 

 

Undo/Redo stack is back in QGIS Transaction groups


Let’s keep looking at what we did in the QGIS.org grant application of early spring 2017.

At Oslandia, we use the transaction groups option of QGIS a lot. It was an experimental feature in QGIS 2.X allowing you to open only one common Postgres transaction for all layers sharing the same connection string.

Transaction group option

When activated, that option will bring many killer features:

  • Users can switch all the layers in edit mode at once. A real time saver.
  • Every INSERT, UPDATE or DELETE is forwarded immediately to the database, which is nice for:
    • Evaluating on the fly if database constraints are satisfied or not. Without transaction groups this is only done when saving the edits, and it can be frustrating to create dozens of features and have one of them rejected because of a foreign key constraint…
    • Having triggers evaluated on the fly. QGIS is so powerful when dealing with “thick database” concepts that I would never go back to a pure GIS ignoring how powerful databases can be!
    • Playing with QgsTransaction.ExecuteSQL allows you to trigger stored procedures in PostgreSQL in a beautiful API-style interface. Something like
SELECT invert_pipe_direction('pipe1');
  • However, the implementation was flagged “experimental” because some caveats were still causing issues:
    • Committing on the fly was breaking the logic of the undo/redo stack. So there was no way to do a local edit. No Ctrl+Z! The only way to roll back was to stop the edit session and lose all the work. Ouch… Bad!
    • Playing with ExecuteSQL did not dirty the QGIS edit buffer. So, if during an edit session no edit action was made using QGIS native tools, there was no clean way to activate the “save edits” icon.
    • When having some failures in the triggers, QGIS could lose the DB connection and thus create a silent ROLLBACK.

We decided to try to restore the undo/redo stack by saving the edit history in PostgreSQL SAVEPOINTs, and see if we could restore the original feature in QGIS.

And… it worked!
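
Under the hood, the mechanism boils down to plain SQL savepoints; here is a minimal sketch with psycopg2, on a hypothetical pipes table:

import psycopg2

conn = psycopg2.connect("dbname=osm user=me")   # hypothetical connection string
with conn.cursor() as cur:
    cur.execute("INSERT INTO pipes (name) VALUES ('pipe1')")
    cur.execute("SAVEPOINT edit_1")                 # one savepoint per edit
    cur.execute("UPDATE pipes SET name = 'oops' WHERE name = 'pipe1'")
    cur.execute("ROLLBACK TO SAVEPOINT edit_1")     # this is the Ctrl+Z
conn.commit()
conn.close()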

Let’s see that in action:

 

Potential caveats?

At the start, we worried about how heavy all those savepoints would be for the database. It turns out that for really massive geometries and heavy editing sessions this could start to weigh a bit, but honestly it stays far below PostgreSQL’s capabilities.

 

Up to now, we haven’t really found any issue with it.

And we didn’t address the silent ROLLBACK that occurs sometimes, because it is generated by buggy stored procedures, which are easy to fix.

Some new ideas came to us while working in that area. For instance, if a transaction locks a feature, QGIS just… waits for the lock to be released. I think we should find a way to advertise those locks to the users, that would be great! If you’re interested in making that happen, please contact us.

 

More to come soon, stay tuned!

 

 

Auxiliary Storage support in QGIS 3


For those who know how powerful QGIS can be with data-defined widgets and expressions almost anywhere in styling and labeling settings, it remains quite complex today to store custom data.

For instance, moving a simple label using the label toolbar is not straightforward; that wonderful toolbar remains desperately greyed-out for manual labeling tweaks

…unless you do the following:

  • Set your vector layer editable (yes, it’s not possible with readonly data)
  • Add two columns in your data
  • Link the X position property to a column and the Y position to another

 

the Move Label map tool becomes available and ready to be used (while your layer is editable). Then, if you move a label, the underlying data is modified to store the position. But what happens if you want to fully use the Change Label map tool (color, size, style, and so on)?

 

Well… You just have to add a new column for each property you want to manage. No need to tell you that it’s not very convenient to use, or even impossible when your data administrator has set your data in read-only mode…

A plugin made some years ago, named EasyCustomLabeling, addressed that issue. But it was full of caveats, like a dependency on another plugin (Memory layer saver) for persistence, or a full copy of the layer to label inside a memory layer, which led to losing synchronisation with the source layer.

Two years ago, the French Agence de l’eau Adour Garonne (a water basin agency) and the Ministry in charge of Ecology asked Oslandia to draft QGIS Enhancement Proposals to port that plugin into QGIS core, among a few other things like labeling connectors or curved label enhancements.

Those QEPs were accepted and we could work on the real implementation. So here we are: Auxiliary Storage has now landed in master!

How

The aim of auxiliary storage is to propose a more integrated solution to manage these data defined properties:

  • Easy to use (one click)
  • Transparent for the user (map tools always available by default when labeling is activated)
  • Do not update the underlying data (it should work even when the layer is not editable)
  • Keep in sync with the datasource (as much as possible)
  • Store this data along or inside the project file

As said above, thanks to the Auxiliary Storage mechanism, map tools like Move Label, Rotate Label or Change Label are available by default. Then, when the user selects the map tool to move a label and clicks for the first time on the map, a simple question is asked, allowing them to select a primary key:

Primary key choice dialog – (YES, you NEED a primary key for any data management)

From that moment on, a hidden table is transparently created to store all data defined values (positions, rotations, …) and joined to the original layer thanks to the previously selected primary key. When you move a label, the corresponding property is automatically created in the auxiliary layer. This way, the original data is not modified; only the joined auxiliary layer is!

A new tab has been added in the vector layer properties to manage the Auxiliary Storage mechanism. You can retrieve, clean up, export or create new properties from there:

Where is the auxiliary data really saved between projects?

We ended up using a light SQLite database which, by default, is just 8 KB! When you save your project with the usual .qgs extension, the SQLite database is saved at the same location but with a different extension: .qgd.

Two thoughts about that choice:

  • “Hey, I would like to store geometries, why not spatialite instead?”

Good point. We tried that at first, in fact. But the spatialite database initialization process using the QGIS spatialite provider was found to be too long, really long. And a raw spatialite database weighs about 4 MB, because of the huge spatial reference system table and the numerous spatial functions and metadata tables. We chose to fall back to using sqlite through the OGR provider, and it proved to be fast and stable enough. If some day we manage to merge the spatialite provider and the GDAL-OGR spatialite provider, with options to only create the necessary SRS and functions, that would open new possibilities, like storing spatial auxiliary data.

  • “Does that mean that when you want to move/share a QGIS project, you have to manually manage these 2 files to keep them in the same location?!”

True, and dangerous, isn’t it? Users often forgot the auxiliary files of the EasyCustomLabeling plugin. Hence, we created a new format allowing us to zip several files: .qgz. Using that format, the SQLite database project.qgd and the regular project.qgs file are embedded in a single project.qgz file. WIN!!
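
In other words, a .qgz file is nothing more than a standard zip archive; as an illustration (with hypothetical file names), such a container could be built by hand like this:

import zipfile

# Bundle the project file and its auxiliary database into one container
with zipfile.ZipFile("project.qgz", "w") as qgz:
    qgz.write("project.qgs")
    qgz.write("project.qgd")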

Changing the project file format so that it can embed data, fonts and SVGs was a long-standing feature request. So now we have a format available for self-hosted QGIS projects. Plugins like Offline Editing, QConsolidate and similar ones that aim at making it easy to export a portable GIS database could take advantage of that new storage container.

Now, some work remains to add labeling connector capabilities and to allow users to draw labeling paths by hand. If you’re interested in making this happen, please contact us!

 

 

More information

A full video showing auxiliary storage capabilities:

 

QEP: https://github.com/qgis/QGIS-Enhancement-Proposals/issues/27

PR New Zip format: https://github.com/qgis/QGIS/pull/4845

PR Editable Joined layers: https://github.com/qgis/QGIS/pull/4913

PR Auxiliary Storage: https://github.com/qgis/QGIS/pull/5086


Detecting objects starting from street-scene images


Exploiting artificial intelligence within the geospatial data context is getting easier and easier thanks to emerging deep learning techniques. Neural networks indeed come in various kinds of designs, and cope with a wide range of applications.

At Oslandia we bet that these techniques will bring added value to our daily activity, as data is of first importance for us. This article will show you an example of how we use AI techniques along with geospatial data.

Exploiting an open dataset related to street scenes

In this article we use a set of 25,000 images provided by Mapillary, in order to investigate the presence of some typical street-scene objects (vehicles, roads, pedestrians…). Mapillary released this dataset recently; it is still available on its website and may be downloaded freely for research purposes.

As inputs, Mapillary provides a bunch of street scene images of various sizes in an images repository, and the same images after a filtering process in the instances and labels repositories. The latter is crucial, as the filtered images are composed of pixels from a reduced set of colors. Actually, there is one color per object type, with 66 object types in total. Some minor operations on the filtered image pixels can produce outputs as one-hot vectors (i.e. a vector of 0s and 1s: 1 if the corresponding label is on the image, 0 otherwise).
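
As a sketch, and assuming the filtered images store one label index (between 0 and 65) per pixel, such a one-hot vector could be built as follows:

import numpy as np

def image_to_labels(filtered_img, n_labels=66):
    """Return a binary vector: 1 if the corresponding object type appears on the image"""
    labels = np.zeros(n_labels, dtype=np.int32)
    labels[np.unique(filtered_img).astype(int)] = 1
    return labels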

Figure 1: Example of image, with its filtered version

As a remark, neural networks consider equally-sized inputs, which is not the case for Mapillary images. A first approximation could be to resize every image to the most common size (2448*3264); however, we chose to resize them to a smaller size (576*768) for computational reasons.

Implement a convolutional neural network with TensorFlow

Our goal here is to predict the presence of different street-scene components in pictures. We aim to train a neural network model to make it able to detect whether there are car(s), truck(s) or bicycle(s), for instance, in an image.

As Mapillary provides a set of 66 labels and a labelled version of each dataset image, we plan to investigate a multilabel classification problem, where the final network layer must evaluate if there is an occurrence of each label in any image.

Neural network global structure

Handling images within neural networks is generally done with the help of convolutional neural networks. They are composed of several kinds of layers that must be described:

  • convolutional layers, in which images are filtered by several learnable image kernels, so as to extract image patterns based on pixels (this layer type is of first importance in a convolutional neural network);
  • pooling layers, in order to reduce the size of images and converge towards the output layer, as well as to extract rough feature locations (the max-pooling operation is the most common one, i.e. taking the maximal value over a local set of pixels);
  • fully-connected layers, where every neuron of the current layer is connected to every neuron of the previous layer.

Figure 2: Convolutional neural network illustration (cf Wikipedia)

We’ve carried out a set of tests with different hyperparameter values, i.e. different numbers of each kind of layer. The results are globally stable as long as we consider more than one convolutional layer. Here comes the way to define a neural network with TensorFlow, the dedicated Python library.

How to define data

Inputs and outputs are defined as “placeholders”, i.e. a sort of variable that must be fed with real data.

import tensorflow as tf

# Input images (576*768 RGB pixels) and output multilabel vectors (66 object types)
X = tf.placeholder(tf.float32, [None, 576, 768, 3], name='X')
Y = tf.placeholder(tf.float32, [None, 66], name='Y')

How to define a convolutional layer

After designing the kernels and the biases, we can use the TensorFlow function conv2d to build this layer.

kernel1 = tf.get_variable('kernel1', [8, 8, 3, 16], initializer=tf.truncated_normal_initializer())
biases1 = tf.get_variable('biases1', [16], initializer=tf.constant_initializer(0.0))
# Apply the image convolution with a ReLu activation function
conv_layer1 = tf.nn.relu(tf.add(tf.nn.conv2d(X, kernel1, strides=[1, 1, 1, 1], padding="SAME"),
                                biases1))

In this example, the kernels are 16 squares of 8*8 pixels, considering 3 colors (RGB channels).

How to define a max-pooling layer

As for the convolutional layer, there is a ready-to-use function in the TensorFlow API, i.e. max_pool.

pool_layer1 = tf.nn.max_pool(conv_layer1, ksize=[1, 4, 4, 1],
                             strides=[1, 4, 4, 1], padding='SAME')

This function takes the maximal pixel value of each block of 4*4 pixels, in every filtered image. The out-of-border pixels are set to the border pixel values if a block needs such additional information. The number of pixels is divided by 16 after such an operation.

How to define a fully-connected layer

This operation corresponds to a standard matrix multiplication; we just have to reshape the output of the previous layer so as to deal with comparable structures. Let’s imagine we have added a second convolutional layer as well as a second max-pooling layer; the fully-connected layer definition is then as follows:

reshaped = tf.reshape(pool_layer2, [-1, int((576/(4*4))*(768/(4*4))*24)])
# Create weights and biases
weights_fc = tf.get_variable('weights_fullconn', [int((576/(4*4))*(768/(4*4))*24), 1024],
                    initializer=tf.truncated_normal_initializer())
biases_fc = tf.get_variable('biases_fullconn', [1024],
                    initializer=tf.constant_initializer(0.0))
# Apply relu on matmul of reshaped and w + b
fc = tf.nn.relu(tf.add(tf.matmul(reshaped, weights_fc), biases_fc), name='relu')
# Apply dropout
fc_layer = tf.nn.dropout(fc, 0.75, name='relu_with_dropout')

Here we have defined the major part of our network. However the output layer is still missing…

Build predicted labels

The predicted labels are given after a sigmoid activation in the last layer: the sigmoid function allows us to consider independent probabilities in a multilabel context, i.e. when different object types may be present on the same image.

The sigmoid function gives the probability of appearance of each object type in a given picture. The predicted labels are then built as simply as possible: a threshold of 0.5 differentiates negative and positive predictions.

# Create weights and biases for the final fully-connected layer
weights_sig = tf.get_variable('weights_s', [1024, 66],
                              initializer=tf.truncated_normal_initializer())
biases_sig = tf.get_variable('biases_s', [66], initializer=tf.random_normal_initializer())
logits = tf.add(tf.matmul(fc_layer, weights_sig), biases_sig)
Y_raw_predict = tf.nn.sigmoid(logits)
Y_predict = tf.to_int32(tf.round(Y_raw_predict))

Optimize the network

Although several metrics may measure the model convergence, we choose to consider classic cross-entropy between true and predicted labels.

entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logits)
loss = tf.reduce_mean(entropy, name="loss")
optimizer = tf.train.AdamOptimizer(0.01).minimize(loss)

In this snippet, we are using AdamOptimizer, however other solutions do exist (e.g. GradientDescentOptimizer).

Assess the model quality

Several ways of measuring the model quality may be computed, e.g.:

  • accuracy (number of good predictions, over total number of predictions)
  • precision (number of true positives over all positive predictions)
  • recall (number of true positives over all real positive values)

They can be computed globally, or by label, as we are in a multilabel classification problem.
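
A possible way to compute these metrics is sketched below with scikit-learn; the variable names and the placeholder arrays are ours, the only assumption being that ground-truth and predicted labels come as binary arrays of shape (number of images, 66).

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Placeholder arrays standing for ground-truth and predicted labels
y_true = np.random.randint(0, 2, (20, 66))
y_hat = np.random.randint(0, 2, (20, 66))

# Global scores: each (image, label) cell counts as one prediction
print(accuracy_score(y_true.ravel(), y_hat.ravel()))
print(precision_score(y_true.ravel(), y_hat.ravel()))
print(recall_score(y_true.ravel(), y_hat.ravel()))

# Per-label scores: one value for each of the 66 object types
print(precision_score(y_true, y_hat, average=None))
print(recall_score(y_true, y_hat, average=None))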

Train the model

Last but not least, we have to train the model we have defined. That’s a bit complicated because of the batching operations; for the sake of clarity we suppose here that our training data are correctly batched, and we loop over a limited number of batches only, to keep the training short (that’s just for the demo, prefer considering all your data -at least- once!).

from sklearn.metrics import accuracy_score

def unnest(l):
    return [index for sublist in l for index in sublist]

sess = tf.Session()
# Initialize the TensorFlow variables
sess.run(tf.global_variables_initializer())
# Train the model (over 900 batches of 20 images, i.e. 18000 training images)
for index in range(900):
    X_batch, Y_batch = sess.run([train_image_batch, train_label_batch])
    sess.run(optimizer, feed_dict={X: X_batch, Y: Y_batch})
    if index % 10 == 0:
        Y_pred, loss_batch = sess.run([Y_predict, loss], feed_dict={X: X_batch, Y: Y_batch})
        accuracy_batch = accuracy_score(unnest(Y_batch), unnest(Y_pred))
        print("""Step {}: loss = {:5.3f}, accuracy={:1.3f}""".format(index, loss_batch, accuracy_batch))

What kind of objects are on a test image?

In order to illustrate the previous developments, we can test our network on a new image, i.e. an image that has not been seen during model training.

Figure 3: Example of image used to validate the model

The neural network is supplied with this image and the corresponding true labels, to compute predicted labels:

Y_pred = sess.run([Y_predict], feed_dict={X: x_test, Y: y_test})
sess.close()

The model accuracy for this image is around 74.2% ((34+15)/66), which is quite good. However it may certainly be improved, as the model has seen the training images only once…

              Y_pred=False  Y_pred=True
y_test=False            34            9
y_test=True              8           15
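
As a hedged sketch, this summary and the image-level accuracy can be recovered with scikit-learn, assuming y_test and Y_pred are the arrays handled in the previous snippet, and label_names is a hypothetical pandas Series holding the 66 category names:

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = np.asarray(y_test).ravel()
y_hat = np.asarray(Y_pred).ravel()

print(accuracy_score(y_true, y_hat))  # (34+15)/66, i.e. around 0.742 here
cm = confusion_matrix(y_true, y_hat)
print(pd.DataFrame(cm,
                   index=["y_test=False", "y_test=True"],
                   columns=["Y_pred=False", "Y_pred=True"]))

# True positives, i.e. object types both present on the image and detected
# (`label_names` is a hypothetical pandas Series of the 66 category names)
print(label_names[(y_true == 1) & (y_hat == 1)].reset_index(drop=True))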

We can extract the most interesting label category, i.e. the true positives, corresponding to the objects on the image that were detected by the model:

0              curb
1              road
2          sidewalk
3          building
4            person
5           general
6               sky
7        vegetation
8         billboard
9      street-light
10             pole
11     utility-pole
12    traffic-light
13            truck
14        unlabeled
dtype: object

To understand the category taxonomy, interested readers may read the dedicated paper available on Mapillary website.

How to go further?

In this post we’ve just considered a feature detection problem, so as to decide if an object type t is really on an image p, or not. The natural prolongation of that is the semantic segmentation, i.e. knowing which pixel(s) of p have to be labeled as part of an object of type t.

This is the way Mapillary labelled the pictures; it is without any doubt a really promising research field for some use cases related to geospatial data!

To go deeper into this analysis, you can find our code on Github.

If you want to collaborate with us and be a R&D partner on such a topic, do not hesitate to contact us at infos@oslandia.com!

Cluster bike sharing stations around french cities


The Oslandia team is involved in a constant effort of geospatial data gathering and analysis. Taking advantage of the recent trend of public open data releases, and after reading an inspiring work done on Dublin data, we decided to evaluate the situation in our own living places. Here come the open data portals of both cities: Lyon and Bordeaux. The Python code and the notebooks of our open data bike analysis are available on Github.

What is the data looking like?

Both data sets have been gathered with a cron job plugged onto the open data portals between the 8th of July and the 26th of September, i.e. more than 11 weeks of data.

It is important to note that there is no real standardization of bike sharing open data sets: each city provides its own data set with its own features.

Around Lyon the data set is as follows:

                                   6338022
number                                2023
last_update            2017-09-14 16:48:20
bike_stands                             20
available_bike_stands                   17
available_bikes                          3
availabilitycode                         1
availability                          Vert
bonus                                  Non
status                                OPEN

Whereas for Bordeaux, a record is like:

                             2127658
gid                              157
ident                            161
type                            VLS+
name               Le Taillan Mairie
state                      CONNECTEE
available_stand                    6
available_bike                     7
ts               2017-08-24 22:09:03

However, to investigate bike availability patterns and provide clusters, we only need station ids, timestamps and the number of available bikes!

Are there identifiable types of bike sharing stations?

In order to identify groups of similar bike sharing stations in each of these cities, we have run a simple clustering approach on the availability data. To keep the data set as simple as possible and work on comparable situations, we have aggregated the measurements into one-hour periods, and dropped the data gathered during week-end days. We consequently get dataframes of N rows and 24 columns (a column being the percentage of available bikes at every station, for a given hour of the day).

We have worked with four clusters, considering the elbow method as well as cluster significance.
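
A minimal sketch of this clustering step is given below with scikit-learn's KMeans; the availability matrix is a random placeholder standing for the N-station x 24-hour dataframe described above, and all names are ours.

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Placeholder for the N-station x 24-hour availability matrix described above
availability = pd.DataFrame(np.random.rand(100, 24),
                            columns=["h%02d" % h for h in range(24)])

# Cluster the stations according to their daily availability profile
kmeans = KMeans(n_clusters=4, random_state=0, n_init=10)
clusters = kmeans.fit_predict(availability)

# Hourly profile (centroid) of each cluster, the basis of the patterns below
centroids = pd.DataFrame(kmeans.cluster_centers_, columns=availability.columns)
print(centroids.round(2))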

As shown in the figures below, the global pattern seems comparable between both cities:

  • a first cluster (red) groups stations where bikes have high availability rates during the night. They probably refer to residential neighborhoods;
  • a second cluster (green in Bordeaux, blue in Lyon) gathers stations near diurnal activities (job places, universities and so on…);
  • a third cluster (blue in Bordeaux, green in Lyon) refers to stations where there is almost always a high proportion of available bikes. They are either located in less visited places, or other transportation modes do the job there;
  • a fourth and last group (purple) clusters stations where there are a lot of bikes during evenings, and far fewer during the rest of the day, showing lively night-time neighborhoods.

  Figure 1: Bike availability patterns in (a) Bordeaux (b) Lyon

Understanding a city from its bike sharing stations

That can be considered quite good for a beginning; however, what about the geographical part of the data? Folium (see the doc here), a Python library based on Leaflet.js, will allow us to draw interactive maps with our newly designed clusters.
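
Below is a hedged sketch of such a rendering; the station coordinates, the cluster ids and the color mapping are hypothetical, the only real dependency being the folium library.

import folium
import pandas as pd

# Hypothetical station table: coordinates plus the cluster id computed above
stations = pd.DataFrame({"lat": [45.767, 45.750, 45.740],
                         "lon": [4.833, 4.850, 4.870],
                         "cluster": [0, 1, 3]})
cluster_colors = {0: "red", 1: "blue", 2: "green", 3: "purple"}

m = folium.Map(location=[45.75, 4.85], zoom_start=13)  # roughly Lyon city center
for _, station in stations.iterrows():
    folium.CircleMarker(location=[station["lat"], station["lon"]],
                        radius=5,
                        color=cluster_colors[station["cluster"]],
                        fill=True,
                        fill_opacity=0.8).add_to(m)
m.save("bike_clusters.html")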

  Figure 2: Clustered bike sharing in Bordeaux

The situation of the two example cities is quite different here, as illustrated by the maps. On the one hand, we can see that the spatial repartition in Bordeaux is not so clear after the clustering process. Either the clustering gives insignificant results, or the shared bike usage does not fit the neighborhood division.

Figure 3: Clustered bike sharing in Lyon

On the other hand, the results in Lyon are far more significant from a geospatial point of view. Residential stations are mainly in Villeurbanne (east of Lyon) and in the 8th district (south east). We get a high concentration of blue stations (diurnal activities) in the 3rd district (Part-Dieu business area), or near La Doua campus (north). It is interesting to note the blue points in the far south east of the city: they correspond to the largest hospital in Lyon. If we consider the purple stations (evening activities), almost all the points are located in the city center (Croix-Rousse, Lyon peninsula and the nearby riverbanks), in partying places.

(… With all these elements, the true question is: where is Oslandia’s office in Lyon?)

 

In a next article, we will provide an extension of this work by predicting bike availability at stations.

If you are interested in continuing the discussion with us on this matter, or on another data-related topic, do not hesitate to mail us (infos+data@oslandia.com) or to explore our Github project! If you are interested in reading about another clustering application, you can also find a previous OpenStreetMap-related work done with KMeans on our blog (see the Github project here).

QGIS 3 compiling on Windows


As the Oslandia team works exclusively on GNU/Linux, compiling QGIS 3 on Windows 8 is not an everyday task :). So we decided to share our experience; we bet it will help some of you.

Cygwin

The first step is to download Cygwin and to install it in the directory C:\cygwin (instead of the default C:\cygwin64). During the installation, select the lynx package:

 

Once installed, you have to click on the Cygwin64 Terminal icon newly created on your desktop:

Then, we’re able to install dependencies and download some other installers:

$ cd /cygdrive/c/Users/henri/Downloads
$ lynx -source rawgit.com/transcode-open/apt-cyg/master/apt-cyg > apt-cyg
$ install apt-cyg /bin
$ apt-cyg install wget git flex bison
$ wget http://download.microsoft.com/download/D/2/3/D23F4D0F-BA2D-4600-8725-6CCECEA05196/vs_community_ENU.exe
$ chmod u+x vs_community_ENU.exe
$ wget https://cmake.org/files/v3.7/cmake-3.7.2-win64-x64.msi
$ wget http://download.osgeo.org/osgeo4w/osgeo4w-setup-x86_64.exe
$ chmod u+x osgeo4w-setup-x86_64.exe

CMake

The next step is to install CMake. To do that, double-click on the file cmake-3.7.2-win64-x64.msi previously downloaded with wget. You should choose the following options during the installation:

 

Visual Studio

Then, we have to install Visual Studio and the C++ tools. Double-click on the vs_community_ENU.exe file and select the Custom installation. On the next page, you have to select the Visual C++ checkbox:

 

 

OSGeo4W

In order to compile QGIS, some dependencies provided by the OSGeo4W installer are required. Double-click on osgeo4w-setup-x86_64.exe and select the Advanced Install mode. Then, select the following packages:

  • expat
  • fcgi
  • gdal
  • grass
  • gsl-devel
  • iconv
  • libzip-devel
  • libspatialindex-devel
  • pyqt5
  • python3-devel
  • python3-qscintilla
  • python3-nose2
  • python3-future
  • python3-pyyaml
  • python3-mock
  • python3-six
  • qca-qt5-devel
  • qca-qt5-libs
  • qscintilla-qt5
  • qt5-devel
  • qt5-libs-debug
  • qtwebkit-qt5-devel
  • qtwebkit-qt5-libs-debug
  • qwt-devel-qt5
  • sip-qt5
  • spatialite
  • oci
  • qtkeychain

QGIS

To start this last step, we have to create a file C:\OSGeo4W\OSGeo4W-dev.bat containing something like:

@echo off 
set OSGEO4W_ROOT=C:\OSGeo4W64
call "%OSGEO4W_ROOT%\bin\o4w_env.bat" 
call "%OSGEO4W_ROOT%\bin\qt5_env.bat" 
call "%OSGEO4W_ROOT%\bin\py3_env.bat" 
set VS140COMNTOOLS=%PROGRAMFILES(x86)%\Microsoft Visual Studio 14.0\Common7\Tools\ 
call "%PROGRAMFILES(x86)%\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" amd64 
set INCLUDE=%INCLUDE%;%PROGRAMFILES(x86)%\Microsoft SDKs\Windows\v7.1A\include 
set LIB=%LIB%;%PROGRAMFILES(x86)%\Microsoft SDKs\Windows\v7.1A\lib 
path %PATH%;%PROGRAMFILES%\CMake\bin;c:\cygwin\bin 
@set GRASS_PREFIX=%OSGEO4W_ROOT%\apps\grass\grass-7.2.1 
@set INCLUDE=%INCLUDE%;%OSGEO4W_ROOT%\include 
@set LIB=%LIB%;%OSGEO4W_ROOT%\lib;%OSGEO4W_ROOT%\lib 

@cmd 

According to your environment, some variables should probably be adapted. Then in the Cygwin terminal:

$ cd C:\
$ git clone git://github.com/qgis/QGIS.git
$ ./OSGeo4W-dev.bat
> cd QGIS/ms-windows/osgeo4w

In this directory, you have to edit the file package-nightly.cmd to replace:

cmake -G Ninja ^

by:

cmake -G "Visual Studio 14 2015 Win64" ^

Moreover, we had to update the environment variable SETUPAPI_LIBRARY according to the actual location of the Windows Kits file SetupAPI.Lib:

set SETUPAPI_LIBRARY=C:\Program Files (x86)\Windows Kits\8.1\Lib\winv6.3\um\x64\SetupAPI.Lib

And finally, we just have to compile with the following command:

> package-nightly.cmd 2.99.0 1 qgis-dev x86_64

Victory!

And see you soon for the generation of OSGEO4W packages 😉

Source

https://github.com/qgis/QGIS/blob/ab859c9bdf8a529df9805ff54e7250921a74d877/doc/msvc.t2t

 

 

OSM data classification: code release


After a set of blog posts published last summer, we are glad to announce that the dedicated code (version 1.0) has been released on Github.

Our OpenStreetMap history data pipeline will let you analyze user contributions through time, following several modalities:

  • evaluate the area evolution in terms of nodes, ways and relations;
  • investigate on OSM tag genome, i.e. the whole set of tag keys and values that are used in OSM;
  • classify the OSM users based on their contributions, with the help of unsupervised learning techniques (Principal Component Analysis and KMeans).

How to use the code?

First of all you can clone the Github repository:
git clone https://github.com/Oslandia/osm-data-classification.git .

We developed the code with Python 3, and based our Python data pipeline on the Luigi library. Hence each command must explicitly refer to a Luigi operation. The command structure must be as follows:
python3 -m luigi --local-scheduler --module <--args>
Or alternatively:
luigi --local-scheduler --module <--args>
if Luigi is in your Path.

These commands must be run from the src directory. You can also run them elsewhere if you add the src directory to your PYTHONPATH variable.

The possible command arguments are of two sorts:

  • Luigi-focused command-line arguments (--local-scheduler, --module, --help);
  • Task-focused command-line arguments (in such a case, the list of arguments may be printed by running Luigi on the specific task with the --help argument)

Data gathering

At the beginning of the data pipeline we need an OSM history file (with the .osh.pbf extension). This kind of file can be downloaded for instance from the GeoFabrik website.

Let's suppose that we are at the project root; we can get a small dataset as an example:
wget -P ./data/raw http://download.geofabrik.de/europe/isle-of-man.osh.pbf
We store it into the ./data/raw directory. Be careful with the paths… The default argument for the data repository is --datarep="data". Depending on where you run the code from, you may have to change it.

Result production

Some Luigi tasks (especially the KMeans-related ones) produce some extra CSV files during the analysis. They will be stored into datarep/output-extracts/.

 

Some additional comments are in the README of the project on Github. If you have questions when using the project, please contact us by email (infos+data@oslandia.com) or directly with Github issue system. If you want to add some new features to the code, do not hesitate to contribute!

Itowns v2.2 released!


 

We have the pleasure of announcing the release of iTowns v2.2! iTowns is a Three.js-based framework written in Javascript/WebGL for visualizing 3D geospatial data right in the browser, developed and maintained by Oslandia, IGN and AtolCD.

In this release, you will find a lot of bug fixes and performance improvements. Notably, the npm package is now working 🙂 Some new demos have also been added.

Useful features have been added too! Including:

  • The ability to change the opacity and color of a PointCloud
  • For PointCloud layers, you can now hook your customization code on a onPointsCreated callback
  • Panorama display support (drag to rotate the camera!):
  • A new provider – the static provider – has been added. It allows serving tiles directly from the filesystem (from a JSON metadata file containing a list of image URLs and their associated extents) without having to install a full WMS or WMTS server. It is very useful in the development phase, or for small amounts of tiles.
  • The ability to filter features from a WFS layer
  • Extrusion support for WFS layers
  • It is now possible to override the material of generated meshes from a 3dtile layer
  • Opacity is now supported for vector layers (3dTiles and WFS)
  • Touch-capable display devices are now supported in FirstPersonControls, PanoramicControls and FlyControls
  • Early collision detection is now supported in some controls
  • And last but not least, IE11 support has been added

Please download it on the project release page and don’t forget to have a look at the Breaking Changes section.

Contributions are welcome, either by reporting bugs or opening PR on our Github repository. For enquiries please email us to infos+itowns@oslandia.com or get in touch on IRC (#itowns on freenode).

Enjoy!

Best wishes for 2018


Happy new year 2018

The whole team at Oslandia sends you its best wishes for this new year, full of serenity and freedom.

2017 has been an important year for our company, with a change in ownership and management, growth of the team, and a new internal organization. This allows us to look forward to the coming years with confidence and enthusiasm.

Our new website now showcases the three sectors composing our activities:

  • The historical GIS topic, with QGIS, PostGIS and custom applications such as QWAT or a project for cadaster in Burundi
  • The 3D domain, for visualization ( iTowns ) and GIS-simulation coupling, e.g. Geology with Albion, hydrology with Hydra
  • Our new DATA activity : data processing and applying AI to GIS, like we did for OSM data quality assessment

Lots of new actions and projects that we are impatient to share with you in 2018!

Vincent Picavet, President at Oslandia, in the name of the team

Predict bike availability at bike sharing stations


In a previous article, we described the clustering of bike sharing stations in two french cities, i.e. Bordeaux and Lyon. We saw that geospatial clustering is interesting to understand city organization. Results were especially impressive for the Lyon data set.

Here we propose to continue the effort about bike sharing system description by attempting to predict the bike availability rates for each station in Lyon.

Collecting the data

As in the previous article, we use almost two and a half months of recordings at every bike sharing station in Lyon, between 2017/07/08 and 2017/09/26.

By simplifying our data, we can see that we get timestamped bike availability data at each station.

         station                  ts  stands  bikes bonus
5561337     2039 2017-09-06 22:13:17       2     15   Non
7140294     3088 2017-09-22 18:46:50      13      3   Non
1835180     1036 2017-07-28 22:57:23      14      3   Oui
1980966    10061 2017-07-30 16:36:08       7     11   Non
2196562     8009 2017-08-02 08:17:11       2     17   Non

We preprocess these data a little further so as to extract timestamp features and the probability of finding a bike during each period of the day. Here the data is resampled into 10-minute periods, and the number of available bikes is averaged over the records gathered within each period.

   station                  ts  bikes  stands  day  hour  minute  probability
0     1001 2017-07-09 00:00:00   15.0     1.0    6     0       0      0.93750
1     1001 2017-07-09 00:10:00   15.0     1.0    6     0       0      0.93750
2     1001 2017-07-09 00:20:00   14.5     1.5    6     0      10      0.90625
3     1001 2017-07-09 00:30:00   14.5     1.5    6     0      20      0.90625
4     1001 2017-07-09 00:40:00   11.5     4.5    6     0      30      0.71875

This is the final data set that we will give to the predictive model. To improve the quality of predictions, other types of data could be integrated to this framework, e.g. weather forecasts (however we let it for a further study).
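
The snippet below is a rough sketch of this resampling step; the raw records are a hypothetical sample with the columns shown in the first table (station, ts, stands, bikes), and all variable names are ours.

import pandas as pd

# Hypothetical raw records, with the columns shown in the first table above
raw = pd.DataFrame({"station": [1001, 1001, 1001, 1001],
                    "ts": pd.to_datetime(["2017-07-09 00:01:12", "2017-07-09 00:07:48",
                                          "2017-07-09 00:12:30", "2017-07-09 00:18:05"]),
                    "stands": [1, 1, 1, 2],
                    "bikes": [15, 15, 15, 14]})

# Resample to 10-minute periods, averaging the records within each period
resampled = (raw.set_index("ts")
                .groupby("station")[["bikes", "stands"]]
                .resample("10T")
                .mean()
                .reset_index())

# Timestamp features and probability of finding a bike
resampled["day"] = resampled["ts"].dt.dayofweek
resampled["hour"] = resampled["ts"].dt.hour
resampled["minute"] = resampled["ts"].dt.minute
resampled["probability"] = resampled["bikes"] / (resampled["bikes"] + resampled["stands"])
print(resampled)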

Predicting shared bike availability

The first practical question to answer here is the prediction horizon. Here we will attempt to predict the bike availability after 30 minutes. It could be a typical problem for a user who wants to plan a local trip: will he find a bike at his preferred station, or should he walk to the next one? Or maybe should he look for an alternative transportation mode, as there will be no bike in the neighborhood nearby?
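
Continuing from the resampled dataframe sketched above, the 30-minute-ahead target can be built by shifting the probability three 10-minute steps backwards within each station; this is only a hedged illustration, the variable names being ours.

# Build the 30-minute-ahead target within each station
data = resampled.sort_values(["station", "ts"]).copy()
data["future_probability"] = data.groupby("station")["probability"].shift(-3)
data = data.dropna(subset=["future_probability"])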

Let’s use two weeks of data for training (i.e. from 2017/07/11 at 0:00 a.m. to 2017/07/26 at 10:00 a.m.) so as to predict bike availability on the network in the next hour (i.e. 2017/07/26 from 10:30 to 11:30 a.m.). The explanatory variables will be the station id, the timestamp information (day id, hours, minutes) and the station-related features (numbers of available bikes and stands).

To do the hard prediction job, we use XGBoost (see doc here), a distributed gradient boosting method that can undertake classification as well as regression processes. Here, we are in the second case, as we want to estimate the value of a quantitative variable (the probability of finding an available bike at a given station, at a given hour).

Like the AdaBoost model, XGBoost is a boosted tree model which involves a sequence of smaller models (decision trees) and where each submodel training error function depends on the previous model results. Boosting algorithms are amongst the most widely used algorithms in data science competitions.
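
A minimal training sketch is given below, under the assumption that data now covers the full observation period; the hyperparameters and variable names are ours, not those of the original study.

import xgboost as xgb
from sklearn.metrics import mean_squared_error

features = ["station", "day", "hour", "minute", "bikes", "stands"]
train = data[(data["ts"] >= "2017-07-11") & (data["ts"] < "2017-07-26 10:00:00")]
test = data[(data["ts"] >= "2017-07-26 10:30:00") & (data["ts"] <= "2017-07-26 11:30:00")]

# Fit a gradient boosted tree regressor on the training window
model = xgb.XGBRegressor(n_estimators=25, learning_rate=0.25, max_depth=6)
model.fit(train[features], train["future_probability"])

# Predict on the test window (10:30 to 11:30 a.m.) and measure the error
predicted = model.predict(test[features])
rmse = mean_squared_error(test["future_probability"], predicted) ** 0.5
print("RMSE on the test window: {:.3f}".format(rmse))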

XGBoost model example (from Tianqi Chen): two decision trees are used as submodels to answer the question “does the person like computer games?”

Our model learns quite fast: after 25 iterations, the training process converges to a satisfying error value (RMSE around 0.095).

Figure 1: XGBoost training curves (RMSE)

Mapping the predictions

In the last article, we plotted the shared bike stations according to their clusters. Here, as we are in a regression problem, we focus on the level of bike availability, seen as a percentage.

The following map shows such a quantity, for all stations in Lyon:

Figure 2: True shared bike availability, per station (red: empty station, blue: full station)

We have to compare it with the prediction provided by the XGBoost model below. With such a color scale, we can say that the prediction looks good:

 

Figure 3: Predictions on shared bike availability, per station (red: empty station, blue: full station)

If we focus on the prediction error, we may highlight bike stations where the model failed to give an accurate prediction:

Figure 4: Prediction error (RMSE), per station (red: fewer bikes than ground truth, blue: more bikes than ground truth)

The wrong predictions are sparsely located, with the exception of three stations in the west of the city. These points are on the Fourvière hill, a very hard place to ride a bike! It is as if the model were really unconfident about people’s ability to climb up to these stations…

You may find the code and some notebooks related to this topic on Github. We also thank Armand Gilles (@arm_gilles) for his contribution to the project, through his soon-merged fork.

 

If you want to discuss about that with us, or if you have some needs on similar problems, please contact us ( infos+data@oslandia.com)!


Database migrations and Pum


At Oslandia we often need to deal with database migrations. So it’s an important topic to us. And
it should be an important topic to anyone maintaining databases in production.

First of all I want to mention a good series of articles by K. Scott Allen on the philosophy and
practice of database migrations and version control:

These articles are not about specific databases and migration tools. Instead they cover the basics of having your databases under version control and managing migrations.

Many database migration tools exist, both in the opensource and proprietary worlds.

Some tools relate to specific programming languages and ORMs (Object-Relational Mapping), such as Rails Migrations and Alembic. I’ve personally used Alembic in a number of Python projects in the past. Alembic is a fantastic tool for Python applications based on the SQLAlchemy toolkit. I would highly recommend it. Other popular database migration tools include Flyway and Liquibase. With Flyway migrations are written in either Java or SQL. With Liquibase migrations are described using declarative formats such as XML, YAML and JSON; “plain SQL” being also supported.

Pum, which stands for “Postgres Upgrades Manager”, is a new migration tool. It was created in the context of the QWAT and QGEP projects, but it is completely independent and agnostic to those projects. Pum was initially created by OPENGIS.ch, with help from Oslandia and the whole QWAT community for the design and specifications of the tool.

Pum was greatly inspired by Flyway and Liquibase. Flyway and Liquibase are written in Java, and support various database systems. In contrast Pum is written in Python, and focuses on the PostgreSQL database system. Pum is simple, and easy to use and extend. Also, with Pum, migrations, called “deltas” in the Pum jargon, are written in “plain SQL”. Pum fully embraces SQL and doesn’t attempt to hide it behind a declarative or programming language.

Pum is a CLI (Command Line Interface) tool. The main commands provided by Pum are:

  • check
  • baseline
  • upgrade
  • test-and-upgrade

The check command compares two databases and shows the differences. Let’s create a simple example to illustrate it:

$ createdb prod  # create the "production" database
$ createdb ref   # create the "reference" database
$
$ cat > pg_service.conf << EOF  # create a PostgreSQL service file for "prod" and "ref"
> [prod]
> dbname=prod
> [ref]
> dbname=ref
> EOF
$ export PGSERVICEFILE=pg_service.conf
$
$ pum check -p1 prod -p2 ref
Check...OK
columns: []
constraints: []
functions: []
indexes: []
rules: []
sequences: []
tables: []
triggers: []
views: []

The check command reports that the databases “prod” and “ref” are identical. They’re actually both empty. Now let’s add a table to the “ref” database, and run the check command again:

$ # add the "test" table to the "ref" database
$ psql -d ref -c "create table test (name text)"
$
$ pum check -p1 prod -p2 ref
Check...DIFFERENCES FOUND
columns:
- + ('public', 'test', 'name', None, 'YES', 'text', None, None, None, None)
constraints: []
functions: []
indexes: []
rules: []
sequences: []
tables:
- + ('public', 'test')
triggers: []
views: []

This time the check command reports that the “ref” database has a table and a column that the “prod” database does not have. If we created the same “test” table in “prod” the `check` command would report that “prod” and “ref” are the same again.

The baseline command assigns a version to a database. For example let’s assign the version 0.0.1 to the “prod” database, and the version 0.0.2 to the “ref” database:

$ pum baseline -p prod -t public.pum_upgrades -d deltas -b 0.0.1
$ pum baseline -p ref -t public.pum_upgrades -d deltas -b 0.0.2
$
$ # check that version 0.0.1 was assigned to the "prod" database
$ psql service=prod -c "table pum_upgrades"
 id | version | description | type | script | checksum | installed_by |        installed_on        | execution_time | success 
----+---------+-------------+------+--------+----------+--------------+----------------------------+----------------+---------
  1 | 0.0.1   | baseline    |    0 |        |          | postgres     | 2018-02-22 16:33:40.948143 |              1 | t
(1 row)

$
$ # check that version 0.0.2 was assigned to the "ref" database
$ psql service=ref -c "table pum_upgrades"
 id | version | description | type | script | checksum | installed_by |       installed_on        | execution_time | success
----+---------+-------------+------+--------+----------+--------------+---------------------------+----------------+---------
  1 | 0.0.2   | baseline    |    0 |        |          | postgres     | 2018-02-22 16:56:25.19542 |              1 | t
(1 row)

Finally, we’re going to write a migration script to migrate the “prod” database, and use Pum to do the migration.

We create the migration script delta_0.0.2_000.sql with the following content:

create table test (name text);

Let’s now migrate “prod”, checking that it now matches the “ref” database:

$ # a "test" database is required by the test-and-upgrade command
$ createdb test
$ cat >> pg_service.conf << EOF
> [test]
> dbname=test
> EOF
$
$ # now perform the migration
$ pum test-and-upgrade -pp prod -pt test -pc ref -t public.pum_upgrades -d delta -f output.backup
Test and upgrade...Dump...OK
Restore...OK
Upgrade...     Applying delta 0.0.2... OK
OK
Check...OK
columns: []
constraints: []
functions: []
indexes: []
rules: []
sequences: []
tables: []
triggers: []
views: []

Apply deltas to prod? [n]|y: y
Upgrade...     Applying delta 0.0.2... OK
OK
OK

The “prod” database has been migrated, and during the process PUM has checked that the “prod” and “ref” are now in the same state.

That’s all folks! Please contact us at infos@oslandia.com if you have any questions or want to discuss this further.

QGIS 3.0 has been released


We are very pleased to convey the announcement of the QGIS 3.0 major release, called “Girona”.

The whole QGIS community has been working hard on so many changes for the last two years. This version is a major step in the evolution of QGIS. There are a lot of features, and many changes to the underlying code.

At Oslandia, we pushed some great new features, a lot of bugfixes and made our best to help in synchronizing efforts with the community.

Please note that the installers and binaries are still being built for all platforms: Ubuntu and Windows packages are already there, and Mac packages are still building.

The ChangeLog and the documentation are still being worked on so please start testing that brand new version and let’s make it stronger and stronger together. The more contributors, the better!

While QGIS 3.0 represents a lot of work, note that this version is not a “Long Term Release” and may not be as stable as required for production work.

We would like to thank all the contributors who helped making QGIS 3 a reality.

Oslandia contributors should be acknowledged too: Hugo Mercier, Paul Blottière, Régis Haubourg, Vincent Mora and Loïc Bartoletti.

We also want to thank those who directly supported important features of QGIS 3:

Orange

The QWAT / QGEP organization

The French Ministry for an Ecological and Inclusive Transition

ESG

and also Grenoble Alpes Métropole

Docker images for QGIS


With support from Orange we’ve created Docker images for QGIS. All the material for building and running these images is open-source and freely available on GitHub: https://github.com/Oslandia/docker-qgis.

The docker-qgis repository includes two Docker images: qgis-build and qgis-exec.

The qgis-build image includes all the libraries and tools necessary for building QGIS from source. Building QGIS requires installing a lot of dependencies, so it might not be easy depending on the operating system you use. The qgis-build image makes the build easy and repeatable. The result of a QGIS build is a set of Debian Stretch packages that can be used for building the qgis-exec image, as explained below.

The qgis-exec image includes a runtime environment for QGIS Server. The image makes it easy to deploy and run QGIS Server. It is meant to be generic and usable in various contexts, including production. There are two ways to build the qgis-exec image. It can be built from the official QGIS 3 Debian packages (http://qgis.org/debian/), or from local QGIS Debian Stretch packages that were built using the qgis-build image.

We encourage you to go test and use our images! We haven’t pushed the images to the Docker Hub yet, but this is in the plans.

Feel free to contact us on GitHub or by email for any question or request!

Deeposlandia 0.4 has been released!


On a previous article published on this blog, we introduced our work dedicated to convolutional neural network.

We are now happy to announce the 0.4 release of this R&D project!

What’s new?

Until the last released version (0.3.2), the neural networks were built with the TensorFlow library. The major modification of this release is the transition to the Keras library. Taking advantage of the Keras API simplicity, deeposlandia is now easier to use.

It is also more reliable, as it now comes as a standalone package with a set of unit tests.

The API has been simplified; it behaves more intuitively, with a datagen command for generating ready-to-train datasets, a train command to train a model and make predictions after the training process, and an inference command that aims to infer labels from a set of input images.

How to use the code?

The project is easy to clone and install on a system from scratch, by using a virtual environment:

$ git clone https://github.com/Oslandia/deeposlandia
$ cd deeposlandia
$ virtualenv -p /usr/bin/python3 venv
$ source venv/bin/activate
(venv)$ pip install -r requirements-dev.txt

After getting the Mapillary Vistas dataset from their website and storing it in a data/mapillary/input/ directory, the following command builds a preprocessed version of the dataset:

python deeposlandia/datagen.py -D mapillary -s 224

Then the preprocessed dataset may be used in a training effort:

python deeposlandia/train.py -M semantic_segmentation -D mapillary -s 224 -e 5

This produces a trained model saved as a .h5 file on the file system. This backup may be recovered to train the model over more epochs and/or to predict labels on some images, as follows:

python deeposlandia/inference.py -M semantic_segmentation -D mapillary -i picture.png

These commands are highly configurable, do not hesitate to read the README of the project on Github.

If you have questions when using the code, please contact us by email (infos+data@oslandia.com) or through Github issues. If you want to add some new handsome features to the code, do not hesitate to contribute!

 

Pointcloud talk at the FOSS4G-fr conference


FOSS4G-fr conference logo

The Oslandia team was massively present at the great FOSS4G-fr conference that was held last week near Paris.

One of the talks we gave was about Pointcloud, the PostgreSQL extension for storing point cloud data (a.k.a. LiDAR data), and LOPoCS, a lightweight server for streaming point clouds from PostgreSQL.

If you want to know more about loading, visualizing and analysing point clouds with PostgreSQL, you can take a look at the annotated slides (in English). And feel free to reach out to us if you want to know more about what we’re doing with point clouds!

Many thanks to the conference organizers and participants, see you there in two years! (And yes, we will be at the main FOSS4G conference in Dar es Salaam too.)
