Status and prospects of systems biology in grapevine research

The cultivated grapevine, Vitis vinifera L., has accumulated a vast amount of omics data over the last two decades, making computational resources indispensable for its analysis and integration. Molecular systems biology has emerged from this need, providing the means to model and predict the emergence of phenotypes or responses in biological systems. Beyond single-omics networks, integrative approaches associate the molecular components of an organism and combine them into higher-order networks to model dynamic behaviors. The application of network-based methods to multi-omics data is providing additional resources to address important questions regarding grapevine fruit quality and composition. Here we review the recent history of systems biology in this species. We highlight the most relevant aspects of the discipline and describe important integrative studies that have contributed to the global understanding of how this species responds to the environment and how it triggers the fruit ripening developmental program. We also highlight the latest resources available for the grapevine community to exploit and take advantage of all the omics data that is being generated.


Introduction
Genes and their products perform complex cellular tasks that are essential for all living organisms. At the molecular level, they are organized as modules forming part of large networks. Within these higher-order associations, functionally related genes/proteins interact, regulate each other, or form part of a metabolic pathway. The functional characterization of these molecules through forward and reverse genetic analyses has allowed the dissection of their networks and their involvement in diverse cellular processes. In the last decade, however, a widely promoted approach to understanding a network from a global perspective has been the integration of several types of omics data.
The rise of next-generation sequencing (NGS) technologies has led to an expansion in the amount of genomic and transcriptomic data that must be stored and processed. In addition, technologies covering proteomics and other types of omics are rapidly increasing the amount of data being produced. Scientists are now racing to develop efficient data analysis algorithms, user-friendly tools and software applications, and to establish extensive hardware infrastructure for answering different questions of modern life science. It is hypothesized that the larger the amount of omics data generated for a species, the easier its integration becomes, yielding more robust and reliable analyses.
The grapevine (Vitis vinifera L.) has become an appealing candidate 'model' system for studying non-climacteric fleshy fruits. The amount of genomics data continuously generated within the grapevine community since the grape genome was sequenced and released in 2007 has certainly helped in this nomination. The grape genome, currently in its second assembly (12X.v2) and third annotation (VCost.v3), comprises to date 33,568 genes (Canaguier et al. 2017). To give biological meaning to this remarkable amount of data, several initiatives have been introduced to describe genes within their biological context (Grimplet et al. 2009a), including not only in vivo functional characterizations but also in silico analyses such as co-expression networks and other integrative approaches (reviewed by Wong and Matus 2017).
With the aim of enabling the efficient exploitation of Vitis biological resources and understanding the genetic and molecular basis of all processes in this species, the International Grapevine Genome Program (IGGP; www.vitaceae.org) is currently developing the GrapeIS system. This is an integrated set of interfaces supporting advanced data modeling, rich semantic integration and the next generation of data mining tools linking genotypes to phenotypes (Adam-Blondon et al. 2016). Within the same framework, the recently launched INTEGRAPE consortium (a COST Action) aims to integrate data at different levels to maximize the power of omics and establish a manageable and open data platform. The initiatives mentioned here share the use of the FAIR principles, which ensure that data are Findable, Accessible, Interoperable and Reusable (Wilkinson et al. 2016). The establishment of solid integrative data platforms is essential to make interoperable grapevine datasets and tools available. The application of systems biology methods has arisen to fulfil this purpose. Here we provide a brief the structural design principles of living organisms. Distinct networks that perform similar tasks (e.g. information processing) share similar types of recurring patterns of interconnections; thus, motifs may define universal classes of networks (Milo et al. 2002).
From this and other studies, it was suggested that the structures of different networks are governed by the same principles. This new paradigm is embodied in Oltvai and Barabási's 'life's complexity pyramid', since updated and revisited in light of systems biology advances (Figure 8.1). In this view, cell components arrange themselves in persistent patterns, and these in turn form modules with discrete cellular functions. Finally, these modules are hierarchically organized, defining the cell's large-scale functional organization.
Historically, reductionist studies in plants have aimed to identify the individual components associated with the occurrence of certain phenotypes. Although this approach was massively adopted over the last 50 years and successfully produced extensive repertoires of plant molecular components, it began to lose its effectiveness at the beginning of the current century, when it became apparent that the majority of phenotypes are produced by complex orchestrations involving myriad molecular components, many of them functionally redundant. This scenario became more apparent with the development of the so-called omics technologies, which provide an accurate molecular snapshot of the biological processes under study by detecting and quantifying the repertoire of molecules present (Yuan et al. 2008). Hence, research in molecular biology is gradually shifting towards a holistic perspective, integrating individual 'omics' datasets to gain biologically meaningful insight into plant systems (Sheth and Thaker 2014).
Recent developments in high-throughput DNA sequencing have made genomics and transcriptomics, so far, the best-established, most mature and most reliable techniques to characterize molecular systems (Bolger et al. 2018).
Specifically, RNA-seq, the high-throughput sequencing of the cDNA corresponding to the entire set of transcripts in a sample, is applied to identify transcripts and estimate their abundance, including the different isoforms produced by alternative splicing, as well as to analyze differential gene expression between specific conditions (Martin et al. 2013). The main molecular mechanisms controlling gene expression, namely the interactions between transcription factors and DNA (recently named 'the cistrome') and the different post-translational modifications of histones associated with the DNA (the epicistrome), are routinely characterized using techniques such as ChIP-seq, the combination of chromatin immunoprecipitation with high-throughput sequencing of the purified DNA (Chen et al. 2017). DAP-seq is a high-throughput sequencing technique that studies the cistrome using in vitro expressed, affinity-purified transcription factors (Bartlett et al. 2017). Finally, MNase-seq, DNase-seq and ATAC-seq are techniques used to study nucleosome positioning and chromatin accessibility, which have been shown to strongly influence gene expression (Pajoro et al. 2014; Sullivan et al. 2015; Pass et al. 2017; Bajic et al. 2018).
Despite the clear methodological and analytical advantages of genomics studies over other omics, it has been demonstrated that genomics and transcriptomics alone are not sufficient to predict phenotypes from the molecular state of biological processes (Papatheodorou et al. 2015). In this respect, proteomics (the analysis of the proteome, the entire set of proteins) and metabolomics (the study of the metabolome, the complete set of metabolites) are currently under development, aiming to provide a more exhaustive molecular description of biological systems (Ramalingam et al. 2015).
At this point, the massive amounts of data generated by omics technologies and stored in public databases considerably exceed human analytical capacity, making the use of computational resources imperative to extract relevant information. This scenario is not exclusive to molecular biology; it pervades science more generally and has driven the emergence of so-called Big Data or Data Science. This discipline combines high-performance computing, such as the use of computational clusters, with sophisticated statistical methods to answer specific questions about the phenomena under analysis (Carmichael et al. 2018). In molecular biology, this has promoted the development of "Molecular Systems Biology". This emerging discipline lies at the intersection of molecular biology, computer science and mathematics/statistics (Figure 8.2). The main methodology in molecular systems biology involves generating omics data and integrating them with existing data freely available in public databases. This massive amount of data is integrated and analyzed, typically using multivariate statistical methods implemented with high-performance computing. Specifically, molecular systems biology pursues the development of computational/mathematical models of the interactions among the molecular components of the system responsible for an observed phenotype, rather than focusing on the functioning of isolated individual components. The ultimate goal is to generate tools that make it possible to model and predict the emergence of specific phenotypes or responses in biological systems (Sheth and Thaker 2014). Systems of differential equations are commonly used as the modeling structure to achieve this goal.
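As a minimal illustration of differential-equation modeling, the sketch below integrates a hypothetical two-gene circuit in which a regulator X, produced at a constant rate, activates a target gene Y through a Hill function. All parameter values are invented for illustration, and simple forward-Euler steps are used instead of a dedicated ODE solver.

```python
def hill_activation(x: float, k: float, n: float) -> float:
    """Fraction of maximal activation produced by regulator level x (Hill function)."""
    return x**n / (k**n + x**n)


def simulate(a_x=1.0, d_x=0.5, v=2.0, k=1.0, n=2.0, d_y=0.25,
             dt=0.01, t_end=100.0):
    """Forward-Euler integration of a hypothetical two-gene circuit.

    dX/dt = a_x - d_x * X          (regulator: constant synthesis, first-order decay)
    dY/dt = v * hill(X) - d_y * Y  (target: activated by X, first-order decay)
    """
    x, y = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        dx = a_x - d_x * x
        dy = v * hill_activation(x, k, n) - d_y * y
        x, y = x + dt * dx, y + dt * dy
    return x, y


def steady_state(a_x=1.0, d_x=0.5, v=2.0, k=1.0, n=2.0, d_y=0.25):
    """Analytical fixed point obtained by setting both derivatives to zero."""
    x_star = a_x / d_x
    return x_star, v * hill_activation(x_star, k, n) / d_y


if __name__ == "__main__":
    print("simulated:", simulate())
    print("analytical:", steady_state())
```

With these parameters the simulated levels settle at X = 2 and Y = 6.4, matching the analytical steady state; changing `k` or `n` shows how the sensitivity of the target to its regulator shapes the response.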
Nonetheless, network science is emerging as a central paradigm in molecular systems biology as an effective modeling framework (Li et al. 2015).
In the context of network science, a network is a graph whose nodes represent the molecular entities of the system, and a directed or undirected edge is drawn between two nodes to specify the interaction between the corresponding molecular components. A numerical value termed weight can be incorporated in the edges to capture the strength of the represented interaction. Topological studies of a network, such as the analysis of scale-free properties, can identify relevant nodes called hubs that are highly connected and play key roles in network robustness and dynamics. Other topological parameters such as 'node transitivity', 'betweenness' and 'eccentricity' are especially suitable for identifying relevant molecular components of the biological system under analysis. Clustering techniques and community analysis are used to unravel the underlying structure of networks and are applicable in molecular systems biology to identify molecular modules that function with a certain level of separation from the rest of the system (Aoki et al. 2007). Finally, network motif analysis, the identification of non-random subgraphs, can shed light on the building blocks that occur recurrently in biological systems (Defoort et al. 2018).
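To make these topological parameters concrete, the following minimal sketch computes degree (to spot hubs), node transitivity (local clustering coefficient), eccentricity and betweenness on a small hypothetical undirected network, using only the Python standard library. The edge list is invented for illustration; libraries such as NetworkX offer equivalent functions.

```python
from collections import deque

# Hypothetical toy network (undirected, unweighted)
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
         ("B", "C"), ("D", "E"), ("E", "F")]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph):
    return {n: len(nbrs) for n, nbrs in graph.items()}


def clustering(graph):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    result = {}
    for n, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            result[n] = 0.0
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in graph[u])
        result[n] = 2.0 * links / (k * (k - 1))
    return result


def eccentricity(graph):
    """Longest shortest-path distance from each node (connected graph assumed)."""
    ecc = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ecc[source] = max(dist.values())
    return ecc


def betweenness(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []
        preds = {n: [] for n in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for v in graph[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                       # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair was visited from both endpoints
    return {n: b / 2.0 for n, b in bc.items()}


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(degree(g), clustering(g), eccentricity(g), betweenness(g), sep="\n")
```

On this toy graph, node A has the highest degree and betweenness, so it would be flagged as a hub, while peripheral nodes (B, C and F) have the largest eccentricity.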
Two types of gene networks are intensively used in molecular systems biology: gene co-expression networks and transcriptional networks. Gene co-expression networks are normally constructed from compendia of microarray and, more recently, RNA-seq data sets. These are undirected networks in which nodes represent genes and undirected edges are drawn between nodes to represent co-expression relationships between the corresponding genes. Transcriptional networks are constructed from ChIP-seq data describing the genome-wide binding of different transcription factors. These are directed networks in which nodes represent genes and a directed edge is drawn from gene_i to gene_j when gene_i encodes a transcription factor that binds to the promoter of gene_j. Transcriptional networks can be further refined by adding RNA-seq data from mutants or overexpressors of the transcription factors previously analyzed by ChIP-seq. Accordingly, weights can be associated with edges to represent an activating, repressing or neutral effect of the binding of the transcription factor to the promoter of the target gene.
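The refinement of a transcriptional network can be sketched as follows: directed TF-to-target edges come from binding calls, and each edge receives a sign from the target's log2 fold change in the TF overexpressor. The binding sets, fold changes and the ±1 log2 fold-change cut-off are hypothetical assumptions chosen for illustration.

```python
# Hypothetical ChIP-seq binding calls: TF -> genes whose promoter it binds
BINDING = {
    "TF1": {"G1", "G2", "G3"},
    "TF2": {"G3", "G4"},
}
# Hypothetical log2 fold changes of each target in the TF overexpressor vs. control
LOG2FC = {
    "TF1": {"G1": 2.1, "G2": -1.7, "G3": 0.2},
    "TF2": {"G3": -3.0, "G4": 0.9},
}


def signed_edges(binding, log2fc, threshold=1.0):
    """Directed edges (tf, target, weight): +1 activating, -1 repressing, 0 neutral."""
    edges = []
    for tf, targets in binding.items():
        for gene in sorted(targets):
            lfc = log2fc.get(tf, {}).get(gene, 0.0)  # no expression data -> neutral
            if lfc >= threshold:
                weight = 1
            elif lfc <= -threshold:
                weight = -1
            else:
                weight = 0
            edges.append((tf, gene, weight))
    return edges


if __name__ == "__main__":
    for edge in signed_edges(BINDING, LOG2FC):
        print(edge)
```

Binding alone defines the topology; the expression data only decide the sign of each edge, so a bound target with no detectable expression change is kept as a neutral edge rather than discarded.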

A decade conducting grapevine omics: what's yet to come
Genomics resources for Vitis species have increased rapidly over the last fifteen years, beginning with the sequencing of expressed sequence tags (ESTs) (Da Silva et al. 2005; Moser et al. 2005). These resources made it possible to quantitatively assess the grape transcriptome by aiding the development of cDNA and oligonucleotide microarrays (Terrier et al. 2005; Waters et al. 2005). Quantitative data acquisition through microarray analysis enabled large-scale mRNA profiling studies of gene expression that unraveled the most important events of berry development and ripening. However, it was not until the release of the V. vinifera cv. 'Pinot Noir' genome sequence (Jaillon et al. 2007; Velasco et al. 2007) that a burst of new transcriptomic technologies emerged for this species. In the Affymetrix Grape GeneChip Genome Array, approximately one-third of the expected genes are represented. This platform was largely used for tissue-specific mRNA expression profiling in grape berry tissues (Deluc et al. 2007) and for studying responses to abiotic stresses (Cramer et al. 2007) and compatible viral diseases (Vega et al. 2011), where all the produced data were collected. Although in situ oligonucleotide arrays are still widely used for gene expression profiling in grapevine, new sequencing technologies have been rapidly and widely adopted for genomic, transcriptomic and metagenomic studies in grapevine in recent years (Figure 8.3A). A variety of NGS technologies are available, including 454 (Roche) (Margulies et al. 2005), Genome Analyzer/HiSeq (Illumina Solexa) (Bennett et al. 2005) and SOLiD (Life Technologies), as well as newer platforms such as HeliScope (Helicos) (Milos 2008), PacBio RS and Sequel (Pacific Biosciences) (Eid et al. 2009), Oxford Nanopore Technologies for single-molecule sequencing, and Ion Torrent (Life Technologies), based on a semiconductor chip (Rothberg et al. 2011).
Thanks to the high-throughput and cost-efficient capabilities of these technologies, an unprecedented amount of genomic and transcriptomic data has accumulated exponentially for Vitis species (Figure 8.3B-3C).
The combination of high-throughput sequencing technologies and the grapevine reference genome (Jaillon et al. 2007) has facilitated comprehensive sequence analysis in diverse grapevine germplasm (Table 1). Cultivars with different agronomic and oenological characteristics have been re-sequenced to identify the genetic differences underlying their distinct phenotypes (Da Silva et al. 2014; Di Genova et al. 2014; Cardone et al. 2016; Chin et al. 2016; Minio et al. 2017, 2019; see Chapter 05), and comprehensive inventories of sequence variation have been generated (Mercenaro et al. 2017; Zhou et al. 2017; Liang et al. 2019). Transcriptome sequencing using NGS technologies, in turn, has been widely used to measure gene expression in grapevine (see Chapter 08), including in fruit (e.g. Zenoni et al. 2010), leaves (e.g. Liu et al. 2012) and flowers (e.g. Domingos et al. 2016), in response to different biotic and abiotic stresses (e.g. Cheng et al. 2015; Blanco et al. 2015; Amrine et al. 2015; Tillett et al. 2011), or to describe the expression of specific transcription factors (e.g. Sweetman et al. 2012). Other grape researchers have used high-throughput expression profiling to examine the phenotypic plasticity of cv. 'Corvina' berries at various developmental stages (Dal Santo et al. 2013). Although its primary objective is to characterize expression profiles, RNA-seq has also been used to identify differential splicing activity and single nucleotide polymorphisms (Vitulo et al. 2014), as well as to identify and profile long non-coding RNAs (Vitulo et al. 2014; Harris et al. 2017).
Since grapevine naturally hosts a reservoir of microorganisms that interact with the plant and affect both the quality and the quantity of wine production (Martins et al. 2013; Zarraonaindia et al. 2015), grape metagenomics studies are also gaining increasing resonance in the grape scientific community. High-throughput proteomics resources have also emerged in the last decade, albeit at a much lower rate. While most early studies used two-dimensional gel analysis and focused on berry metabolism coupled to abiotic stress responses (Jellouli et al. 2008; Grimplet et al. 2009b), high-resolution techniques have also been applied to grape, such as iTRAQ (Lucker et al. 2009) or, much more recently, 2DE gels.
Within the cell's functions, the transport of essential and beneficial nutrients allows all basic processes to be performed efficiently. In grapevine, ion content profiles can reflect the mineral composition of soils and can therefore describe certain components of a terroir. Pii et al. (2017) studied the ionomic profiles of berries grown in different areas in an attempt to discriminate their geographical origin. Using multi-elemental inductively coupled plasma mass spectrometry (ICP-MS), the authors found that rare earth elements were the best chemical descriptors.
Data from recent attempts to identify transcription factor binding landscapes have been deposited in public repositories, although no publications have yet been produced. Additional efforts are still needed to map protein-DNA and protein-protein interactions at a large scale. DNase I hypersensitivity mapping could also be useful to identify pioneer transcription factors controlling grape and wine quality traits.

From single omics to integrative data analysis
Within single-omics studies, the interactions between molecules can be represented as networks in which nodes (genes, proteins, metabolites, etc.) are connected by edges that convey any type of association (e.g. based on abundance or expression levels). In gene co-expression networks (GCNs), edges represent similar gene expression behaviors, while in genome-wide transcription factor binding studies (e.g. ChIP-seq) edges represent direct regulator-target relationships. In protein-protein interaction networks, edges describe physically interacting protein pairs identified with techniques such as high-throughput yeast two-hybrid screens.
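As a minimal illustration of how GCN edges can be derived, the sketch below computes pairwise Pearson correlations from a small hypothetical expression matrix (genes in rows, samples in columns) and keeps gene pairs whose absolute correlation reaches a hard threshold. The 0.8 cut-off is an arbitrary assumption; published pipelines often use other association measures or soft-thresholding schemes (as in WGCNA) instead of a single cut-off.

```python
import numpy as np

GENES = ["g1", "g2", "g3", "g4"]
# Hypothetical expression matrix: rows = genes, columns = samples
EXPR = np.array([
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],
    [6, 5, 4, 3, 2, 1],
    [1, 3, 2, 5, 1, 4],
], dtype=float)


def coexpression_edges(expr, genes, threshold=0.8):
    """Undirected edges (gene_a, gene_b, r) with |Pearson r| >= threshold."""
    r = np.corrcoef(expr)  # rows are treated as variables -> gene-by-gene matrix
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):  # upper triangle only: edges are undirected
            if abs(r[i, j]) >= threshold:
                edges.append((genes[i], genes[j], round(float(r[i, j]), 3)))
    return edges


if __name__ == "__main__":
    for edge in coexpression_edges(EXPR, GENES):
        print(edge)
```

Here g1 and g2 are perfectly co-expressed, g3 is perfectly anti-correlated with both, and g4 (r ≈ 0.39 with g1) stays unconnected. Keeping the sign of r as the edge weight preserves the distinction between positive and negative co-expression.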
Beyond single-omics networks, integrative approaches associate the molecular components of an organism and combine them into higher-order networks to model dynamic behaviors. The underlying principle is that even when the function of a component cannot be determined from a single network, its biological role can sometimes be inferred through association with other networks. Integrated or combined networks provide more complete information about a given biological process because they include two or more omics layers. Combining several networks of the same type into a community network can also be beneficial, as it effectively reveals discrepancies between individual networks while stressing the associations they share (Proost and Mutwil 2016). Networks built from different experimental evidence can be integrated by superimposing the nodes of the individual networks. However, an appropriate integrative method requires biological data to be normalized, standardized, modeled and visualized in order to build an integrated model (Figure 8.4). Data modeling requires special attention, as this step involves generalization and simplification with several assumptions (Yuan et al. 2008).
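Building a community network by superimposing nodes can be sketched as counting how many individual networks support each edge. In the minimal sketch below, which assumes simple undirected edge lists and an arbitrary minimum support of two networks, consensus edges are kept and edges found in only one network expose the discrepancies between them.

```python
from collections import Counter


def edge_support(networks):
    """Count in how many input networks each undirected edge occurs."""
    counts = Counter()
    for edges in networks:
        # sorting makes (A, B) and (B, A) the same undirected edge;
        # the set removes duplicates within a single network
        counts.update({tuple(sorted(edge)) for edge in edges})
    return counts


def community_network(networks, min_support=2):
    """Keep only edges supported by at least `min_support` individual networks."""
    return {edge: n for edge, n in edge_support(networks).items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical edge lists from three independent co-expression networks
    net1 = [("A", "B"), ("B", "C")]
    net2 = [("B", "A"), ("C", "D")]
    net3 = [("A", "B"), ("C", "D"), ("D", "E")]
    print(community_network([net1, net2, net3]))
```

In this example A-B (three networks) and C-D (two networks) form the consensus, while B-C and D-E, each found in a single network, are the network-specific associations that would deserve a closer look.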
The first task when integrating different multi-dimensional omics data consists of matching the features across omics layers, as they measure diverse types of molecules and the correspondence between them is not always straightforward. An additional challenge in multi-omics data integration is the heterogeneity of the data sets. Each omics layer is measured in different units whose typical ranges vary by several orders of magnitude. This can affect data analysis and is typically addressed with scaling and normalization techniques. Given the wide spectrum of possible normalization techniques, it is necessary to apply several of them and assess their performance in order to choose the most appropriate one for the data sets under study. The R package Normalyzer can be applied in this pre-processing step (Chawade et al. 2014).
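To show what a few common normalization choices do, the sketch below implements a log2 transform, per-feature autoscaling (z-scores) and quantile normalization with NumPy. It is a hypothetical illustration on invented values, not a reimplementation of Normalyzer, which is an R package.

```python
import numpy as np


def log2_transform(x, pseudocount=1.0):
    """Compress multiplicative ranges (e.g. read counts); the pseudocount avoids log(0)."""
    return np.log2(x + pseudocount)


def autoscale_rows(x):
    """Z-score each feature (row) across samples: mean 0, standard deviation 1.
    Constant rows would give a division by zero and should be filtered first."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mean) / std


def quantile_normalize(x):
    """Give every sample (column) the same value distribution.
    Ties are broken by position, a simplification."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value within its column
    reference = np.sort(x, axis=0).mean(axis=1)        # mean value at each rank
    return reference[ranks]


if __name__ == "__main__":
    # Hypothetical features x samples values on very different scales
    transcripts = np.array([[1200.0, 950.0, 1500.0], [15.0, 30.0, 22.0]])
    metabolites = np.array([[0.002, 0.004, 0.003]])
    combined = np.vstack([log2_transform(transcripts), metabolites])
    print(autoscale_rows(combined).round(2))
```

After autoscaling, transcript and metabolite features become directly comparable, whereas quantile normalization is usually applied within a single omics layer to remove sample-wide technical differences.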
Once data pre-processing is completed, and prior to the actual multi-omics integration, some exploratory analyses need to be conducted on the individual data sets.
Due to the high dimensionality of omics data, these analyses typically consist of techniques able to reduce complexity in order to extract relevant information. Principal Component Analysis (PCA) is the most widely used projection method in this step. PCA is a multivariate analysis technique whose goal is to reduce the dimensionality of a large multivariate data set. Here, a set of new uncorrelated (orthogonal) variables is computed as linear combinations, or rotations, of the original ones.
These new variables are called principal components, and they are sorted according to the percentage of the variability of the original data that they explain, under the constraint of being orthogonal (uncorrelated). Typically, the first two or three principal components are sufficient to capture most of the variability of the original data, and therefore only a projection onto these components is considered further in the analysis. Graphical representations of the selected principal components are then used to assess the quality of data replicates, uncover problems arising during sample collection (e.g. batch effects) or unveil underlying structure in the data by applying clustering techniques. Several R packages are available to perform this step, such as FactoMineR (Lê et al. 2008) and made4 (Culhane et al. 2005).
Finally, multi-omics data integration is carried out. Normally, two different goals exist when integrating different omics. On the one hand, researchers may be interested in exploratory analysis to identify the underlying relationships between two omics data sets. On the other hand, researchers may treat one of the omics data sets as response variables to be predicted from another, explanatory omics data set (considered as predictors). Here we discuss two statistical methods that exemplify these two goals. In both cases the input consists of two numerical matrices, X (n × p) and Y (n × q), generated using two different omics technologies that detect and quantify p and q different molecules, respectively, from the same set of n samples.  (Mevik et al. 2007) and mixOmics (Rohart et al. 2017) implement the necessary functions to apply this methodology.
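The PCA step described above can be sketched in a few lines of NumPy using the singular value decomposition of the column-centered data. This is a minimal illustration on simulated data (one hypothetical latent factor plus noise), not a substitute for the R packages mentioned; it returns sample scores for plotting, feature loadings and the fraction of variance explained per component.

```python
import numpy as np


def pca(data, n_components=2):
    """PCA via singular value decomposition.

    data: samples x features matrix (already normalized/scaled).
    Returns sample scores, feature loadings and the fraction of total
    variance explained by each retained component (largest first).
    """
    centered = data - data.mean(axis=0)            # PCA requires column-centered data
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # singular values are sorted descending
    scores = u[:, :n_components] * s[:n_components]  # equals centered @ loadings
    loadings = vt[:n_components].T
    return scores, loadings, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 12 samples x 5 features driven by one latent factor plus noise
    latent = rng.normal(size=(12, 1))
    data = latent @ rng.normal(size=(1, 5)) + 0.1 * rng.normal(size=(12, 5))
    scores, loadings, explained = pca(data)
    print("variance explained by PC1 and PC2:", np.round(explained, 3))
```

Plotting the first two columns of `scores` against each other, coloured by replicate or sampling batch, is the usual way to spot outlying replicates or batch effects before integration.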

Recent experiences in grapevine systems biology
Over the last years, several attempts to represent large biological data sets as networks have been conducted to elucidate the multilayered organization of biological processes in grapevine. In this species, integrated network analyses have mostly been adopted to predict gene functions or to contribute to the study of the regulatory mechanisms that control berry composition and development, trigger defense responses to biotic and abiotic stresses, or are influenced by the terroir (reviewed by Wong and Matus 2017). Proteomic/metabolomic composite networks and networks integrating genome-wide analyses of promoter regulatory elements have also been generated. The integration of all these data in multilayered networks has allowed complex maps of molecular regulation and interaction to be built. Some relevant cases are covered in this section.

Identifying molecular hubs controlling light and cold response pathways
The advent and continued adoption of high-throughput transcriptome profiling platforms in grapevine research has led to a vast expansion of transcriptome datasets representing a wide range of experimental conditions (e.g. specific tissues/organs and their associated developmental series, abiotic and biotic stresses, vineyard management strategies, etc.).
Although each dataset has been generated to address specific goals of its overarching study, together, individual datasets can be compiled into large expression databases to mine for novel biological insights including, but not limited to, comparative transcriptomics between grapevine and other plants, gene co-expression network analysis and functional assignment of genes, and the discovery of condition-specific cis-regulatory and HYH community gene co-expression and cis-regulatory sub-networks in grapevine.
A search for potential gene targets identified a preferential regulation of photosynthesis-related processes, heat-shock and DNA/protein repair processes, and the flavonol biosynthetic pathway. This study was crucial for describing the molecular mechanisms that underlie the adaptation of grapevines to high radiation (reviewed by Matus 2016).
Gene co-expression networks have also been integrated with transcription factor binding data to address grape responses to low temperature, in relation to the role of a MYB-like regulator termed AcQUIred tolerance to LOw temperatures (AQUILO; Sun et al. 2018). Here, the authors built a multispecies GCN, combining gene co-expression analysis and in silico TFBS data from grape with co-expression data (associated with the heterologous overexpression of AQUILO) and DAP-seq data from Arabidopsis. The relevance of this study came from the finding that AQUILO was tightly associated with the raffinose family oligosaccharides (RFOs), a connection that was later validated by quantifying these osmoprotectant molecules in cold-treated grape calli overexpressing AQUILO.
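The logic of combining co-expression with binding evidence can be sketched as a simple intersection: a gene is retained as a candidate target only if it is strongly co-expressed with the TF and also carries binding evidence (e.g. a DAP-seq peak or a predicted TFBS). This is a hypothetical illustration, not the pipeline used by Sun et al. (2018); gene identifiers, correlations and the r ≥ 0.8 cut-off are invented.

```python
# Hypothetical Pearson r between one TF and genes in its co-expression neighbourhood
COEXPRESSED = {"g1": 0.91, "g2": 0.88, "g3": 0.85, "g4": 0.62}
# Hypothetical genes with binding evidence for the same TF
BOUND = {"g1", "g3", "g4", "g5"}


def prioritise_targets(coexpressed, bound, min_r=0.8):
    """Keep genes supported by BOTH co-expression (r >= min_r) and binding
    evidence, ranked by correlation strength."""
    hits = [(gene, r) for gene, r in coexpressed.items() if r >= min_r and gene in bound]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(prioritise_targets(COEXPRESSED, BOUND))  # [('g1', 0.91), ('g3', 0.85)]
```

Requiring both lines of evidence trades sensitivity for specificity: g2 is co-expressed but unbound, g4 is bound but only weakly co-expressed, and g5 has binding evidence alone, so all three are set aside.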

Regulation of phenylpropanoid metabolism
Presently, the most widely adopted methodology to identify candidate transcriptional The integration of non-coding RNA network analysis into existing condition-specific GCNs has also been presented to unravel the regulation of phenylpropanoid and flavonoid biosynthesis during berry development and ripening (Wong and Matus 2017).
One of the key findings from this initiative was the discovery of long non-coding RNAs (lncRNAs) that were not only strongly correlated with key structural pathway genes but were also located in close proximity to their co-expressed genes. The lncRNA VIT_210s0042n00100, located close to all nine VviSTSs on chromosome 10, was consistently co-expressed with all of them. In another case, a predicted lncRNA (VIT_203s0180n00020) is linked to VviGT2 through strong co-expression and co-location. This gene encodes an enzyme putatively involved in hydroxycinnamic ester biosynthesis and proanthocyanidin galloylation (Khater et al. 2012).
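The co-expression plus co-location criterion can be expressed as two filters applied together: a correlation cut-off and a maximum genomic distance between the lncRNA and each gene. The coordinates, correlations, the r ≥ 0.8 cut-off and the 50 kb window below are hypothetical assumptions, not values from the cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chrom: str
    start: int
    end: int


def interval_distance(a, b):
    """Gap in bp between two loci; 0 if they overlap, None if on different chromosomes."""
    if a.chrom != b.chrom:
        return None
    if a.end < b.start:
        return b.start - a.end
    if b.end < a.start:
        return a.start - b.end
    return 0


def coexpressed_and_colocated(lnc, genes, corr, min_r=0.8, window=50_000):
    """Genes that are both correlated with the lncRNA and within `window` bp of it."""
    hits = []
    for gene, locus in genes.items():
        d = interval_distance(lnc, locus)
        r = corr.get(gene, 0.0)
        if d is not None and d <= window and r >= min_r:
            hits.append((gene, d, r))
    return sorted(hits, key=lambda hit: hit[1])  # nearest first


if __name__ == "__main__":
    # Hypothetical lncRNA, neighbouring genes and lncRNA-gene correlations
    lnc = Locus("chr10", 100_000, 102_000)
    genes = {
        "gene_a": Locus("chr10", 110_000, 111_500),
        "gene_b": Locus("chr10", 60_000, 61_000),
        "gene_far": Locus("chr10", 500_000, 501_000),
        "gene_other_chr": Locus("chr3", 100_000, 101_000),
        "gene_weak": Locus("chr10", 103_000, 104_000),
    }
    corr = {"gene_a": 0.9, "gene_b": 0.85, "gene_far": 0.95,
            "gene_other_chr": 0.9, "gene_weak": 0.3}
    print(coexpressed_and_colocated(lnc, genes, corr))
```

Only gene_a and gene_b pass both filters; the strongly correlated but distant genes and the nearby but weakly correlated gene are excluded, which is the combination that points to possible cis-acting lncRNAs.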

The fight club goes dry: networks related to grape berry ripening in response to drought
To understand the molecular mechanisms underpinning berry development and ripening in greater detail, recent efforts have focused on the transcriptome dynamics of multiple cultivars across the entire process of berry development and ripening. Massonnet et al. (2018) conducted the first large-scale study cataloguing the genome-wide transcriptional profiles of ten Italian grapevine varieties at four critical stages of berry development, all cultivated in a single vineyard. In only a handful of studies have network-based approaches been applied to identify genes potentially involved in critical developmental stage transitions. Such approaches often complement the findings of the widely adopted differential expression analysis, but they are also pivotal in revealing novel genes and relationships that traditional differential expression methods cannot uncover. For example, berry-specific gene co-expression network analysis encompassing immature-to-mature transitions has been particularly insightful in revealing groups of genes with distinct topological properties that can be classified as 'party', 'date' (see Han et al. 2004 for details) or 'fight-club' hubs (Palumbo et al. 2014). Genes belonging to the 'fight-club' hubs in particular were often negatively correlated with their interacting partners in gene co-expression networks; those showing this behavior were inferred to be biologically relevant 'switches' fulfilling negative regulatory roles in the transition between major developmental phases such as ripening.
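Hub classes of this kind can be defined from the average Pearson correlation coefficient (APCC) between a hub and its network neighbours: party hubs are strongly and positively co-expressed with their partners, date hubs only weakly, and fight-club hubs negatively on average. The sketch below applies that idea to a hypothetical network; the degree cut-off and the 0.5 APCC boundary between party and date hubs are illustrative assumptions rather than the exact criteria of Palumbo et al. (2014).

```python
# Hypothetical network: node -> neighbours, and Pearson r for each node-neighbour pair
NEIGHBOURS = {
    "H1": ["a", "b", "c"],
    "H2": ["d", "e", "f"],
    "H3": ["g", "h", "i"],
    "L1": ["a"],
}
CORR = {
    ("H1", "a"): 0.9, ("H1", "b"): 0.8, ("H1", "c"): 0.7,
    ("H2", "d"): -0.9, ("H2", "e"): -0.8, ("H2", "f"): 0.6,
    ("H3", "g"): 0.6, ("H3", "h"): -0.5, ("H3", "i"): 0.4,
    ("L1", "a"): 0.95,
}


def classify_hubs(neighbours, corr, min_degree=3, party_cut=0.5):
    """Label hubs by the average Pearson correlation (APCC) with their neighbours."""
    labels = {}
    for node, nbrs in neighbours.items():
        if len(nbrs) < min_degree:  # only highly connected nodes are treated as hubs
            continue
        apcc = sum(corr[(node, n)] for n in nbrs) / len(nbrs)
        if apcc < 0:
            label = "fight-club"  # mostly negatively correlated with partners
        elif apcc >= party_cut:
            label = "party"       # strongly co-expressed with partners
        else:
            label = "date"        # weak positive average correlation
        labels[node] = (round(apcc, 3), label)
    return labels


if __name__ == "__main__":
    for hub, (apcc, label) in classify_hubs(NEIGHBOURS, CORR).items():
        print(hub, apcc, label)
```

H1 comes out as a party hub, H2 as a fight-club hub and H3 as a date hub, while L1 is skipped because it is not connected enough to count as a hub.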
Although the identity of these major switches was first documented in red grapevine varieties, recent research has now ascertained several common but also reveal variety (red
Recent works have provided evidence for the involvement of multiple stress regulons, both ABA-dependent and ABA-independent (reviewed in Nakashima et al. 2014), in the berry ripening program (Savoi et al. 2017). Certain TF families (e.g. NAC, bZIP, AP2/ERF) that are co-expressed with downstream water deficit stress-responsive genes may be required to orchestrate the balance between the progression of berry development and stress-associated transcriptional regulation. Further analysis of gene co-expression and gene-metabolite co-response networks of berries subjected to water deficit stress across critical development and ripening phases revealed several distinct modules that were congruently induced by ripening and water deficit stress (Savoi et al. 2017). Here, integrated network-based analysis of the metabolome and transcriptome revealed close associations between the expression behaviors of module members (especially the activation of multiple signal transduction pathways) and the dynamics of key central and specialized metabolites involved in the drought response (e.g. proline, branched-chain amino acids, phenylpropanoids, anthocyanins and free volatile organic compounds). For example, the grapevine homologue of Arabidopsis ERF1, a key regulatory component of the jasmonate and ethylene signaling network (Cheng et al. 2013), whose expression was congruently induced by ripening and water deficit stress, was also identified as a common berry 'switch' gene. While its precise regulatory role remains to be elucidated, integrated network analysis positioned ERF1 as a putative regulator of proline and anthocyanin accumulation in the berry (Savoi et al. 2017).
VviERF1 was significantly co-expressed with pyrroline-5-carboxylate synthase (P5CS) and VviMYBA2, the key structural gene of proline biosynthesis and a key regulatory gene of anthocyanin biosynthesis in the berry, respectively, and was significantly correlated with various anthocyanin compounds. The presence of potential AP2/ERF TFBS (i.e. DRE and GCC-box) within the promoter regions of P5CS and MYBA2 further supports its putative role as a regulator of berry composition during ripening and water deficit stress.
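The two lines of evidence used for VviERF1 (co-expression with a target and AP2/ERF binding sites in the target's promoter) can be combined in a simple filter. The sketch below scans a promoter on both strands for exact consensus cores of the GCC-box (GCCGCC) and the DRE/CRT element ([AG]CCGAC) and requires a minimum correlation. The sequences, correlation values and the r ≥ 0.7 cut-off are hypothetical, and real analyses typically use position weight matrices rather than exact consensus strings.

```python
import re

# Consensus cores of two AP2/ERF-bound elements (exact-match simplification)
MOTIFS = {
    "GCC-box": "GCCGCC",
    "DRE/CRT": "[AG]CCGAC",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan_promoter(seq):
    """Return (motif, strand, forward-strand start) for every motif match."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = []
    for name, pattern in MOTIFS.items():
        for m in re.finditer(pattern, seq):
            hits.append((name, "+", m.start()))
        for m in re.finditer(pattern, rc):
            # a match spanning [start, end) on the reverse complement
            # starts at len(seq) - end on the forward strand
            hits.append((name, "-", len(seq) - m.end()))
    return sorted(hits, key=lambda hit: hit[2])


def supports_regulation(r, promoter, min_r=0.7):
    """Candidate TF-target pair: co-expressed AND at least one motif in the promoter."""
    return r >= min_r and bool(scan_promoter(promoter))


if __name__ == "__main__":
    promoter = "TTAGCCGCCTTAAAAGTCGGTTT"  # hypothetical promoter fragment
    print(scan_promoter(promoter))        # [('GCC-box', '+', 3), ('DRE/CRT', '-', 15)]
    print(supports_regulation(0.85, promoter))
```

Scanning both strands matters because transcription factor binding sites are double-stranded: the DRE/CRT core in this toy promoter is only visible on the reverse strand.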

Non-coding RNA networks within grape-fungi pathosystems
Grapevine diseases caused by biotic agents can be devastating for the wine and  GCC-core sub-modules were contained in many genes that were highly induced in berries and leaves infected with fungi such as Botrytis cinerea and Erysiphe necator. Finally, gene co-expression networks of the ATL protein family showed that many of these E3 ubiquitin ligases were induced in grapevine-pathogen interactions, including those with P. viticola and necrotrophic fungi.

Resources
Next-generation sequencing, as well as traditional Sanger sequencing, is of great significance in unraveling the complexity of plant genomes. These methods constantly generate large volumes of sequence data to be analyzed, annotated and stored, creating an unprecedented demand for resources and tools to manage and handle them (Basantani et al. 2017). Here we present a brief compilation of web resources that are either specific to grape or encompass a variety of species, including Vitis spp. (Table 2).
At least two grape-specific platforms have been effectively used to study the  Table 2).
Resources such as ATTED-II (http://atted.jp/) are amongst the most popular, allowing users to query microarray- and RNA-seq-based gene co-expression networks (GCNs) using the 'guide' gene approach.
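ATTED-II is widely described as ranking the co-expressed partners of a guide gene with the mutual rank (MR) index, the geometric mean of the two reciprocal correlation ranks. The toy Python sketch below reimplements that idea on invented profiles (`guide` and `g1` to `g4` are hypothetical); it is not ATTED-II code, does not reproduce its correlation pipeline, and calls no ATTED-II API.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_of(profiles, guide, target):
    """1-based rank of `target` among the partners of `guide` (1 = most correlated)."""
    partners = [g for g in profiles if g != guide]
    ordered = sorted(partners,
                     key=lambda g: pearson(profiles[guide], profiles[g]),
                     reverse=True)
    return ordered.index(target) + 1

def mutual_rank(profiles, a, b):
    """Geometric mean of the two reciprocal ranks; lower means stronger."""
    return math.sqrt(rank_of(profiles, a, b) * rank_of(profiles, b, a))

def guide_gene_query(profiles, guide, top=3):
    """Partners of `guide` sorted by ascending mutual rank."""
    scored = [(g, mutual_rank(profiles, guide, g)) for g in profiles if g != guide]
    return sorted(scored, key=lambda item: item[1])[:top]

# Hypothetical toy expression profiles over five samples.
profiles = {
    "guide": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g1": [1.1, 2.0, 3.2, 3.9, 5.1],
    "g2": [2.0, 1.0, 4.0, 3.0, 5.0],
    "g3": [5.0, 4.0, 3.0, 2.0, 1.0],
    "g4": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(guide_gene_query(profiles, "guide"))
```

Here `g1` and `guide` are each other's top partner, so their MR is 1. A partner that ranks the guide gene lower in its own list receives a larger MR even when the forward correlation is high, which is the point of using a reciprocal measure.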
ATTED-II also allows users to assess whether co-expression relationships are conserved across different plant lineages (Obayashi et al. 2018) … (Ohyanagi et al. 2015). Such resources may be used in conjunction with existing grapevine-specific co-expression platforms to build community GCNs or to gain additional insights into the evolutionary context of conserved and/or species-specific co-expression relationships.

With the advent of systems biology approaches in grapevine research, data integration has become central to taking advantage of such rich sources of information (Gligorijević and Pržulj 2015). Methods proposed for integrating gene expression data can usually be divided into two categories: (i) direct integration and (ii) meta-analysis. Direct integration (Rung and  and manage gene expression data from public databases, but it is still mainly a manual effort. The peculiarity and complexity of plant transcriptomes and of experimental designs in plant biology require careful control over how probes (for microarrays) and short sequence reads (for RNA-seq) are mapped and thus assigned to genes. The concept of 'measurable transcript' was also used to account for technical limitations that prevent precise discrimination among genes with high sequence similarity.
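To make the two integration categories concrete: direct integration pools normalized samples into a single matrix before analysis, whereas meta-analysis analyzes each dataset separately and then combines the resulting statistics. One classical combination rule is Fisher's method, sketched below in Python for one gene's p-values from several independent studies. The choice of Fisher's method and the example p-values are assumptions for illustration, not the procedure of any resource cited here.

```python
import math

def fisher_combined_p(pvalues):
    """Combine independent p-values with Fisher's method.

    Under the null hypothesis, X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom. For even degrees of freedom the
    upper-tail probability has the closed form
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    so no statistics library is needed.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # (x/2)**i / i!
        total += term
    return math.exp(-half) * total

# Hypothetical p-values for one gene from three independent studies.
study_pvalues = [0.04, 0.10, 0.03]
print(fisher_combined_p(study_pvalues))
```

Fisher's method assumes the per-study tests are independent and that every p-value lies in (0, 1]; a p-value of exactly 0 would make the logarithm fail.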
In VESPUCCI, data and experiment-related information (meta-data) are collected and curated starting from raw intensities (for microarrays) and raw sequence reads (for RNA-seq). A robust normalization method and a quality control procedure are applied to allow the direct comparison of gene expression values across different experimental conditions (Engelen et al. 2011). This results in a single coherent gene expression matrix in which each row represents a gene and each column represents a 'sample contrast'.
Sample contrasts measure the difference (in log scale) between a test and a reference condition, both of which are defined a priori by curators during the compendium creation process. The expression data itself is a matrix of log-ratios (base 2), so that positive values represent up-regulation and negative values represent down-regulation of a gene in the test sample compared with the reference sample. VESPUCCI's main goal is to gather as much expression data as possible, in order to explore patterns of co-expression across many experimental conditions and to provide a high-quality gene expression database for downstream analysis. Building a cluster of co-expressed genes (known as a module) works similarly to a BLAST (Camacho et al. 2009) search: users retrieve expression values for a given set of conditions, but the best matches are scored by expression correlation instead of sequence similarity. Modules can be modified in several ways to highlight the behavior of the genes of interest and to analyze (anti-)co-expression patterns.
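The compendium layout and the BLAST-like module search described above can be sketched in a few lines of Python. The example assumes hypothetical gene names and intensities, builds each sample contrast as log2(test / reference), and ranks genes by Pearson correlation with a query gene over a chosen subset of contrasts, returning both co-expressed and anti-co-expressed candidates. It illustrates the concept only and is not VESPUCCI's scoring method or programmatic interface.

```python
import math

def log2_contrast(test, reference):
    """log2(test / reference): > 0 up-regulated, < 0 down-regulated in the test."""
    return math.log2(test / reference)

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def module_search(matrix, query, conditions, top=1):
    """Rank genes by correlation with `query` over the contrasts in `conditions`.

    Returns (co_expressed, anti_co_expressed): the `top` genes with the highest
    and with the lowest Pearson r, respectively.
    """
    q = [matrix[query][i] for i in conditions]
    scores = []
    for gene, values in matrix.items():
        if gene != query:
            scores.append((gene, pearson(q, [values[i] for i in conditions])))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top], scores[::-1][:top]

# Hypothetical (test, reference) intensities for four sample contrasts.
intensities = {
    "geneA": [(8, 2), (4, 4), (2, 8), (16, 2)],
    "geneB": [(16, 4), (2, 2), (1, 4), (32, 4)],
    "geneC": [(2, 8), (4, 4), (8, 2), (1, 8)],
    "geneD": [(4, 4), (8, 4), (4, 4), (4, 8)],
}
# Gene x contrast matrix of log2 ratios, as in a compendium.
matrix = {g: [log2_contrast(t, r) for t, r in pairs] for g, pairs in intensities.items()}

co, anti = module_search(matrix, "geneA", conditions=[0, 1, 2, 3])
print("co-expressed:", co)
print("anti-co-expressed:", anti)
```

Here `geneB` tracks the query across all contrasts and `geneC` mirrors it, so they head the co-expressed and anti-co-expressed lists, respectively. Passing a shorter `conditions` list restricts the correlation to those contrasts only.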
Because gene expression values are relative, it is fundamental to annotate samples extensively with various sorts of meta-data to ensure that valid biological conclusions can be drawn from the exploration of the compendium.
One of VESPUCCI's biggest efforts and most notable features is the manual curation and quality check of samples. Each sample has been annotated by curators using controlled vocabularies to ensure both human readability and computational tractability. To fully comply with the FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al. 2016), VESPUCCI is undergoing constant renovation to exploit standards and bio-ontologies for data annotation. Finally, the interface is the other pivotal point towards seamless integration with other services and tools; it has been designed to adapt to users' needs and to simplify the implementation of other tools on top of it. One example is the NES2RA algorithm (Asnicar et al. 2018), a transcriptomic data mining tool used to expand a known local gene network (LGN) by finding new related genes. This method has been applied to the grapevine transcriptomic dataset, using VESPUCCI as data source, to expand LGNs related to the secondary metabolic pathways of anthocyanin and stilbenoid synthesis and to the signaling networks of the hormones abscisic acid and ethylene (Malacarne et al. 2018).
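Network expansion methods of this kind aim to keep the most likely gene interactions rather than every strong correlation. One common way to prune indirect links, used by PC-style network inference, is to test whether a correlation disappears once another gene is conditioned on. The toy Python sketch below illustrates only that general idea, using first-order partial correlations, a hypothetical |r| ≥ 0.8 edge cut-off, a hypothetical 0.1 pruning threshold and invented profiles in which `hub_B` drives both `gene_A` and `gene_C`. It is not the NES2RA algorithm itself, whose procedure is considerably more elaborate.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z (needs |r_xz|, |r_yz| < 1)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def prune_indirect(profiles, threshold=0.8, eps=0.1):
    """Return (correlation_edges, kept_edges) for a toy order-1 pruning step."""
    genes = sorted(profiles)
    r = {}
    for a, b in combinations(genes, 2):
        r[(a, b)] = r[(b, a)] = pearson(profiles[a], profiles[b])
    # Step 1: plain correlation network, as a Pearson-threshold GCN would build.
    edges = {(a, b) for a, b in combinations(genes, 2) if abs(r[(a, b)]) >= threshold}
    kept = set()
    for a, b in edges:
        others = [z for z in genes if z not in (a, b)]
        # Step 2: drop the edge if conditioning on any single other gene explains it away.
        if all(abs(partial_corr(r[(a, b)], r[(a, z)], r[(b, z)])) >= eps for z in others):
            kept.add((a, b))
    return edges, kept

# Hypothetical profiles: hub_B drives gene_A and gene_C, which are not directly linked.
profiles = {
    "hub_B": [12.0, 12.0, 8.0, 8.0],
    "gene_A": [12.5, 11.5, 8.5, 7.5],
    "gene_C": [12.5, 11.5, 7.5, 8.5],
}
edges, kept = prune_indirect(profiles)
print("Pearson edges:", sorted(edges))
print("After pruning:", sorted(kept))
```

All three pairs pass the Pearson cut-off, but the `gene_A`/`gene_C` correlation vanishes once `hub_B` is conditioned on, so only the two edges through the hub survive. A real PC-style method would use a statistical test (e.g. Fisher's z) and higher-order conditioning sets instead of a fixed threshold.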
Compared with Pearson correlation networks, NES2RA LGNs contain fewer edges because the method removes less significant interactions arising from noisy or redundant information. This reduces the complexity of the network and allows the analysis to focus on its topology and on the most likely gene interactions. NES2RA is computationally demanding and relies on the BOINC platform, which distributes computing tasks among computers made available by the volunteers participating in the gene@home project.

annotation of experiments as soon as (or even before) data are available is also underrated.
It is often regarded as a tedious requirement to fulfill before publication, whereas it should be treated as an integral part of the experimental design, just as important as the notes and protocols written in lab notebooks.

Final remarks
The accuracy of molecular systems biology relies on efficient methods to handle, analyze and visualize large omics data sets. However, it has become evident that a single omics technology is not sufficient to develop predictive models, which are the ultimate goal of this discipline. Accordingly, the combined use of technologies such as transcriptomics, cistromics, epicistromics, proteomics and metabolomics on the same samples or biological conditions has become a central methodology in plant molecular systems biology. Multi-omics network modeling has proven successful for unraveling the structure of biological processes in plants, as it allows the identification of the key components and interactions in system regulation. On the other hand, networks frequently require assumptions for data modeling, and since their methods may rely on existing knowledge of the components and interactions of a system, they can evolve to represent a biological system more accurately as that knowledge grows. Thus, results should be interpreted carefully, and these approaches can be complemented by reductionist methods. Notwithstanding these limitations, the use of these methodologies in grapevine research has provided novel perspectives for interpreting omics data and, although still in its early days, it is already tackling the analysis of the large amount of data being generated for this species.