1 Introduction

A ground motion selection and modification (GMSM) method is presented, suitable for the probabilistic seismic assessment of the damage state and collapse potential of building structures. The objective is to predict the probability distribution of the engineering demand parameters (EDPs) when the structure is subjected to a ground motion having a spectral acceleration \(S_{a} \left( {T_{1} } \right)\) at the fundamental period of the structure, \(T_{1}\). Its advantage over other GMSM methods is that it implicitly matches the multivariate distribution of the response spectrum in the region of the structure elongated period, and it adopts an unbiased estimator of the EDP central tendency.

Ground motion intensity is expressed using the vector-valued intensity measure (IM) \(\left\langle S_{a} \left( {T_{1} } \right), S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right) \right\rangle\) (Theophilou 2014; Theophilou et al. 2017), where \(T_{1}^{{\prime }}\) is an approximation of the elongated period of the structure due to inelasticity effects. Optimized suites of ground motions are obtained from a large ground motion dataset, through stratified sampling on \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\). The optimized suites are used in the dynamic analysis of the structure, resulting in a response prediction that is optimized, compared to using a less efficient IM, or to random sampling. An optimized response prediction implies that a reduced number of ground motions is required to obtain the same level of prediction accuracy, or, conversely, an improved accuracy is achieved when the same number of ground motions is used. The concept is that stratified sampling on IM results in an optimized replication of the central tendency and the dispersion of the IM as well as the associated EDPs, provided that there is sufficiently high correlation between the IM and the EDP.

The proposed GMSM method is applied to the dynamic analysis of a multi-degree-of-freedom (MDOF) structure with high participation of the fundamental mode, which is assumed to represent a real ten-storey building structure designed to Eurocode 2 (2004a) and Eurocode 8 (2004b) for ductility class ‘High’.

2 Motivation and background

Probabilistic seismic response assessment is of interest both in the design of new structures, and in the assessment of existing structures. In the design of new structures the goal is to ensure that the safety level required by the building codes is attained, while in the assessment of existing structures the goal is to quantify the inherent safety level.

Dynamic response-history analysis has various advantages over other methods of evaluating structural response, i.e. it is more accurate, it enables the explicit evaluation of the response at every time step, and it can be used in cases where other methods are unsuitable, such as with structures having complex configuration and complex nonlinear response. However, there are two significant impediments in adopting dynamic response-history analysis for meeting the above safety objectives. First, the dynamic response-history analysis of MDOF structures requires substantial computational work, especially when nonlinear behaviour is considered. Even though computers are becoming continuously more sophisticated in terms of processing power and memory resources, the sophistication and complexity of the mathematical models representing the structural system are also evolving. The specification of appropriately small suites of ground motions, consequently resulting in decreased computational work, will always be a topic of great interest and utility. Second, although the number of recorded ground motions is continuously increasing, there is still relative scarcity of high intensity records, which are of most interest. These two challenges call for the need to develop GMSM methods, to predict the true response with sufficient accuracy and efficiency.

A convenient way of conveying the seismic intensity of the earthquake scenario to the structural engineer is through the spectral ordinate at the fundamental period of the structure, such as \(S_{a} \left( {T_{1} } \right)\). One category of GMSM methods aim at predicting the central tendency of the structural response, given \(S_{a} \left( {T_{1} } \right)\). Such methods are the response spectrum matching (e.g. Eurocode 8 2004b; ICB 2015), which can be facilitated with software tools such as REXEL (Iervolino et al. 2010), the Conditional Mean Spectrum matching proposed by Baker (2011), the genetic algorithm selection and scaling proposed by Naeim et al. (2004), the selection based on a vector of record properties identified by proxy proposed by Watson-Lamprey and Abrahamson (2006), the selection based on a precedence list proposed by Azarbakht and Dolsek (2007), the selection using a harmony search algorithm proposed by Kayhan et al. (2011), the scaling to target scenario with epsilon preservation and selection based on median dispersion minimization proposed by Ay and Akkar (2012), and the integrated software environment coupling ground motion selection with structural analysis proposed by Katsanos and Sextos (2013).

The proposed GMSM method predicts, in addition to the central tendency, the dispersion of the response. A method consistent with this objective is the FEMA P-58 (2012) procedure. A weakness of this method is that by selecting all records to have essentially the same shape, the multivariate nature of the response spectrum distribution (Jayaram and Baker 2008) is suppressed. Jayaram et al. (2011) illustrate through application examples that GMSM methods focusing on matching the response spectrum shape underestimate the response central tendency and dispersion. The semi-automated procedure proposed by Kottke and Rathje (2008), matches the central tendency and the dispersion of the entire suite. Since the procedure matches the median response spectrum to the target response spectrum, the response spectrum shape of the individual records within a suite could exhibit significant variation. However, this procedure does not match the multivariate distribution of the response spectrum.

Some GMSM methods have been proposed that consider, directly or indirectly, the multivariate nature of the response spectrum distribution (Jayaram and Baker 2008). Shome et al. (1998) suggested forming bins of records based on magnitude and distance, and then forming record suites through random sampling and scaling them to \(S_{a} \left( {T_{1} } \right)\); this method is used herein for comparison with the proposed GMSM. Goulet et al. (2007) selected records to match the deaggregation results in terms of magnitude, distance, and epsilon (Baker and Cornell 2005), and subsequently scaled them to the target \(S_{a} \left( {T_{1} } \right)\). Jayaram et al. (2011) proposed an algorithm for selecting suites of records to match simulated response spectra that have a specified mean, variance, and correlation between any two periods. Wang (2011) proposed a similar method, with the difference that the response spectra are conditioned on magnitude and distance, rather than \(S_{a} \left( {T_{1} } \right)\). Bradley (2010) proposed the concept of the Generalized Conditional Intensity Measure, which is a vector-valued IM that contains a multitude of different IMs; records are selected using the algorithm proposed by Bradley (2012) such that the empirical distribution function of the suite matches the target IM distribution.

The proposed GMSM method is a contribution to the methods that match the multivariate nature of the response spectrum distribution.

3 Intensity measure

The proposed GMSM method uses the vector-valued IM shown below

$$\left\langle S_{a} \left( {T_{1} } \right), S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right) \right\rangle$$
(1)

The vector-valued IM is comprised of the spectral acceleration, \(S_{a} \left( {T_{1} } \right)\), which is an absolute measure, and the Normalized Spectral Area parameter, \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\) (Theophilou 2014; Theophilou et al. 2017), which is a relative measure, given by

$$S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right) = \frac{1}{{S_{d} \left( {T_{1} } \right)T_{N} }}\mathop \smallint \limits_{{T_{1} }}^{{T_{1}^{{\prime }} }} S_{d} \left( T \right)dT , \quad T_{1} < T_{1}^{{\prime }}$$
(2)

where \(T_{1}\) is the initial fundamental period of the structure, \(T_{1}^{{\prime }}\) is an approximation of the elongated period due to inelasticity effects, \(T_{N} = 1.0\,{\text{s}}\) is a normalizing constant, and \(S_{d} \left( {T_{1} } \right)\) is the response spectrum displacement at period \(T_{1}\).

Due to the normalization to \(S_{d} \left( {T_{1} } \right)\), the \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\) value does not change with scaling. In this way, \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\) captures the effect of the excitation spectral characteristics (i.e. frequency content) on the response. Thus, it is a measure of the intensity that affects the inelastic response associated with period elongation. In turn, the degree of period elongation depends on the frequency content, which is unique for each ground motion. Hence, the purpose of integrating the response spectrum is to capture the elongated period within appropriately estimated bounds. In this context, \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\) can be seen as a descriptor of the local response spectrum shape between periods \(T_{1}\) and \(T_{1}^{'}\).

Theophilou et al. (2017) review a number of similar vector-valued IMs proposed by other researchers.

4 Specification of large-sample datasets

A dataset comprised of a large number, \(N_{gm}\), of ground motions is initially formed, which is sufficiently large, e.g. \(N_{gm} \ge 30\), so that the central tendency and the dispersion of the ground motion parameters can be accurately captured (Walpole et al. 2007). It is therefore termed ‘ground motion large-sample dataset’, denoted as

$$G = \left\{ {GM_{i} |i = 1, 2, 3, \ldots ,N_{gm} } \right\}$$
(3)

where \(GM_{i}\) is ground motion \(i\). The GMSM method performs sampling on dataset \(G\), to form optimized suites of ground motions.

The vectors representing the IM and the EDPs at a particular \(S_{a} \left( {T_{1} } \right)\) are denoted as \(\varvec{IM}\) and \(\varvec{EDP}\), respectively. More specifically, \(\varvec{IM}_{\varvec{i}}\) and \(\varvec{EDP}_{\varvec{i}}\) represent the \(\varvec{IM}\) and \(\varvec{EDP}\), respectively, that correspond to ground motion \(i\). The dataset of the \(\varvec{IM}_{\varvec{i}}\) and \(\varvec{EDP}_{\varvec{i}}\) pairs is therefore termed ‘dynamic analysis large-sample dataset’, denoted as

$$S = \left\{ {\left( {\varvec{IM}_{\varvec{i}} ,\varvec{EDP}_{\varvec{i}} } \right)|i = 1, 2, 3, \ldots ,N_{gm} } \right\}$$
(4)

\(\varvec{IM}\) is the vector-valued IM presented in (1). The \(\varvec{EDP}\) vector is comprised of scalar EDPs that are considered as relevant for the particular problem studied. The EDPs considered should be appropriately selected so as to collectively provide a representative description of the structural damage state and collapse potential, consistent with the objective of structural assessment. It is, thus, desirable that EDPs representative of different damage/failure mechanisms are included.

5 Methodology

The GMSM method aims to replicate the true inelastic response distribution of the structure, for a given \(S_{a} \left( {T_{1} } \right)\). In the present section the GMSM method is presented, and in the following section the sample statistics are described. Use is made of the IM and EDP distributions, appropriately transformed so that they are normally distributed, and denoted as follows: \(IMT\) is defined as the \(S_{dN} \left( {T_{1} ,T_{1}^{'} } \right)\) element of \(\varvec{IM}\) transformed so that its distribution is normal; \(EDPT\) is defined as a scalar element of \(\varvec{EDP}\) transformed likewise.

The proposed GMSM method consists of the following steps:

  • Step 1: Formation of ground motion large-sample dataset

  • Initially, dataset \(G\) is formed, having a sufficiently large size (e.g. \(N_{gm} \ge 30\)). Ground motions selected have seismological parameters (such as magnitude, and source-to-site distance) consistent with the earthquake scenario. It is strongly recommended to check the distribution of the dataset with respect to a ground motion prediction model representative of the expected earthquake.

  • Step 2: Ground motion normalization to \(S_{a} \left( {T_{1} } \right)\)

  • All ground motions in dataset \(G\) are normalized to \(S_{a} \left( {T_{1} } \right)\). An upper limit can be imposed to the scale factor, e.g. 3 as proposed by Shome et al. (1998) or 4 by Iervolino and Cornell (2005), beyond which scaling is considered inappropriate (i.e. the frequency characteristics of the scaled motion are not consistent with its amplitude); ground motions exceeding this limit should be discarded.

  • Step 3: Determination of \(T_{1}\) and \(T_{1}^{{\prime }}\)

  • The integration interval periods \(T_{1}\) and \(T_{1}^{{\prime }}\) are determined, which are then used in the evaluation of \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\). \(T_{1}^{{\prime }}\) represents the ‘ultimate’ elongated period and can be calculated using the simplified procedure proposed by Theophilou et al. (2017), or the procedure proposed in FEMA 440 (2005). Alternatively, the optimum \(T_{1}^{{\prime }}\) can be evaluated by first carrying out dynamic analyses on a single-degree-of-freedom system using all ground motions in dataset \(G\). Then, through regression analysis for a range of candidate \(T_{1}^{{\prime }}\) values, the final \(T_{1}^{{\prime }}\) value is selected as the one with the highest correlation between \(IMT\) and \(EDPT\).

  • Step 4: Calculation of \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\)

  • \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\) is calculated for each ground motion using Eq. (2).

  • Step 5: Evaluation of \(IMT\) mean and variance

  • The \(IMT\) mean and variance of dataset \(G\) are evaluated.

  • Step 6: Partition of \(IMT\) distribution into \(N_{s}\) strata

  • The \(IMT\) domain is partitioned into \(N_{s}\) strata, such that all strata have equal probability of occurrence. It is strongly recommended that at least \(N_{s} = 5\) strata are used.

  • Step 7: Formation of optimized suites

  • In the last step, optimized suites are formed through ‘stratified sampling’, by selecting \(N_{i}\) ground motions from each stratum. The total number of selected ground motions is \(N_{g,s} = N_{s} \cdot N_{i}\).

6 Estimation of response distribution through stratified sampling

6.1 Statistical dependence between intensity and response

Simple linear regression analysis is performed between the independent variable \(IMT\), and the dependent variable \(EDPT\), using the data in dataset \(S\),

$$EDPT = \alpha + \beta \cdot imt + \varepsilon$$
(5)

where \(\alpha\) is the axis intercept, \(\beta\) is the slope, \(\varepsilon\) is the random error that is assumed to be normally distributed with \(E\left( \varepsilon \right) = 0\) and \(Var\left( \varepsilon \right) = \sigma^{2}\), and \(imt\) is the value of \(IMT\). The regression equation parameters \(\alpha\) and \(\beta\) are determined, together with the variance, \(\sigma^{2}\), and the correlation, \(\rho\). By visual observation of the regression analyses in Theophilou (2014), \(\sigma^{2}\) can be inferred to be constant with respect to \(IMT\). These are used next, in the evaluation of the stratified sampling statistics.

6.2 Stratified sampling on \(IMT\)

The theoretical distribution of \(IMT\) is normal with mean \(\mu_{IMT}\) and standard deviation \(\sigma_{IMT}\). The area under the curve is divided into \(N_{s}\) strata, which have equal probability of occurrence. Random variable \(IMT_{i}\) represents variable \(IMT\) within stratum \(i\). Within any stratum \(i\) the theoretical mean is denoted as \(\mu_{IMTi}\), and the theoretical standard deviation as \(\sigma_{IMTi}\). The sampling fraction of stratum \(i\), \(w_{IMTi}\), is equal to \(1/N_{S}\) for all strata.

The estimator of the sample mean \(IMT_{i}\), \(\overline{IMT}_{i}\), is an unbiased estimator of \(\mu_{{IMT_{i} }}\), on the assumption of simple random sampling within stratum \(i\) (e.g. Ang and Tang 1975). Consequently, the estimator of the sample mean \(IMT\), \(\overline{IMT}\), is an unbiased estimator of \(\mu_{IMT}\) (e.g. Cochran 1977).

6.3 EDPT statistics

The method proposed, based on regression analysis principles, finds the relationship that best fits to the data in dataset \(S\). The simple linear regression model is adopted to describe the relationship between the sample mean \(EDPT\) of stratum i, \(\overline{EDPT}_{i}\), and the value of \(\overline{IMT}_{i}\), \(\overline{imt}_{i}\), as shown below

$$\overline{EDPT}_{i} = \alpha + \beta \cdot \overline{imt}_{i} + \varepsilon$$
(6)

The sample mean \(EDPT\), \(\overline{EDPT}\), can be calculated from \(\overline{EDPT}_{i}\), as shown below

$$\overline{EDPT} = \mathop \sum \limits_{i = 1}^{{N_{s} }} w_{EDPTi} \cdot \overline{EDPT}_{i}$$
(7)

where \(w_{EDPTi}\) is equal to \(w_{IMTi}\), because of the linear relationship assumed.

It is thus possible to calculate \(\overline{EDPT}\) from \(\overline{imt}_{i}\) and from \(\overline{imt}\) as

$$\overline{EDPT} = \frac{{\mathop \sum \nolimits_{i = 1}^{{N_{s} }} \left( {a + b \cdot \overline{imt}_{i} } \right)}}{{N_{s} }} = a + b\frac{{\mathop \sum \nolimits_{i = 1}^{{N_{s} }} \overline{imt}_{i} }}{{N_{s} }} = a + b \cdot \overline{imt}$$
(8)

where \(a\), and \(b\) are the estimates of \(\alpha\) and \(\beta\). Since \(\overline{IMT}\) is an unbiased estimator of the mean \(IMT\), and \(\overline{EDPT}\) is a linear function of \(\overline{IMT}\), it is proved that \(\overline{EDPT}\) is an unbiased estimator of the theoretical mean \(EDPT\).

The standard error of the mean \(EDPT\), \(\sigma_{{\overline{EDPT,s} }}\), is (Cochran 1977)

$$\sigma_{{\overline{EDPT,s} }} = \sqrt {\mathop \sum \limits_{i = 1}^{{N_{s} }} w_{{EDPT_{i} }}^{2} \cdot \sigma_{{EDPT_{i} }}^{2} } = \sqrt {\mathop \sum \limits_{i = 1}^{{N_{s} }} \frac{{\sigma^{2} }}{{N_{s}^{2} }}} = \frac{\sigma }{{\sqrt {N_{s} } }}$$
(9)

where \(\sigma_{{EDPT_{i} }}^{2}\) is the variance of \(EDPT\) in stratum \(i\), assumed to be equal to \(\sigma^{2}\).

6.4 Comparison to random sampling

Stratified sampling on \(IMT\) nearly always results in a standard error of the mean that is lower than that obtained through random sampling (Cochran 1977). In this section it is shown that, as a consequence, the standard error of the mean \(EDPT\) is also lower, by a degree that depends on \(\rho\).

The ratio \(\sigma_{{\overline{EDPT,s} }} /\sigma_{{\overline{EDPT,r} }}\) can be evaluated assuming an equal number of ground motions between stratified and random sampling, i.e. \(N_{g,s} = N_{g,r}\), where \(N_{g,r}\) is the number of randomly sampled ground motions, and \(\sigma_{{\overline{EDPT,r} }}\) is the standard error of the mean \(EDPT\) through random sampling. The following equation gives the relationship of \(\sigma_{{\overline{EDPT,s} }} /\sigma_{{\overline{EDPT,r} }}\) to \(\rho\) for the case of \(N_{i} = 1\)

$$\frac{{\sigma_{{\overline{EDPT,s} }} }}{{\sigma_{{\overline{EDPT,r} }} }} = \frac{{\sigma /\sqrt {N_{s} } }}{{\sigma /\left( {\sqrt {N_{g,r} } \sqrt {1 - \rho^{2} } } \right)}} = \sqrt {1 - \rho^{2} }$$
(10)

Plotted in Fig. 1 is \(\sigma_{{\overline{EDPT,s} }} /\sigma_{{\overline{EDPT,r} }}\), for \(\left| \rho \right|\) values from 0 to 1.0.

It can be inferred from Fig. 1 that at one extreme, at \(\rho = 0\), \(\sigma_{{\overline{EDPT,s} }}\) is equal to \(\sigma_{{\overline{EDPT,r} }}\), which means that stratified sampling is not more accurate than random sampling. At the other extreme, at \(\left| \rho \right| = 1.0\), \(\sigma_{{\overline{EDPT,s} }}\) is equal to zero, which means that the \(\overline{IMT}\) obtained through stratified sampling on \(IMT\) can be used to evaluate the exact \(\overline{EDPT}\). In the intermediate region, \(\sigma_{{\overline{EDPT,s} }} /\sigma_{{\overline{EDPT,r} }}\) is decreasing as \(\left| \rho \right|\) is increasing, which means that, stratified sampling on \(IMT\) is potentially more efficient than random sampling, because \(\sigma_{{\overline{EDPT,s} }}\) is lower than \(\sigma_{{\overline{EDPT,r} }}\). The decrease becomes more pronounced at high \(\left| \rho \right|\) values, because \(\sigma_{{\overline{EDPT,s} }}\) is decreasing at a higher rate than \(\sigma_{{\overline{EDPT,r} }}\).

Fig. 1
figure 1

Relationship between \(\sigma_{{\overline{EDPT,s} }} /\sigma_{{\overline{EDPT,r} }}\) and \(\left| \rho \right|\)

6.5 Size of suites

The size of the suites plays an important role in the accuracy of \(IMT\) and \(EDPT\) estimation. Theophilou (2014) applied the method to a single-degree-of-freedom system assumed to represent an idealized structure. Using suites of \(N_{g,s} = 8\) ground motions (\(N_{s} = 8\), \(N_{i} = 1\)) resulted in a standard error in the mean \(IMT\) of < 2%, and in a standard error in the mean \(EDPT\) of approximately 10%. The variance of \(IMT\) was overestimated by about 10%, and the variance of \(EDPT\) by about 10%. This estimation accuracy is deemed acceptable. Using suites of \(N_{g,s} = 5\) ground motions (\(N_{s} = 5\), \(N_{i} = 1\)) resulted in less accurate, but still acceptable estimation.

Further research could lead to further optimization of the method with respect to the \(N_{s}\) and \(N_{i}\) combinations used, e.g. choosing between (\(N_{s} = 4\), \(N_{i} = 2\)) and (\(N_{s} = 8\), \(N_{i} = 1\)).

6.6 Multivariate distribution of response spectrum

The objective of the proposed GMSM method is to match the multivariate distribution of the response spectra between \(T_{1}\) and \(T_{1}^{{\prime }}\). This is achieved in an implicit manner, by matching the dispersion of \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\), given that the response spectra are conditioned on \(S_{a} \left( {T_{1} } \right)\). The match is implicit, in the sense that instead of matching the multivariate distribution of the response spectra over a range of discrete periods, the dispersion of the scalar \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\) is matched.

7 Application to a building structure

The GMSM method was applied in the probabilistic response assessment of a first-mode dominated MDOF system, representative of a real building structure.

7.1 Structure description

The structure examined is a ten-storey reinforced concrete building, used for residential occupancy. Each storey has a rectangular floor plan of dimensions 24 m × 16 m, and a height of 3.2 m. The columns are located in a 5 × 5 array, spaced 6 m apart in the longitudinal direction, and 4 m apart in the transverse direction. The structure was designed to Eurocode 2 (2004a) and Eurocode 8 (2004b) for ductility class ‘High’. Further details of the structure are provided in Theophilou (2014).

7.2 Finite element model

A two-dimensional finite element model was developed, representing one interior frame in the longitudinal direction of the structure. The model was analysed using the computer program OpenSees version 2.4.0 (PEER 2013). Beams and columns were represented as elastic beam-column elements. At the locations of the beam and column joints, and at the location of the column and foundation joints, the beam and column frame elements are connected to the respective nodes by zero-length ‘nonlinear spring’ elements that allow rotation only. The moment-rotation relationship assigned to the nonlinear springs is obtained using the modified Ibarra-Krawinkler hysteresis model (Lignos and Krawinkler 2012a). The nonlinear spring parameters were determined using the empirical relationships and values given by Haselton and Deierlein (2007), and by Lignos and Krawinkler (2012b). The hysteresis rules are consistent with the modified Clough-Johnston model (Clough and Johnston 1966; Mahin and Lin 1983), with the exception that they are modified to conform with the multilinear backbone curve. The finite element model considered second-order (P-Delta) effects.

7.3 Natural modes

The first three natural periods of the structure and the corresponding mass participation ratios are summarized in Table 1. As anticipated, the structure is first-mode dominated, which is consistent with the concept of selecting ground motions based on the inelastic response of the first mode, adopted in the proposed GMSM method.

Table 1 Natural modes of structure

7.4 Ground motion large-sample dataset

The ground motion large-sample dataset (i.e. dataset \(G\)) was formed by selecting 40 ground motions, listed in Table 2, with seismological characteristics consistent with the earthquake scenario. The earthquake scenario can be described as a strong earthquake, at a close distance to the fault, but sufficiently long to avoid near-fault effects. The main criteria used in the selection of the records were a moment magnitude higher than 6, and closest distance to fault less than about 30 km. The source of the records was the NGA Database 2005 (PEER 2005) and the European Strong-Motion Database (Ambraseys et al. 2002). The graph in Fig. 2 presents the distribution of the moment magnitude and closest distance of the records. It can be observed that these two variables are widely distributed throughout.

Table 2 Ground motion records
Fig. 2
figure 2

Magnitude-distance distribution of records (each point corresponds to one station and hence two horizontal records)

To ensure that the dataset of ground motions is a representative sample of the population, its statistics were compared to the Boore and Atkinson (2008) ground motion prediction model. Epsilon was evaluated in the period range 0.5–2.0 s and was found to match very well the standard normal distribution, which is the theoretical distribution of epsilon.

7.5 Ground motion suites

One dataset of 2000 optimized suites was formed through stratified sampling, using \(N_{s} = 8\) strata, sampling \(N_{i} = 1\) ground motion from each stratum, resulting in \(N_{g,s} = 8\) ground motions per suite. For comparison, another dataset of 2,000 suites was formed through random sampling, using \(N_{g,r} = 8\) ground motions per suite.

Figure 3 shows the displacement response spectra of the ground motions, normalized to \(S_{d} \left( {1.45\,{\text{s}}} \right)\) so as to facilitate the observation of the spectral shape in the region \(T_{1} = 1.45\,{\text{s}}\) to \(T_{1}^{{\prime }} = 2.90\,{\text{s}}\). Figure 3a shows the spectral distribution of dataset \(G\), and Fig. 3b shows suite No. 1 obtained through stratified sampling. It can be observed that the distribution of the suite approximates well the distribution of dataset \(G\).

Fig. 3
figure 3

Ground motion displacement response spectra normalized to \(S_{d} \left( {1.45\,\text{s} } \right)\)a dataset \(G\), b Suite No. 1 through stratified sampling

The “Random” sampling method referred to herein, corresponds to the method described in PEER Report 2009/01 (Haselton et al. 2009) as “\(S_{a} \left( {T_{1} } \right)\) Scaling with Bin Selection” proposed by Shome et al. (1998). With this GMSM method a bin of ground motions is first selected with moment magnitude and distance criteria. Then a suite of ground motions is formed through random sampling without replacement, and scaling to \(S_{a} \left( {T_{1} } \right)\).

7.6 Incremental dynamic analysis

Incremental dynamic analysis (Vamvatsikos and Cornell 2002) was carried out by incrementing \(S_{a} \left( {T_{1} } \right)\) up to 1.2 g. \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\) is evaluated by integration of the elastic displacement spectrum from \(T_{1} = 1.45\,{\text{s}}\) to \(T_{1}^{{\prime }} = 2.90\,{\text{s}}\), i.e. \(T_{1}^{{\prime }} = 2.0T_{1}\). The upper intensity limit corresponds to the highest \(S_{a} \left( {T_{1} = 1.0\,{\text{s}}} \right)\) found on seismic hazard maps (e.g. Southern California) with probability of exceedance of 2% in 50 years. At this probability of exceedance, safety against collapse is evaluated. The ground motions used at each intensity increment were scaled to the target \(S_{a} \left( {T_{1} } \right)\). The response parameters investigated were the Park-Ang overall structural damage index (Park and Ang 1985), \(OSDI\), and the maximum interstorey drift ratio, \(MIDR\).

7.7 Regression analysis

The probability distribution of \(S_{dN} \left( {1.45,2.90} \right)\) was found to conform well to the normal distribution, and to the lognormal distribution, using Lilliefors (1967) test at a significance level of 5%. The normal distribution was assumed in the regression analysis, as it resulted in the highest correlation with the response parameters. Similarly, by Lilliefors (1967) test it was found that the probability distributions of the response parameters \(OSDI\), and \(MIDR\), conform well to the lognormal distribution, at a significance level of 5%.

Thus, regression analyses were carried out between \(S_{dN} \left( {1.45,2.90} \right)\) and the natural logarithms of the response parameters, at various \(S_{a} \left( {T_{1} } \right)\) intensities, using the simple linear model. The correlation coefficient, \(\rho\), between \(S_{dN} \left( {1.45,2.90} \right)\) and (a) \({ \ln }\left( {OSDI} \right)\) is shown in Fig. 4a, and (b) \({ \ln }\left( {MIDR} \right)\) in Fig. 4b.

Fig. 4
figure 4

Correlation coefficient between \(S_{dN} \left( {1.45,2.90} \right)\) and a\(\ln \left( {OSDI} \right)\), b\(\ln \left( {MIDR} \right)\)

It can be observed that, in general, \(\rho\) is low (in the range of 0.2–0.3) at the low nonlinearity levels (at intensity \(S_{a} \left( {T_{1} } \right)\) between 0.2 and 0.6 g), and moderate (in the range of 0.5–0.6) at the high nonlinearity levels (at intensity \(S_{a} \left( {T_{1} } \right)\) between 1.0 and 1.2 g).

7.8 Response sample statistics

Figure 5 shows the mean \({ \ln }\left( {OSDI} \right)\), and \({ \ln }\left( {MIDR} \right)\), plotted against \(S_{a} \left( {T_{1} } \right)\). It appears that, in general there is excellent conformity between the random sampling, the stratified sampling, and the large-sample dataset results. This is an empirical proof that estimator \(\overline{EDPT}\) through stratified sampling is unbiased.

Fig. 5
figure 5

Mean a\(\ln \left( {OSDI} \right)\), b\(\ln \left( {MIDR} \right)\)

Figure 6 shows the standard error of the mean \({ \ln }\left( {OSDI} \right)\), and \({ \ln }\left( {MIDR} \right)\), plotted against \(S_{a} \left( {T_{1} } \right)\). It appears that, in general, stratified sampling results in a lower standard error than random sampling, and that there is a marked reduction at \(S_{a} \left( {T_{1} } \right) \ge 0.8\,{\text{g}}\). In particular, the maximum reduction in the standard error is 22% for \({ \ln }\left( {OSDI} \right)\), and 27% for \({ \ln }\left( {MIDR} \right)\). The standard error obtained using both methods, between 7 and 13%, is deemed acceptable.

Fig. 6
figure 6

Standard error (SE) of the mean a\(\ln \left( {OSDI} \right)\), b\(\ln \left( {MIDR} \right)\)

Figure 7 shows the variance of \({ \ln }\left( {OSDI} \right)\), and \({ \ln }\left( {MIDR} \right)\), plotted against \(S_{a} \left( {T_{1} } \right)\). It appears that, there is excellent conformity between the random sampling and the large-sample dataset results. The stratified sampling results exhibit some discrepancy from the large-sample dataset results, reaching a maximum of 15% overestimation in the case of \({ \ln }\left( {OSDI} \right)\) (corresponding to 7% overestimation in the standard deviation).

Fig. 7
figure 7

Variance of a\(\ln \left( {OSDI} \right)\), b\(\ln \left( {MIDR} \right)\)

7.9 Response prediction

In Fig. 8 response parameters \(OSDI\), and \(MIDR\) are plotted at 95% probability of occurrence as 95% prediction intervals. It can be observed that the stratified sampling prediction interval is generally narrower than the random sampling prediction interval. It is also observed that the stratified sampling prediction interval converges towards the large-sample prediction interval. It appears that despite the small overestimation in the variance, the proposed GMSM method is still more efficient than random sampling.

Fig. 8
figure 8

Prediction intervals of a\(OSDI\), b\(MIDR\)

The efficiency of the GMSM method in reducing the prediction interval can be quantified through the ‘Efficiency Index’, \(EI\), defined as

$$EI = \frac{{\left( {R_{U} - R_{L} } \right) - \left( {S_{U} - S_{L} } \right)}}{{\left( {R_{U} - R_{L} } \right) - \left( {L_{U} - L_{L} } \right)}}$$
(11)

where \(S\), \(R\), and \(L\) denote the interval bounds of stratified sampling, random sampling, and the large-sample, respectively, and subscripts \(U\) and \(L\) denote the upper and lower interval bounds, respectively. The higher the \(EI\), the more efficient the GMSM method; at zero \(EI\) the GMSM method is not more efficient than random sampling, whereas at unity \(EI\) the GMSM method achieves the same accuracy as the large-sample dataset. \(EI\) is plotted in Fig. 9 against \(S_{a} \left( {T_{1} } \right)\). It appears that \(EI\) for \(OSDI\) is in the range of 0.07–0.43, and for \(MIDR\) in the range of 0.20–0.52. The highest \(EI\) was observed at \(S_{a} \left( {T_{1} } \right)\) equal to 1.0 g and 1.2 g; this is attributed to the higher correlations obtained at the high nonlinearity levels.

Fig. 9
figure 9

Efficiency index for a\(OSDI\), b\(MIDR\)

The computational work needed using the proposed GMSM method, expressed as number of ground motions required to obtain the same standard error as through random sampling, was reduced to about 72% at the moderate nonlinearity levels, and to about 50% at the high nonlinearity levels.

7.10 Elongated period estimation

After the application of the GMSM method in the response prediction of the MDOF system, in which the elongated period was assumed to be equal to \(T_{1}^{{\prime }} = 2.0T_{1}\) for practical purposes (Theophilou et al. 2017), the elongated period observed was investigated. The elongated period is the predominant period of oscillation when the structure enters the inelastic range and was estimated from the response power spectrum.

Initially, the power spectrum of the top node response displacement was calculated. Figure 10 shows the response spectrum for the Loma Prieta 090 ground motion at intensities \(S_{a} \left( {T_{1} } \right) = 0.2\, {\text{g}}\) and \(S_{a} \left( {T_{1} } \right) = 1.0\, {\text{g}}\). In the \(S_{a} \left( {T_{1} } \right) = 0.2\, {\text{g}}\) spectrum it is obvious that the dominant frequency is close to 0.69 Hz, which corresponds to the natural period of the fundamental mode \(T_{1} = 1.45\, {\text{s}}\). The spectral power at frequencies lower than 0.69 Hz is close to zero, inferring a predominantly elastic response. In the \(S_{a} \left( {T_{1} } \right) = 1.0\, {\text{g}}\) spectrum it is observed that the spectral power in the frequencies lower than 0.69 Hz is substantial, which is attributed to period elongation due to inelasticity effects. The abrupt increase in the spectral power at frequencies close to zero observed in both spectra has various possible explanations, however this range is not of practical interest. This range is excluded by seeking the elongated period at frequencies above 0.2 Hz.

Fig. 10
figure 10

Power spectrum of Loma Prieta 090 record

Within the frequency range of interest, i.e. 0.2 Hz and 0.69 Hz, the elongated period was calculated. The dominant frequencies were first identified as the localised peaks on the power spectrum. The frequency corresponding to the elongated period was calculated as the average of the frequencies, weighted with respect to the spectral power. The curve resulting from this approach is presented in Fig. 11. The mean estimated elongated period, \(T_{1}^{{\prime }}\), at \(S_{a} \left( {T_{1} } \right)\) equal to 1.0 g, and 1.2 g, was found to be 2.31 s, and 2.41 s, respectively, using the power spectrum method. This is a reasonable agreement with the assumed elongated period of 2.90 s.

Fig. 11
figure 11

Mean estimated elongated period

8 Differentiation to other methods

Goulet et al. (2007) suggested selecting records to match the deaggregation results in terms of magnitude, distance, and epsilon (Baker and Cornell 2005), and subsequently scaling them to the target \(S_{a} \left( {T_{1} } \right)\). In their method the spectral shape (i.e. epsilon) is assumed to be dependent on the magnitude and distance, thus a sufficiently large number of records (e.g. in their example 34) is required to cover all magnitude-distance combinations. The difference of the present GMSM method is that the selection criteria of magnitude and distance are relaxed. Thus, the method is applicable when the normalized spectral shape between \(T_{1}\) and \(T_{1}^{{\prime }}\) (i.e. Normalized Spectral Area) does not change significantly with magnitude and distance, as supported by the findings of Shome et al. (1998). In this way the number of required records for the same magnitude and distance ranges is reduced.

Jayaram et al. (2011) proposed a method that initially generates simulated acceleration response spectra, which collectively match the central tendency, dispersion, and multivariate distribution of the target response spectrum. Subsequently, ground motions are selected that match the simulated response spectra. As Jayaram et al. (2011) states, the suite of selected ground motions resulting from the main procedure may deviate slightly from the target central tendency and dispersion. For this reason, a supplementary “greedy” procedure is specified to replace the ground motions one-by-one and thus minimize the residuals.

Bradley (2012) proposed an algorithm that generates random simulations of response spectra from the conditional multivariate distribution of intensity measures, obtained from the Generalized Conditional Intensity Measure (Bradley 2010). The method considers a multitude of intensity measures (including spectral acceleration), in contrast to the Jayaram et al. (2011) method that considers only spectral acceleration. It can also utilize the full seismic hazard disaggregation probability. Bradley (2012) also discusses the issue of variability between the matching ground motion suites, particularly for small suites, and proposed selecting the suite with the lowest residual between the empirical and target conditional distribution.

Both the Jayaram et al. (2011) method and the Bradley (2012) method place stringent requirements over the matching spectral shape. As Bradley (2012) states, finding a ground motion with identical spectral shape to the simulation is generally not possible. For this reason, ground motion selection is based on minimizing the spectral shape deviation from the simulations. The proposed GMSM method adopts a different approach, in which selection is based on \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\), which is a scalar parameter descriptive of the spectral shape, rather than the actual spectral shape. By limiting the matching period range between \(T_{1}\) and \(T_{1}^{{\prime }}\), the number of matching ground motions is higher. In addition, it adopts an unbiased estimator of the EDP central tendency, and it has been found to match the EDP dispersion well, thus avoiding the problem of residuals associated with the Jayaram et al. (2011) and Bradley (2012) methods.

9 Concluding remarks

A GMSM method has been proposed with which optimized suites of ground motions can be sampled, suitable for a probabilistic seismic response assessment of building structures. The key aim of the method is to estimate with reasonable accuracy the central tendency and the dispersion of the EDPs. It can be used in cases wherein seismic intensity can be defined in terms of \(S_{a} \left( {T_{1} } \right)\). Its advantage over other GMSM methods is that it implicitly matches the multivariate distribution of the response spectrum in the region of the structure elongated period. It also adopts an unbiased estimator of the EDP central tendency, and has been found to match the EDP dispersion well.

Ground motion intensity is expressed using the vector-valued IM \(\left\langle S_{a} \left( {T_{1} } \right), S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right) \right\rangle\). Optimized suites of ground motions are formed by partitioning the distribution of \(S_{dN} \left( {T_{1} ,T_{1}^{{\prime }} } \right)\) into \(N_{s}\) strata, and then selecting \(N_{i}\) ground motions from each stratum. The proposed method replicates the mean and variance of \(IMT\), whereas the standard error is reduced, compared to random sampling. At the same time the mean and the variance of \(EDPT\) are also replicated. The advantage over random sampling is that when there is high enough correlation between \(IMT\) and \(EDPT\), stratified sampling results in a reduced standard error in the mean \(EDPT\). Consequently, an optimized response prediction is achieved.

The method was applied in the analysis of a first-mode dominated MDOF system, representing a ten-storey building frame designed to Eurocode 2 (2004a) and Eurocode 8 (2004b) for ductility class High. The central tendency of the response was found to have excellent conformity to the large-sample dataset, while some discrepancy in the dispersion estimation was observed. Expressing the 95% percentile response as 95% prediction interval it was found that the highest ‘Efficiency Index’ achieved was 0.43 for \(OSDI\), and 0.52 for \(MIDR\). The computational work needed to obtain the same accuracy as through random sampling was reduced to about 72% at the moderate nonlinearity levels, and to about 50% at the high nonlinearity levels. The elongated period, estimated using the power spectrum method, was found to be in reasonable agreement with the assumed elongated period at the higher nonlinearity levels.

In conclusion, the proposed GMSM method is most efficient in predicting response at moderate to high nonlinearity levels, due to the significant reduction in the standard error of the \(EDPT\), which is a result of the sufficiently high correlations observed between the proposed \(IMT\) and the \(EDPT\). It is therefore suitable for first-mode dominated structures when the probabilistic assessment focuses on limit states associated with moderate to severe damage and collapse.