Document Type : Research Paper

Authors

Iran University of Science and Technology

10.22054/ilt.2026.87711.931

Abstract

Employing reliable, evidence-based academic word lists has been a notable concern for educators, students, and researchers in English for Academic/Specific Purposes (EAP/ESP) courses. Currently, many of these courses still lack research-oriented materials and rely heavily on traditional ways of teaching field-specific terminology. This study aimed to create a specialized corpus to identify the most frequent academic words in Welding Metallurgy (WM) and to analyze the most prevalent three- and four-word lexical bundles (N-grams). Using a corpus-based approach, we identified top-tier WM journals and analyzed articles published from 2017 to 2023 that followed the Introduction, Methodology, Results, and Discussion (IMRD) format. In total, 875 empirical research articles were compiled and analyzed to establish a specialized corpus of approximately four million words. After applying word selection criteria, 608 lemmas were identified. Furthermore, we recognized 68 technical acronyms in the field and grouped them into an independent list. We also examined the most prevalent N-grams to explore the field's formulaic language; consequently, 61 prevalent technical N-grams were recognized. As for pedagogical implications, this study can deepen ESP course instructors’ knowledge and urge them to be more mindful of evidence-based material. It also encourages students to give more weight to their fundamental discipline-specific needs by incorporating authentic word lists in practice.

INTRODUCTION

Globalization represents the integration of diverse cultures, languages, organizations, and countries from across the globe. This phenomenon has facilitated worldwide connectivity for both personal and business purposes and has profoundly influenced the English language. Furthermore, the growing cooperation between English-speaking countries and other nations in education, the economy, culture, and other areas has led to a steady growth of interest in English (Akhatovna Fakhrutdinova et al., 2023). English also serves a global function by providing access to a wide range of academic resources, including scientific research and educational materials, tailored to meet the linguistic needs of students who are enrolled in English for Specific Purposes (ESP) or English for Academic Purposes (EAP) courses. As a result, it allows students, researchers, and instructors to engage with cutting-edge research, evidence-based findings, research-informed materials, and textbooks beyond their native language. Within the EAP and ESP domains of applied linguistics (AL), English in the new era functions not merely as an educational tool but as a prerequisite for entering the academic community.

Additionally, most engineering students at the tertiary level must complete required ESP courses (Rashidi & Mazdayasna, 2016). These courses are taught at universities where subjective textbooks serve as the primary instructional material for students. It is also widely acknowledged that academic vocabulary plays a vital role in the literacy skills (i.e., reading and writing) of native and non-native language learners (Saeedi et al., 2022). A frequently mentioned prerequisite for entering academia is the development of academic word lists that represent the language and terminology of both general and specific disciplines. This requirement has been empirically demonstrated in studies aimed at creating general word lists, such as the General Service List (GSL; West, 1953), which encompasses vocabulary items frequently used in daily conversations, reading materials, and writing assignments.

Aside from that, Coxhead (2000) developed an Academic Word List (AWL) derived from research articles (RAs) across four scientific areas: commerce, law, arts, and science. The objective of these academic word lists was to assist users in meeting their linguistic needs in both daily life and academic contexts. In this regard, Coxhead (2000) recognized that AWL does not adequately meet the needs of students majoring in various scientific disciplines, as its coverage is not evenly distributed across these fields. For instance, Hyland and Tse (2007) stated that various lexical items exhibited distinct lexical behaviors across multiple scientific domains with respect to meaning, range, and collocation. This emphasizes the importance of creating field-specific word lists that are customized for scientific disciplines.

Academic word lists, which cater to the linguistic requirements of non-native English speakers, offer evidence-based academic terms across a diverse array of disciplines. Studies in this line include word lists for Accounting (Khany & Kalantari, 2021), Chemistry (Valipouri & Nassaji, 2013), Physics (Vukovic-Stamatovic, 2024), and Veterinary Science (Özer & Akbas, 2024). Although several studies have investigated the linguistic characteristics of various scientific domains, none has specifically addressed welding metallurgy (WM) and its linguistic features within RAs. The mean (x̄ = 2,098,650) and median (x̃ = 1,015,000) corpus sizes in previous studies of engineering show that several of these studies did not adequately represent the field because of their small corpus size (Durovic et al., 2021; Korzen et al., 2023; Thiruchelvam et al., 2018). According to Brysbaert and New (2009), a corpus of one million running words is considered sufficient to yield a reliable list of highly frequent words. Additionally, certain studies have gathered and incorporated localized materials to establish specialized corpora aimed at addressing the EAP/ESP language needs of specific communities (Mudraya, 2006).

Nearly all university engineering curricula have incorporated materials science as a critical component in recent years. Although physicists and chemists study materials from a scientific perspective, the rise of materials science is noteworthy because it brings together many concepts from physics and chemistry into a single, comprehensive field (Anderson et al., 2004). Moreover, materials are important to human beings because of the advantages gained from manipulating their properties (Hutagalung, 2012). The importance of (welding) metallurgy both as a science and an engineering discipline (Pineau & Quere, 2011), its common characteristics with other engineering fields such as chemistry (Lippold, 2014), and the absence of evidence-based ESP/EAP textbooks in engineering disciplines have compelled us to consider developing authentic word lists for this field.

 

LITERATURE REVIEW

Nation (2001) distinguished four categories of vocabulary that are common in English academic writing: academic, low-frequency, technical, and high-frequency words. High-frequency words are frequently used in reading materials, writing assignments, and casual conversations. GSL (West, 1953) is the most widely known list of high-frequency words. In addition, many word lists have been created to help students learn general vocabulary for a broad range of interests. Low-frequency words, by contrast, are distinguished by their limited distribution and infrequent usage, yet they make up a significant portion of the vocabulary of any given field. Some of these words might be used rarely, possibly just once or twice. They are, however, the most numerous group of words in the field. About 5% of the vocabulary in academic texts is composed of low-frequency words, which include proper names, words that are rarely used in everyday speech, non-high-frequency words, and technical terms from other areas of science (Nation, 2001). As Nation (2001) pointed out, one person's technical vocabulary is another person's low-frequency words, which shows how varied low-frequency words can be.

Academic vocabulary is generally absent from basic general English texts but comprises a significant portion of the lexical units used in academic discourse. Students frequently encounter challenges in mastering these terms because they are relatively unfamiliar compared to the specialized vocabulary pertinent to their disciplines. To address this issue, AWL plays a vital role in helping students understand specialized vocabulary in different fields (Coxhead, 2000). This list consists of 570 word families that are not included in the 2,000 most commonly used English words of GSL. Thus, it functions as an essential educational resource for students with academic goals (Coxhead & Nation, 2001).

To gain a more nuanced understanding of these word lists, Liu and Han (2015) identified two categories: (1) general and (2) field-specific. The former represents vocabulary items that are associated with diverse fields, accessible and applicable by most ESP students as foundational knowledge for their academic pursuits, exemplified by Coxhead’s AWL (2000). Conversely, the latter subsumes terminologies that frequently occur across various subject domains within a specific discipline (Khani & Tazik, 2013; Martinez et al., 2009). Although they may be prevalent in a specific subject area, they are encountered less often in other contexts. More specifically, technical terminologies include a variety of categories, some of which are exclusive to particular fields of study (Nation, 2001).

Given the importance of discipline-specific and technical word lists and the inadequacy of AWL to address the linguistic requirements of EAP/ESP students across various disciplines, numerous studies have been conducted to identify the academic vocabulary items associated with different hard and soft sciences. After critically reviewing the literature, we found 28 studies that had investigated the lexical characteristics of fields within the hard sciences. In particular, nine studies have analyzed vocabulary patterns within engineering disciplines.

Aiming to construct a discipline-specific word list for chemistry, Valipouri and Nassaji (2013) developed a word list specifically designed for English as a Foreign Language (EFL) students in chemistry. The researchers randomly selected ten journals from each of the four areas of chemistry, and a total of 1,185 RAs, published between 2003 and 2009 and conforming to the IMRD format (Swales, 1990), were included. In doing so, they established a corpus of four million words in Chemistry. Following Coxhead’s (2000) criteria, this study identified 1,577 word families that met the established criteria for frequency, range, and specialized occurrence. The researchers also included GSL items. They then eliminated function words, technical terms, and abbreviations from the original compilation in order to improve the word list. The result was a list of 1,400 word families of technical chemistry vocabulary, which included 327 AWL, 390 non-GSL/AWL, and 683 GSL word families. In short, the constructed AWL attained a coverage of 81.18% within the 4-million-word corpus of chemistry RAs.

In two of the most exemplary corpus studies, Mudraya (2006) and Ward (2009) produced specialized word lists for EAP/ESP engineering students. Specifically, Mudraya identified thirteen engineering textbooks that encompassed essential subjects for all engineering students, such as Technical Mechanics, Engineering Materials, and Mechanics of Materials, among others. The specialized engineering corpus was thus established with approximately two million running words, representing 1,200 word families and 9,000 word types commonly used throughout the corpus. To be included in the word list, word families had to occur at least 100 times across the whole corpus. After applying the word selection criteria, the word list was constructed with 1,260 word families. The established corpus was compared with COBUILD, the Bank of English Corpus, and the BNC, and the relationship among the 50 most prevalent closed-class word forms was found to be statistically significant.

As a follow-up study, Ward analyzed linguistic features in engineering and constructed a specialized word list to address the linguistic needs of EAP students. To achieve this, Ward consulted with instructors from five engineering disciplines: chemical, civil, electrical, industrial, and mechanical. A total of twenty-five textbooks were assembled, and random pages were selected until the word count reached 10,000. Accordingly, he established a corpus of 271,000 words and identified 10,290 distinct word types, from which he developed a foundational word list of 229 words designed to support English language acquisition for beginners in various engineering fields. This word list, called the Basic Engineering List (BEL), covered 17.2%, 15.6%, and 21% of the sub-corpora. When applied to a text centered on mass transfer in some of the engineering sub-fields (e.g., mechanical), BEL demonstrated a coverage of 17.7%. This observation indicates that BEL provides a consistent level of coverage across various technical materials. Moreover, although AWL contained a considerably broader vocabulary than BEL, it provided only 11.3% coverage of engineering content. This comparison underscores the effectiveness of BEL in covering engineering terminology, even though it has a more limited scope than AWL. Table 1 presents a list of studies that have developed distinctive word lists in the hard sciences.

 

Table 1. An Overview of the Developed Wordlists across Hard Sciences

| Disciplines and Fields of Study | Author(s) |
| --- | --- |
| Medical Science | Wang et al. (2008) |
| Engineering | Mudraya (2006) |
| Science, Social Sciences and Engineering | Hyland & Tse (2007) |
| Medical Science | Chen & Ge (2007) |
| Pharmacology | Fraser (2007) |
| Engineering | Ward (2009) |
| Medical Science | Hsu (2013) |
| Chemistry | Valipouri & Nassaji (2013) |
| Earth Remote Sensing (Aerospace) | Korzen et al. (2023) |
| Nursing | Yang (2015) |
| Oil marketing and Oil industry | Ebtisan Saleh Aluthman (2017) |
| Medical Science | Lei & Liu (2016) |
| Engineering | Todd (2017) |
| Plumbing | Coxhead & Demecheleer (2018) |
| Science & Engineering | Veenstra & Sato (2018) |
| Civil Engineering | Gilmore & Millar (2018) |
| Physiotherapy | Jamalzadeh & Chalak (2019) |
| Science | It-ngam & Phoocharoensil (2019) |
| Computer Science | Chen & Lei (2019) |
| Veterans Equine Therapy | Safari (2019) |
| Zoology | Kruawong & Phoocharoensil (2020) |
| Pharmacology | Heidari et al. (2020) |
| Medical Science | Ashrafzadeh (2021) |
| Computer Science | Roesler (2021) |
| Marine Engineering | Durovic et al. (2021) |
| Physics | Vukovic-Stamatovic (2024) |
| Veterinary Medicine | Özer & Akbas (2024) |
| Chemistry | Xodabandeh et al. (2023) |
| Urban Planning | Amini Farsani et al. (2025) |

The previously exemplified studies created comprehensively custom-tailored corpora in engineering sub-fields, where they prove to be more beneficial for EAP/ESP students when referring to specialized word lists in comparison to AWL. However, in association with an academic and technical word list, no study has been conducted to establish a corpus depicting academic words and technical acronyms of WM, along with their most frequent multiword constructions. In brief, our academic word list will not only encompass the vocabulary of WM but can also be extrapolated to other engineering and related fields. Conversely, our technical word list would be field-specific and, in a stricter sense, more representative of WM.

 

PURPOSE OF THE STUDY

One aim of this study is to focus on WM inclusively, resulting in more robust and reliable outcomes for the discipline. Hence, we incorporated empirical research articles from leading journals in the field to ensure the generalizability and representativeness of the findings. First, we intended to create a field-specific word list and an academic word list for WM and to identify its most common lexical bundles. Biber et al. (2004) defined lexical bundles as the most common lexical sequences in a given register, also known as fixed expressions or N-grams. These constructions are frequently used in scientific writing and are essential for producing texts that adhere to the rhetorical conventions of any given field of study (Salazar, 2014). Thus, we also aimed to generate a concrete list of multiword terms (N-grams) for WM. This study was guided by the following research questions:

  • What are the most frequent academic words of Welding Metallurgy?
  • To what extent do GSL and AWL cover the entire corpus of Welding Metallurgy Research Articles?
  • Which of the General Word Lists and General Academic Word Lists (GSL, AWL, NGSL, and NAWL) are more beneficial for metallurgy students?
  • What are the most frequent lexical bundles (N-grams of three and four) of Welding Metallurgy across the whole corpus?

 

METHODS

To establish and identify the domain of the corpus, this investigation adhered to the benchmarks of Plonsky (2013), which include content (i.e., WM excerpts), location (i.e., WM journals and RAs), and time (i.e., publication date). To do so, we initially asked five experts in metallurgy to recommend the top ten journals in the discipline. Subsequently, seven of the most recommended journals were selected and checked against the SCImago Journal Rank (SJR) to verify their standing (see Table 2). The rationale for including top-tier journals was to represent authentic language, as papers published in these journals undergo rigorous proofreading, peer review, and editorial processes. In order to establish a representative corpus across various years, we incorporated empirical RAs that adhered to the Introduction, Methodology, Results, and Discussion (IMRD) format (Swales, 1990) and were published between 2017 and 2023.

Additionally, we attempted to ensure that the WM journals were representative and balanced in the corpus. The corpus was established by including the same number of research articles per year (n = 25) from each journal. As Table 2 shows, the time span covered is not identical across journals. This is because certain journals, such as the Welding Journal and the Journal of Advanced Joining Processes, publish only a limited number of research articles per volume each year. Additionally, the journals' primary focus is on WM, as evidenced in journals such as Materials Science and Engineering: A, Science and Technology of Welding and Joining, and Journal of Materials Research and Technology. In this way, a specialized corpus of WM was developed, comprising nearly 4 million running words.

 

 

Table 2. Journals' Information

| Journal | Time Span |
| --- | --- |
| Acta Materialia | 2018-2022 |
| Journal of Advanced Joining Processes | 2017-2021 |
| Journal of Materials Research and Technology (JMRT) | 2019-2023 |
| Materials Science and Engineering: A | 2017-2021 |
| Science and Technology of Welding and Joining | 2017-2021 |
| Welding in the World | 2018-2022 |
| Welding Journal | 2018-2021, 2023 |
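The sampling plan can be checked against the reported number of articles with a few lines of Python. This is only an illustrative sketch: it assumes that every journal contributed exactly 25 articles for each year listed in Table 2, and the names `TIME_SPANS` and `expand_years` are hypothetical helpers introduced here, not part of the study's procedure.

```python
# Time spans copied from Table 2 (one entry per journal).
TIME_SPANS = {
    "Acta Materialia": "2018-2022",
    "Journal of Advanced Joining Processes": "2017-2021",
    "Journal of Materials Research and Technology (JMRT)": "2019-2023",
    "Materials Science and Engineering: A": "2017-2021",
    "Science and Technology of Welding and Joining": "2017-2021",
    "Welding in the World": "2018-2022",
    "Welding Journal": "2018-2021, 2023",
}
ARTICLES_PER_JOURNAL_YEAR = 25  # n = 25, as stated in the text


def expand_years(span):
    """Turn a span such as '2018-2021, 2023' into a list of individual years."""
    years = []
    for part in span.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(y) for y in part.split("-"))
            years.extend(range(start, end + 1))
        else:
            years.append(int(part))
    return years


years_per_journal = {j: expand_years(s) for j, s in TIME_SPANS.items()}
journal_years = sum(len(years) for years in years_per_journal.values())
total_articles = journal_years * ARTICLES_PER_JOURNAL_YEAR

print(journal_years)   # 35 journal-years (7 journals x 5 years each)
print(total_articles)  # 875
```

Under this assumption, the plan yields the 875 research articles reported for the corpus.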

 

Corpus Establishment

To create the corpus, all RAs were standardized, meaning all references, footnotes, tables, figures, and appendices were omitted. Subsequently, they were converted to TXT files, each representing a separate journal, and inserted into AntWordProfiler 1.5.1 (Anthony, 2021). As the first stage of analyzing the data using this concordancer, we applied two criteria: (1) specialized occurrence and (2) range. We also prioritized range over frequency for its representativeness. By exporting the results to Excel files, we applied the word selection criteria and removed proper nouns and terms irrelevant to WM.

 

Word Selection Criteria

We adhered to the criteria outlined by Coxhead (2000): specialized occurrence, range, and frequency. To ascertain the soundness of the technical vocabulary, we needed to extend our analysis beyond the GSL and AWL; we therefore also examined these items against the New General Service List (NGSL) and the New Academic Word List (NAWL). As mentioned earlier, range was prioritized over frequency to prevent bias arising from journal word length and topic-related words. To fulfill the frequency and range requirements, vocabulary items needed to appear at least 28.57 times per million words and in half or more of the journals, respectively. For a corpus of nearly four million words drawn from seven journals, this corresponds to at least 114 occurrences across the entire corpus (28.57 × 4 ≈ 114) and occurrence in at least four journals; technical terms meeting both thresholds were incorporated into the field-specific word list. In addition, we chose the lemma (a headword along with its inflected forms) as the counting unit since it is known to be highly efficient for students' literacy skills (e.g., reading and writing) within academic discourse (Dang, 2019; Durrant, 2014) and for its pedagogical advantages (Brown et al., 2020; Gardner & Davies, 2014; Lei & Liu, 2016).
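The frequency and range thresholds described above can be sketched in Python as follows. This is a minimal illustration, not the procedure run in AntWordProfiler: the function `select_lemmas`, the journal labels, and the toy counts are hypothetical, and the corpus size is taken as the nominal four million words.

```python
import math
from collections import Counter

RATE_PER_MILLION = 28.57          # frequency criterion stated in the text
NOMINAL_CORPUS_SIZE = 4_000_000   # "nearly 4 million running words" (assumed nominal size)
N_JOURNALS = 7

# Absolute thresholds derived from the criteria.
MIN_FREQ = round(RATE_PER_MILLION * NOMINAL_CORPUS_SIZE / 1_000_000)  # 114
MIN_RANGE = math.ceil(N_JOURNALS / 2)                                 # 4 of 7 journals


def select_lemmas(counts_by_journal, excluded=frozenset(),
                  min_freq=MIN_FREQ, min_range=MIN_RANGE):
    """Keep lemmas that pass specialized occurrence, range, and frequency.

    counts_by_journal maps a journal label to a {lemma: count} dict.
    excluded holds lemmas already in general lists (specialized occurrence).
    """
    total = Counter()          # corpus-wide frequency per lemma
    journal_range = Counter()  # number of journals in which the lemma occurs
    for counts in counts_by_journal.values():
        for lemma, n in counts.items():
            if n > 0:
                total[lemma] += n
                journal_range[lemma] += 1
    kept = [
        lemma for lemma in total
        if lemma not in excluded
        and total[lemma] >= min_freq
        and journal_range[lemma] >= min_range
    ]
    # Range is prioritized over frequency when ordering the list.
    return sorted(kept, key=lambda l: (-journal_range[l], -total[l]))


# Toy data (hypothetical counts, four journals only).
toy_counts = {
    "J1": {"weld": 60, "alloy": 40, "the": 500},
    "J2": {"weld": 30, "alloy": 50, "the": 400},
    "J3": {"weld": 20, "alloy": 30},
    "J4": {"weld": 10, "martensite": 200},
}
print(MIN_FREQ, MIN_RANGE)                            # 114 4
print(select_lemmas(toy_counts, excluded={"the"}))    # ['weld']
```

In the toy data, "alloy" reaches the frequency threshold but occurs in only three journals, and "martensite" is frequent in a single journal, so both are dropped; this is the kind of topic-related bias that the range criterion is meant to prevent.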

 

N-Gram Identification

The N-gram tool in AntConc 4.2.4 (Anthony, 2024) was employed to identify the most frequently occurring N-grams (trigrams and four-grams in this study) and to compile a list representative of the field. The criteria for recognizing tri- and four-grams throughout the entire corpus were range and frequency, using the same cut-off points as for the discipline-specific word list: a range of four and a frequency of 114. According to Lei and Liu (2016), determining the cut-off point for including N-grams and the extent of their significance is a subjective decision that strongly depends on the researcher's judgement. Accordingly, a preliminary list of the most frequent N-grams was first generated by applying the N-gram selection criteria; then, the third author, whose area of interest is mainly WM, explored the data and singled out the most frequent and important N-grams in the field. During this process, N-grams that either began or ended with a function word were excluded from the dataset (Sun & Lan, 2023).
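The selection logic for the preliminary N-gram list can be sketched in Python. This is an illustrative approximation rather than AntConc's implementation: `candidate_bundles` is a hypothetical helper, the `FUNCTION_WORDS` set is a tiny placeholder for whatever function-word list is actually applied, and the expert screening step is not modeled.

```python
from collections import Counter, defaultdict

# Placeholder stoplist (assumption); a real analysis would use a fuller list.
FUNCTION_WORDS = {"the", "of", "and", "in", "to", "a", "is", "on", "at", "for"}


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_bundles(tokens_by_journal, n, min_freq=114, min_range=4):
    """Map each retained N-gram to (frequency, range).

    An N-gram is kept if it meets both cut-offs and neither begins
    nor ends with a function word.
    """
    freq = Counter()
    journals = defaultdict(set)
    for journal, tokens in tokens_by_journal.items():
        for gram in ngrams(tokens, n):
            freq[gram] += 1
            journals[gram].add(journal)
    kept = {}
    for gram, f in freq.items():
        if gram[0] in FUNCTION_WORDS or gram[-1] in FUNCTION_WORDS:
            continue
        if f >= min_freq and len(journals[gram]) >= min_range:
            kept[" ".join(gram)] = (f, len(journals[gram]))
    return kept


# Toy example with lowered cut-offs so that something survives.
toy_tokens = {
    "J1": "heat affected zone of the weld".split(),
    "J2": "the heat affected zone".split(),
}
print(candidate_bundles(toy_tokens, n=3, min_freq=2, min_range=2))
# {'heat affected zone': (2, 2)}
```

Function words inside an N-gram do not trigger exclusion in this sketch, which is consistent with four-grams such as "properties of the joint" being retained.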

 

Validity and Reliability

Our first step was to ensure that the journals are as representative as possible. Therefore, not only did we seek other experts’ help for the selection of journals, but we also referred to SJR. As another step, we cleaned the texts and lemmatized the words to have a valid analysis of frequency, range, and specialized occurrence. Furthermore, we compared our word list with those developed by others for various engineering fields. We also evaluated the coverage of our list against (N)GSL, (N)AWL, and Non-(N)GSL-(N)AWL (coverage test) as fundamental criteria to validate this work (see results and discussion).

We achieved reliability through a comprehensive procedure of peer-checking as well as consulting two experts in metallurgical science (the third author and a full professor). After compiling the texts, the first and second authors meticulously checked the excerpts for any errors or mismatches. Subsequently, after the general corpus was established, these two authors randomly surveyed the words for relevance. After creating the specialized word list containing academic words, technical acronyms, and N-grams, all of the authors held several meetings to verify that the items were representative. This resulted in a near-perfect inter-rater agreement of 93% (κ = 0.90). In addition, although the third author is a WM expert, we consulted two professors specializing in metallurgy from different universities to enhance trustworthiness and discussed the accuracy and representativeness of the wording with them.

 

RESULTS

Academic Word List

To identify the most frequently occurring unique words of the field (Research Question 1), we adhered to the word selection criteria of Coxhead (2000), which encompassed range, frequency, and specialized occurrence. As Table 3 (see Appendix for the full list) shows, we found 608 lemmas that were prevalent in WM RAs. As the words indicate, a notable number of them are characteristic of the features of welding and clearly related to metallurgy and materials science, in general.

 

 

 

 

 

 

Table 3. The Most Prevalent Academic Words for Welding Metallurgy

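| Lemma | Range | Freq |
| --- | --- | --- |
| Welding | 7 | 27541 |
| Zone | 7 | 7635 |
| Tensile | 7 | 7143 |
| Interface | 7 | 7010 |
| Alloy | 7 | 6701 |
| Arc | 7 | 6265 |
| Laser | 7 | 5899 |
| Strain | 7 | 5892 |
| Microstructure | 7 | 5736 |
| FSW | 7 | 5572 |
| Friction | 7 | 4813 |
| Fracture | 7 | 4736 |
| Deformation | 7 | 4414 |
| Shear | 7 | 4203 |
| Thermal | 7 | 3985 |
| HAZ | 7 | 3962 |
| Residual | 7 | 3857 |
| Fatigue | 7 | 3434 |
| Plastic | 7 | 3331 |
| Specimen | 7 | 3315 |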

 

Coverage of Academic Word List

To determine the coverage of GSL and AWL across our corpus, we used AntWordProfiler 1.5.1 (Anthony, 2021) and imported the output into an Excel file. As illustrated in Table 4, GSL (West, 1953) comprises 2,682,109 tokens and 4,494 word types, accounting for 68.29% of the entire corpus. The results suggest that GSL plays a critical role in enhancing the reading and comprehension of WM RAs, as it encompasses a significant portion of the corpus. The first level of GSL (West, 1953) covers 61.48% of the corpus, whereas the second level covers only 6.81%. Furthermore, AWL (Coxhead, 2000) constitutes 9.16% of the entire corpus, surpassing the second level of GSL (West, 1953). This supports the idea that adhering to a rigid sequence in learning word lists, such as mastering GSL before AWL and subsequently other specialized word lists, is not essential for achieving a solid comprehension of academic texts (Valipouri & Nassaji, 2013).

 

Table 4. GSL and AWL Coverage across the Corpus

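| Word List | Tokens | Token % | Types | Type % |
| --- | --- | --- | --- | --- |
| 1st level of GSL | 2414793 | 61.48 | 2787 | 7.46 |
| 2nd level of GSL | 267316 | 6.81 | 1707 | 4.57 |
| AWL | 359947 | 9.16 | 2213 | 5.92 |
| Non-GSL-AWL | 885739 | 22.55 | 30663 | 82.05 |
| Total | 3927795 | 100 | 37370 | 100 |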

 

Coxhead’s AWL (2000) encompasses 2,213 word types, which account for 359,947 tokens and cover 9.16% of the entire corpus. Conversely, 30,663 word types, comprising 885,739 tokens and representing 22.55% of the entire corpus, are not included in either of these lists. Following a thorough refinement of this Non-GSL-AWL list, the WM AWL was constructed with 608 lemmas. Given that the Non-GSL-AWL items cover 22.55% of the WM RAs, compared with the AWL's 9.16%, it can be concluded that the Non-GSL-AWL, which developed into the WM AWL, is likely to aid the comprehension of academic information considerably when used alongside GSL and AWL.
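The coverage figures follow directly from the token counts in Table 4: the coverage of a list is its token count divided by the total number of tokens in the corpus. The short Python sketch below recomputes them; the dictionary and the `coverage` helper are illustrative names introduced here.

```python
# Token counts copied from Table 4.
TOKENS = {
    "GSL 1st level": 2_414_793,
    "GSL 2nd level": 267_316,
    "AWL": 359_947,
    "Non-GSL-AWL": 885_739,
}
TOTAL_TOKENS = sum(TOKENS.values())


def coverage(*lists):
    """Percentage of corpus tokens covered by the named lists combined."""
    return 100 * sum(TOKENS[name] for name in lists) / TOTAL_TOKENS


print(TOTAL_TOKENS)                                                   # 3927795
print(f"{coverage('GSL 1st level', 'GSL 2nd level'):.2f}")            # 68.29
print(f"{coverage('AWL'):.2f}")                                       # 9.16
print(f"{coverage('GSL 1st level', 'GSL 2nd level', 'AWL'):.2f}")     # 77.45
print(f"{coverage('Non-GSL-AWL'):.2f}")                               # 22.55
```

The recomputed values agree with the percentages reported in Table 4 and in the text.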

Furthermore, the Non-GSL-AWL in our corpus offered greater coverage than Coxhead's AWL. This finding underscores the importance of creating field-specific word lists and supports the view that lexical items behave differently in terms of frequency, collocation, and meaning across domains (Hyland & Tse, 2007). Laufer and Nation (1999) contend that in order to effectively understand academic RAs, second language (L2) readers should strive to be acquainted with approximately 95% (approximately 3,000 words) of the vocabulary contained within these texts. GSL and AWL together accounted for only 77.45% of the corpus, well short of this threshold, suggesting that members of the WM academic community who rely on these two lists alone may have difficulty understanding and interacting with these texts. Hence, the WM AWL plays a pivotal role in achieving fluency and comprehension of technical information.

However, one should not neglect the shortcomings of GSL and AWL noted in previous studies (Brezina & Gablasova, 2015; Gardner & Davies, 2014). We therefore ran our specialized corpus against two newer lists, one general and one general academic. As shown in Table 5, NGSL (Browne, 2014; Browne et al., 2013), a more recently developed general word list than GSL, includes 2,270 word types, represents 2,241,385 tokens, and accounts for 57.06% of the entire corpus. Furthermore, NAWL (Browne, 2014; Browne et al., 2013), which was developed as a newer counterpart of AWL, accounts for 3.97% of the entire corpus, comprising 155,961 tokens and 702 word types. The remaining 38.96% of the tokens appeared in neither NGSL nor NAWL. Consequently, our results suggest that although GSL and AWL are older than NGSL and NAWL, they provide more coverage across our specialized corpus, thereby better serving the linguistic needs of ESP students of WM.

 

Table 5. NGSL and NAWL Coverage across the Corpus

| Word List | Tokens | Token % | Types | Type % |
| --- | --- | --- | --- | --- |
| NGSL | 2241385 | 57.07 | 2270 | 6.07 |
| NAWL | 155961 | 3.97 | 702 | 1.88 |
| Non-NGSL-NAWL | 1530449 | 38.96 | 34398 | 92.05 |
| Total | 3927795 | 100 | 37370 | 100 |

 

Technical Acronyms and N-grams

After developing the WM AWL, the acronyms that met the technical nomenclature of the field were grouped into an independent list; in total, 68 technical acronyms were identified. A selection of these acronyms is presented in Table 6 (for the full list, see the Appendix).

 

 

Table 6. Technical Acronyms of the Field

| Technical Acronym | Full Form of the Acronym |
| --- | --- |
| AF | Acicular Ferrite |
| FEM | Finite Element Method |
| EDX | Energy Dispersive X-Ray Spectroscopy |
| SIG | Stud Inert Gas |
| EBW | Electron Beam Welding |
| DT | Digital Twin |
| MAG | Metal Active Gas |
| IPF | Inverse Pole Figure |
| PC | Horizontal Welding Position |
| CT | Computed Tomography |
| RT | Radiographic Testing |
| HF | High Frequency |
| CF | Corrosion Fatigue |
| HCP | Hexagonal Close Packed |
| FSP | Friction Stir Processing |
| FSSW | Friction Stir Spot Welding |
| LBW | Laser Beam Welding |
| OB | Oxygen Blowing |
| CMT | Cold Metal Transfer |

 

In addition, to examine the formulaic language of WM, we analyzed the most frequent N-grams that met the word selection criteria and the technical nature of this field. This process led to the identification of 61 N-grams (see Table 7).

 

Table 7. Technical N-grams of the Field

| Type (Trigrams) | Freq | Range | Type (Four-grams) | Freq | Range |
| --- | --- | --- | --- | --- | --- |
| across the weld | 241 | 7 | during the welding process | 288 | 7 |
| Heat affected zone | 300 | 7 | friction stir welding (FSW) | 293 | 7 |
| along the weld | 197 | 7 | gas metal arc welding | 140 | 7 |
| angle grain boundaries | 144 | 7 | gas tungsten arc welding | 158 | 7 |
| austenitic stainless steel | 158 | 7 | heat affected zone (HAZ) | 280 | 7 |
| average grain size | 309 | 7 | microstructure and mechanical properties | 249 | 7 |
| bead on plate | 205 | 7 | post weld heat treatment | 146 | 7 |
| during the process | 206 | 7 | properties of the joint | 118 | 7 |
| during the welding | 475 | 7 | properties of the weld | 115 | 7 |
| electron beam welding | 201 | 7 | strength of the joint | 178 | 7 |
| Scanning electron microscopy | 114 | 7 | friction stir spot welding | 116 | 6 |
| friction stir welding | 751 | 7 | | | |
| gas metal arc | 201 | 7 | | | |
| gas tungsten arc | 236 | 7 | | | |
| heating and cooling | 114 | 7 | | | |
| high heat input | 142 | 7 | | | |
| high strength steel | 165 | 7 | | | |
| lap shear strength | 141 | 7 | | | |
| laser beam welding | 154 | 7 | | | |
| low carbon steel | 143 | 7 | | | |
| low heat input | 175 | 7 | | | |
| metal arc welding | 202 | 7 | | | |
| microstructure and mechanical | 304 | 7 | | | |
| near the interface | 117 | 7 | | | |
| near the weld | 160 | 7 | | | |
| post weld heat | 194 | 7 | | | |
| resistance spot welding | 288 | 7 | | | |
| scanning electron microscope | 185 | 7 | | | |
| scanning electron microscopy | 188 | 7 | | | |
| severe plastic deformation | 222 | 7 | | | |
| solid state joining | 138 | 7 | | | |
| solid state welding | 126 | 7 | | | |
| Friction stir welding | 305 | 7 | | | |
| stir zone (SZ) | 126 | 7 | | | |
| strength and ductility | 118 | 7 | | | |
| stress strain curves | 147 | 7 | | | |
| tool rotation speed | 129 | 7 | | | |
| top and bottom | 143 | 7 | | | |
| tungsten arc welding | 170 | 7 | | | |
| ultimate tensile strength | 252 | 7 | | | |
| upper and lower | 220 | 7 | | | |
| weld cross section | 124 | 7 | | | |
| weld heat treatment | 155 | 7 | | | |
| x-ray diffraction | 146 | 7 | | | |
| affected zone (az) | 131 | 6 | | | |
| friction stir processing | 144 | 6 | | | |
| friction stir spot | 147 | 6 | | | |
| friction stir welded | 291 | 6 | | | |
| friction stir welds | 147 | 6 | | | |
| high speed camera | 129 | 6 | | | |

 

 

 

 

From a broader view, both the technical and the N-gram lists move beyond the single words of the academic word list, which makes the results more context- and field-specific. They can also serve as dynamic supplements that help experts and students widen their scope and move past the rigidity of single-word units.

 

DISCUSSION

RQ1: What are the Most Frequent Academic Words of Welding Metallurgy?

This study offers an academic word list of WM terms drawn from prestigious journals in the field, which makes the results more reliable and representative. Our findings corroborate the work of scholars who have created field-specific word lists for engineering (Coxhead & Demecheleer, 2018; Durovic et al., 2021; Gilmore & Millar, 2018; Korzin et al., 2023; Thiruchelvam et al., 2018) and support the emphasis on developing discipline-specific word lists for engineering rather than relying on generic academic word lists (Hsu, 2014; Mudraya, 2006; Todd, 2017; Ward, 2009). Our study most closely resembles that of Gilmore and Millar (2018) in terms of genre, since both studies focus on RAs. Gilmore and Millar (2018) analyzed the lexical profile of the most frequently used academic terms in Civil Engineering. More specifically, they created a specialized corpus for this discipline encompassing 11 sub-disciplines and comprising 8 million running words. They then compared the corpus with external corpora, including COCA, NGSL (Browne, 2014; Browne et al., 2013), and NAWL (Browne, 2014; Browne et al., 2013), to ensure the distinctiveness of the terminology related to civil engineering. Applying two criteria, dispersion and keyness, they found 2,967 keywords in the field.

 

RQ2: To What Extent Do GSL and AWL Cover the Entire Corpus of Welding Metallurgy RAs?

A comparison with other established specialized corpora in engineering shows that the coverage of GSL, AWL, and Non-GSL-AWL words varies from one corpus to another. Therefore, researchers are encouraged to create word lists specific to their respective disciplines and genres, as evidenced by the differing coverage percentages across corpora of the same genre. In the current study, across the entire corpus, the proportions of GSL, AWL, and Non-GSL-AWL words were 68.29%, 9.16%, and 22.55%, respectively. This distribution underscores the field's noticeable reliance on general vocabulary, which may reflect the interdisciplinary nature of metallurgy, where fundamental concepts and terminology are frequently employed.
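To make the notion of coverage concrete, the following minimal sketch shows how the share of running words falling into each band could be computed. It is an illustration only, not the procedure used in this study: the function names, the toy sentence, and the tiny stand-in word sets are hypothetical, and the sketch matches raw word forms, whereas profiling tools typically match lemmas or word families.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (hyphenated forms stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def coverage(tokens, gsl, awl):
    """Return the percentage of running words in each band.

    A token is checked against the GSL first, then the AWL; anything
    left over is counted as Non-GSL-AWL.
    """
    bands = ("GSL", "AWL", "Non-GSL-AWL")
    counts = Counter()
    for tok in tokens:
        if tok in gsl:
            counts["GSL"] += 1
        elif tok in awl:
            counts["AWL"] += 1
        else:
            counts["Non-GSL-AWL"] += 1
    total = sum(counts.values())
    if total == 0:
        return {band: 0.0 for band in bands}
    return {band: round(100 * counts[band] / total, 2) for band in bands}


# Toy stand-ins for the real lists (assumptions, not the actual GSL/AWL).
toy_gsl = {"the", "was", "and", "of"}
toy_awl = {"analyzed", "stable"}
sample = "The weld was analyzed and the microstructure of the weld was stable"

print(coverage(tokenize(sample), toy_gsl, toy_awl))
```

With real lists loaded in place of the toy sets, the three percentages sum to roughly 100 and can be compared directly with the figures reported for other corpora.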

Moreover, in WM, the proportion of Non-GSL-AWL words points to the distinct academic and field-specific jargon necessary for accurate communication, while the notable presence of AWL terms highlights the field's academic rigor and specialized knowledge. Welding metallurgy, in particular, requires in-depth knowledge of thermal processes, phase transformations, and material properties; hence, a specialized vocabulary is needed to convey these intricate concepts accurately. As our findings highlight, effectively assisting students and practitioners in mastering the complex aspects of WM requires a balance between general academic language and field-specific terms. Table 9 presents a cross-comparison of word list coverage documented in studies of the hard sciences, primarily general and specific fields of engineering. This comparison illustrates the distinct lexical behavior of the fields. Our findings corroborate the literature, indicating that an academic word list is not equally effective across fields, or even across sub-disciplines within a single field such as engineering.

 

Table 9. Word Lists Coverage across Different Corpora

| Word list | Marine Engineering (Durovic et al.) | Chemistry (Valipouri & Nassaji) | Science and Engineering (Veenstra & Sato) | Basic Engineering (Ward) | Current Study |
|---|---|---|---|---|---|
| GSL | 71.39% | 65.46% | 72.3% | 72.3% | 68.29% |
| AWL | 8.07% | 9.96% | 14.3% | 11.3% | 9.16% |
| Non-GSL-AWL | 20.54% | 24.57% | 13.4% | 16.4% | 22.55% |

 

RQ3: Which of the General Word Lists and General Academic Word Lists (GSL, AWL, NGSL, and NAWL) are More Beneficial for Welding Metallurgy Students?

Generally, the purpose of mastering general and academic word lists is to improve one's proficiency in reading and understanding field-specific texts. The general word lists (i.e., GSL and NGSL) aim to provide students with the most frequently used general vocabulary from various corpora, whereas the general academic word lists (i.e., AWL and NAWL) are designed to highlight the most common academic terminology found in scientific literature. Notably, the GSL is frequently criticized as obsolete. Our findings indicate that, although the GSL was established in 1953, it still surpasses more recently developed word lists in corpus coverage (see Figure 1). Thus, it remains beneficial and efficient for helping scientific communities read and comprehend academic and technical texts. The same holds for the AWL (Coxhead, 2000), which provided more coverage across the corpus than the more recent NAWL (Browne, 2014; Browne et al., 2013).

 

 

Figure 1. Extent of Coverage of Word Lists across the Corpus

 

RQ4: What are the Top Trigrams and Four-grams of Welding Metallurgy?

In addition to reporting the most prevalent academic and technical words of WM, we identified the most prevalent three- and four-word lexical bundles across the entire corpus. Trigrams and four-grams matter because, as common features of scientific writing, they play a pivotal role in creating texts that adhere to the rhetorical conventions of a particular research field (Salazar, 2014). In a similar vein, Khamphairoh and Tangpijaikul (2012) conducted a corpus-based study of technical vocabulary in the insurance sector and identified the top twenty N-grams in the field. Furthermore, Gilmore and Millar (2018) identified the most frequently occurring word sequences, ranging from three to six words, and generated a list of prevalent lexical bundles encompassing 366 to 1,138 phrases. Our study is primarily comparable to that of Gilmore and Millar (2018) because we included all of the frequent word sequences in the specialized corpus. However, it differs from that of Khamphairoh and Tangpijaikul (2012), who reported only the top 20 N-grams.
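The idea of selecting word sequences by frequency and range can be sketched as follows. This is a hedged illustration rather than the exact procedure of the present study: range is assumed here to mean the number of sub-corpora (for example, journals) in which a sequence occurs, and the function names, sample texts, and thresholds are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (hyphenated forms stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def ngrams(tokens, n):
    """Yield every contiguous n-token sequence as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def ngram_stats(subcorpora, n):
    """Map each n-gram to (frequency, range).

    Frequency is the total count over all sub-corpora; range is the
    number of sub-corpora containing the n-gram at least once.
    Sentence boundaries are ignored in this simplified version.
    """
    freq = Counter()
    seen_in = defaultdict(set)
    for name, text in subcorpora.items():
        for gram in ngrams(tokenize(text), n):
            phrase = " ".join(gram)
            freq[phrase] += 1
            seen_in[phrase].add(name)
    return {phrase: (freq[phrase], len(seen_in[phrase])) for phrase in freq}


def select(stats, min_freq, min_range):
    """Keep n-grams meeting both thresholds, most frequent first."""
    kept = [(p, f, r) for p, (f, r) in stats.items()
            if f >= min_freq and r >= min_range]
    return sorted(kept, key=lambda row: (-row[1], row[0]))


# Hypothetical two-journal toy corpus.
toy = {
    "journal_a": "Friction stir welding was used. The heat affected zone was narrow.",
    "journal_b": "The heat affected zone widened during friction stir welding.",
}
for phrase, f, r in select(ngram_stats(toy, 3), min_freq=2, min_range=2):
    print(phrase, f, r)
```

A purely frequency-based pass like this also returns sequences that are not technical terms, so a manual screening step for technicality would still be needed afterwards.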

 

CONCLUSION AND IMPLICATIONS

We conducted this study to identify key academic terms in WM and to assess the level of support that GSL and AWL offer to the WM community, with the goals of addressing the lack of a specialized corpus for this field and developing a trustworthy word list. To that end, an academic word list for WM with 608 lemmas, a technical word list with 68 terms, and a multiword expression list with 61 N-grams were created by applying word selection criteria (Coxhead, 2000) and excluding irrelevant terms, acronyms, and proper nouns. This study therefore provides uniquely representative, large-scale word lists drawn from leading journals, indicating the specific registers that were sampled and the nomenclature that emerged from the dataset.

The findings of this study have pedagogical implications for ESP/EAP end-users. Given the importance of research-informed instruction (Joseph-Richard et al., 2021), one of this study's major contributions is to help instructors and students of WM improve their discipline-specific vocabulary so that they can interact more effectively with others in the discourse community. This approach grounds language instruction for specific and academic purposes in discipline-based evidence rather than relying solely on the instructor's intuition about the most effective techniques and strategies, thereby enhancing the accuracy and efficacy of language teaching. As such, evidence-based methodologies allow educators to focus on students' linguistic needs, improving the overall learning experience and the understanding of specialized language in designated contexts.

More precisely, this kind of research-driven instruction could help narrow the persistent gap between research and teaching that has recently been brought to light in the literature. In light of the recent rise of data-driven learning (DDL), teachers may use these word lists and the specialized corpus to adopt a DDL-based teaching strategy. As a learner-centered approach, DDL makes it easier to uncover linguistic meanings and patterns. It improves language learners' comprehension and deepens their knowledge, enabling them to develop a nuanced understanding of how language works across diverse contexts. It also encourages them to examine large samples of real-world language use (Pérez-Paredes et al., 2019).

Like other studies, this study is not without limitations. Our investigation had a conceptual limitation (see Saedi & Amini Farsani, 2025), and its scope was narrow. More specifically, the generalizability of this list to other subfields of materials science (e.g., Biomaterials, Materials Chemistry, Metals and Alloys) might be limited, as we primarily targeted WM as one of the most renowned strands of materials science. Accordingly, future research could explore all strands of materials science to determine the proportion of field-specific words, collocations, and acronyms, or develop customized word lists that do not yet exist for newer engineering (sub)fields (e.g., Artificial Intelligence Engineering).

 

 

Disclosure statement

No potential conflict of interest was reported by the authors.

 

 

ORCID

Mohammad Hossein Afsharipoor

http://orcid.org/0009-0000-6989-6712

Taha Saedi Roudi

http://orcid.org/0009-0004-7557-5965

Seyed Ali Mousavi Mohammadi

http://orcid.org/0009-0006-1631-5682

Mohammad Amini Farsani

http://orcid.org/0000-0002-0249-1996

 

References

Akhatovna Fakhrutdinova, R., Aleksandrovna Kokurina, A., Almazovna Gayazova, A., Rifkatovich Zamaletdinov, R., & Fang, H. (2023). The method of analyzing the semantic structure of lexical units of the English language in order to increase the efficiency of the educational process. Journal of Research in Applied Linguistics, 14(3), 457–460.
Amini Farsani, M., Afsharipoor, M. H., & Saedi Roudi, T. (2025). Applied linguistics and urban planning nexus: Developing an academic word list of urban planning using corpus linguistics approach. Journal of Foreign Language Research, 14(4), 697–715. https://doi.org/10.22059/jflr.2025.387680.1177
Aluthman, E. S. (2017). Compiling an OPEC word list: A corpus-informed lexical analysis. International Journal of Applied Linguistics & English Literature, 6(2), 78–91. https://doi.org/10.7575/aiac.ijalel.v.6n.2p.78
Anderson, J., Leaver, K. D., Rawlings, R. D., & Leevers, P. S. (2004). Materials science for engineers. CRC.
Anthony, L. (2021). AntWordProfiler (Version 1.5.1w) [Computer Software]. Tokyo, Japan: Waseda University.
Anthony, L. (2024). AntConc (Version 4.2.4) [Computer Software]. Tokyo, Japan: Waseda University. Available from https://www.laurenceanthony.net/software
Ashrafzadeh, A. (2021). Developing a corpus-based academic word list and collocation list in medicine. Journal of Language and Communication (JLC), 8(2), 239–300.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at . . .: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405. https://doi.org/10.1093/applin/25.3.371
Brezina, V., & Gablasova, B. (2015). Is there a core general vocabulary? Introducing the new general service list. Applied Linguistics, 36(1), 1–22.
Brown, D., Stoeckel, T., Mclean, S., & Stewart, J. (2020). The most appropriate lexical unit for L2 vocabulary research and pedagogy: A brief review of the evidence. Applied Linguistics, 43(3), 596–602. https://doi.org/10.1093/applin/amaa061
Browne, C. (2014). A new general service list: The better mousetrap we’ve been looking for? Vocabulary Learning and Instruction, 3(2), 1–10.
Browne, C., Culligan, B., & Phillips, J. (2013). The new general service list: A core vocabulary for EFL students and teachers. JALTs The Language Teacher, 34(7), 13–15.
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990. https://doi.org/10.3758/brm.41.4.977
Chen, H., & Lei, G. (2019). Developing a technical word list for research articles in computer science discipline. English Language Teaching, 12(10), 131–141. https://doi.org/10.5539/elt.v12n10p131
Chen, Q., & Ge, G. (2007). A corpus-based lexical study on frequency and distribution of Coxhead’s AWL word families in medical research articles. English for Specific Purposes, 26(4), 502–514. https://doi.org/10.1016/j.esp.2007.04.003
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213–238. https://doi.org/10.2307/3587951
Coxhead, A., & Demecheleer, M. (2018). Investigating the technical vocabulary of Plumbing. English for Specific Purposes, 51, 84–97. https://doi.org/10.1016/j.esp.2018.03.006
Coxhead, A., & Nation, P. (2001). The specialised vocabulary of English for academic purposes. Research Perspectives on English for Academic Purposes, 252–267.
Dang, T. N. Y. (2019). Corpus-based word lists in second language vocabulary research, learning, and teaching. Routledge.
Durovic, Z., Vuković-Stamatović, M., & Vukičević, M. (2021). How much and what kind of vocabulary do marine engineers need for adequate comprehension of ship instruction books and manuals? Círculo De Lingüística Aplicada a La Comunicación, 88, 123–134. https://doi.org/10.5209/clac.78300
Durrant, P. (2014). Discipline and level specificity in university students’ written vocabulary. Applied Linguistics, 35(3), 328–356. https://doi.org/10.1093/applin/amt016
Fraser, S. (2007). Providing ESP learners with the vocabulary they need: Corpora and the creation of specialized word lists. Hiroshima Studies in Language and Language Education, 10(1), 127–143.
Gardner, D., & Davies, M. (2014). A new academic vocabulary list. Applied Linguistics, 35(3), 305–327.
Gilmore, A., & Millar, N. (2018). The language of civil engineering research articles: A corpus-based approach. English for Specific Purposes, 51, 1–17. https://doi.org/10.1016/j.esp.2018.02.002
Heidari, F., Jalilifar, A., & Salimi, A. (2020). Developing a corpus-based word list in pharmacy research articles: A focus on academic culture. International Journal of Society, Culture & Language, 8(1), 1–15.
Hsu, W. (2013). Bridging the vocabulary gap for EFL medical undergraduates: The establishment of a medical word list. Language Teaching Research, 17(4), 454–484. 
Hsu, W. (2014). Measuring the vocabulary load of engineering textbooks for EFL undergraduates. English for Specific Purposes, 33, 54–65. https://doi.org/10.1016/j.esp.2013.07.001
Hutagalung, S. (2012). Materials science and technology. BoD.
Hyland, K., & Tse, P. (2007). Is there an “academic vocabulary”? TESOL Quarterly, 41(2), 235–253.
It-Ngam, T., & Phoocharoensil, S. (2019). The development of science academic word list. Indonesian Journal of Applied Linguistics, 8(3), 657–667. https://doi.org/10.17509/ijal.v8i3.15269
Jamalzadeh, M., & Chalak, A. (2019). A corpus-based study of academic vocabulary in Physiotherapy research articles. Language Teaching Research Quarterly, 9(9), 69–82.
Joseph‐Richard, P., Almpanis, T., Wu, Q., & Jamil, M. G. (2021). Does research‐informed teaching transform academic practice? Revealing a RIT mindset through impact analysis. British Educational Research Journal, 47(1), 226–245.
Khamphairoh, T., & Tangpijaikul, M. (2012). Collocations of keywords found in insurance research articles: A corpus-based analysis. Journal of Studies in the Field of Humanities, 19(2), 166–188.
Khani, R., & Tazik, K. (2013). Towards the development of an academic word list for applied linguistics research articles. RELC Journal, 44(2), 209–232.
Khany, R., & Kalantari, B. (2021). Accounting academic word list (AAWL): A corpus-based study. Journal of Foreign Language Teaching and Translation Studies, 6(1). https://doi.org/10.22034/efl.2021.268643.1070
Korzin, A. S., Zhandarova, A. S., & Volkova, Y. A. (2023). Corpus-based approach to developing teaching materials for aerospace English. GEMA Online Journal of Language Studies, 23(3), 127–158. https://doi.org/10.17576/gema-2023-2303-08
Kruawong, T., & Phoocharoensil, S. (2021). Developing an English zoology academic word list: A corpus-based study. Thoughts, 2, 63–78. https://doi.org/10.58837/chula.thts.2020.2.3
Laufer, B., & Nation, P. (1999). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16(3), 307–322.
Lei, L., & Liu, D. (2016). A new medical academic word list: A corpus-based study with enhanced methodology. Journal of English for Academic Purposes, 22, 42–53. https://doi.org/10.1016/j.jeap.2016.01.008
Lippold, C. (2014). Welding metallurgy and weldability. John Wiley & Sons.
Liu, J., & Han, L. (2015). A corpus-based environmental academic word list building and its validity test. English for Specific Purposes, 39, 1–11.
Martinez, I. A., Beck, S., & Panza, C. B. (2009). Academic vocabulary in agricultural research articles: A corpus-based study. English for Specific Purposes, 28(3), 183–198. 
Mudraya, O. (2006). Engineering English: A lexical frequency instructional model. English for Specific Purposes, 25(2), 235–256.
Nation, P. (2001). Learning vocabulary in another language. Cambridge University Press.
Özer, M., & Akbas, E. (2024). Assembling a justified list of academic words in veterinary medicine: The veterinary medicine academic word list (VMAWL). English for Specific Purposes, 74, 29–43.
Pérez-Paredes, P., Guillamón, C. O., Van De Vyver, J., Meurice, A., Jiménez, P. A., Conole, G., & Hernández, P. S. (2019). Mobile data-driven language learning: Affordances and learners’ perception. System, 84, 145–159. https://doi.org/10.1016/j.system.2019.06.009
Pineau, A., & Quere, Y. (2011). The metallurgy, science and engineering. EDP Sciences.
Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting practices in quantitative L2 research. Studies in Second Language Acquisition, 35, 655–687.
Rashidi, N., & Mazdayasna, G. (2016). Impact of genre-based instruction on development of students’ letter writing skills: The case of students of textile engineering. Journal of Research in Applied Linguistics, 7(2), 55–72. https://doi.org/10.22055/rals.2016.12094
Roesler, D. (2021). When a bug is not a bug: An introduction to the computer science academic vocabulary list. Journal of English for Academic Purposes, 54. https://doi.org/10.1016/j.jeap.2021.101044
Saedi, T., & Amini Farsani, M. (2025). Exploring limitations in mixed methods research (MMR): Domains, functions, and specificity. Journal of Mixed Methods Research. https://doi.org/10.1177/15586898251396757
Saeedi, M., Khany, R., & Tazik, K. (2023). Research themes and sub-themes in academic wordlist studies between 2000 and 2020: A systematic review. Journal of Research in Applied Linguistics, 14(1), 95–111. https://doi.org/10.22055/rals.2023.18070
Safari, M. (2019). English vocabulary for equine veterans: How different from GSL and AWL words. Iranian Journal of English for Academic Purposes, 8(2), 51–65.
Salazar, D. (2014). Lexical bundles in native and non-native scientific writing. Studies in corpus linguistics. John Benjamins Publishing Company. https://doi.org/10.1075/scl.65
Sun, Y., & Lan, G. (2023). A bibliometric analysis on L2 writing in the first 20 years of the 21st century: Research impacts and research trends. Journal of Second Language Writing, 59, 100963. https://doi.org/10.1016/j.jslw.2023.100963
Swales, J. M. (1990). Genre analysis: English in academic and research settings. Cambridge University Press.
Thiruchelvam, S., Jin, N. Y., Tong, C. S., Ghazali, A., & Husin, N. B. M. (2018). The language of civil engineering: Corpus-based studies on vocational school textbooks in Malaysia. International Journal of Engineering & Technology, 7, 844–847.
Todd, R. W. (2017). An opaque engineering word list: Which words should a teacher focus on? English for Specific Purposes, 45, 31–39. https://doi.org/10.1016/j.esp.2016.08.003
Valipouri, L., & Nassaji, H. (2013). A corpus-based study of academic vocabulary in chemistry research articles. Journal of English for Academic Purposes, 12(4), 248–263.
Veenstra, J., & Sato, Y. (2018). Creating an institution-specific science and engineering academic word list for university students. Journal of Asia TEFL, 15(1), 148–161.
Vuković-Stamatović, M. (2024). Creating and validating a corpus-based English academic word list for physics. Spanish Journal of Applied Linguistics, 38(1). https://doi.org/10.1075/resla.22041.vuk
Wang, J., Liang, S. L., & Ge, G. C. (2008). Establishment of a medical academic word list. English for Specific Purposes, 27(4), 442–458.
West, M. (1953). A general service list of English words, with semantic frequencies and a supplementary word-list for the writing of popular science and technology. Longman.
Xodabande, I., Atai, M. R., Hashemi, M. R., & Thompson, P. (2023). Developing and validating a mid-frequency word list for chemistry: A corpus-based approach using big data. Asian-Pacific Journal of Second and Foreign Language Education, 8(1). https://doi.org/10.1186/s40862-023-00205-5
Yang, M. (2014). A nursing academic word list. English for Specific Purposes, 37, 27–38. https://doi.org/10.1016/j.esp.2014.05.003