Document Type : Research Paper
Authors
Iran University of Science and Technology
Abstract
Employing reliable, evidence-based academic word lists has been a noticeable concern for numerous educators, students, and researchers in English for Academic/Specific Purposes (EAP/ESP) courses. Currently, many of these courses still lack research-oriented materials and rely heavily on traditional ways of teaching field-specific terminology. This study aimed to create a specialized corpus to identify the most frequent academic words in Welding Metallurgy (WM) and to analyze the most prevalent three- and four-word lexical bundles (N-grams). Employing a corpus-based approach, we identified top-tier WM journals and then analyzed articles published from 2017 to 2023 that followed the Introduction, Methodology, Results, and Discussion (IMRD) format. In total, 875 empirical research articles were compiled and analyzed to establish a specialized corpus of approximately four million words. After applying word selection criteria, 608 lemmas were identified. Furthermore, we recognized 68 technical acronyms in the field and grouped them into an independent list. We also examined the most prevalent N-grams to explore the field's formulaic language, which yielded 61 prevalent technical N-grams. In terms of pedagogical implications, this study can deepen ESP course instructors' knowledge and urge them to be more mindful of evidence-based materials. It also encourages students to give more weight to their fundamental discipline-specific needs by incorporating authentic word lists in practice.
Keywords
- Corpus analysis
- English for Specific Purposes
- Welding Metallurgy Academic Word List
- N-grams
- Welding Metallurgy Acronyms
INTRODUCTION
Globalization represents the integration of diverse cultures, languages, organizations, and countries from across the globe. This phenomenon has facilitated worldwide connectivity for both personal and business purposes and has profoundly influenced the English language. Furthermore, the growing cooperation of English-speaking countries with other nations in education, economy, culture, and other areas has led to a steady growth of interest in English (Akhatovna Fakhrutdinova et al., 2023). English also serves a global function by providing access to a wide range of academic resources, including scientific research and educational materials, tailored to meet the linguistic needs of students who are enrolled in English for Specific Purposes (ESP) or English for Academic Purposes (EAP) courses. As a result, it allows students, researchers, and instructors to engage with cutting-edge research, evidence-based findings, research-informed materials, and textbooks beyond their native language. Within the EAP and ESP domains of applied linguistics (AL), English in the new era functions not merely as an educational tool but as a prerequisite for entering the academic community.
Additionally, most engineering students at the tertiary level must complete required ESP courses (Rashidi & Mazdayasna, 2016). These courses are taught at universities where subjective textbooks serve as the primary instructional material for students. It is also widely acknowledged that academic vocabulary plays a vital role in the literacy skills (i.e., reading and writing) of native and non-native language learners (Saeedi et al., 2022). A frequently mentioned prerequisite for entering academia is the development of academic word lists that represent the language and terminology of both general and specific disciplines. This requirement has been empirically demonstrated in studies aimed at creating general word lists, such as the General Service List (GSL; West, 1953), which encompasses vocabulary items frequently used in daily conversations, reading materials, and writing assignments.
Aside from that, Coxhead (2000) developed an Academic Word List (AWL) derived from research articles (RAs) across four scientific areas: commerce, law, arts, and science. The objective of these academic word lists was to assist users in meeting their linguistic needs in both daily life and academic contexts. In this regard, Coxhead (2000) recognized that AWL does not adequately meet the needs of students majoring in various scientific disciplines, as its coverage is not evenly distributed across these fields. For instance, Hyland and Tse (2007) stated that various lexical items exhibited distinct lexical behaviors across multiple scientific domains with respect to meaning, range, and collocation. This emphasizes the importance of creating field-specific word lists that are customized for scientific disciplines.
Academic word lists, which cater to the linguistic requirements of non-native English speakers, offer evidence-based academic terms across a diverse array of disciplines. Studies in this line of research include investigations in Accounting (Khany & Kalantari, 2021), Chemistry (Valipouri & Nassaji, 2013), Physics (Vukovic-Stamatovic, 2024), and Veterinary Science (Özer & Akbas, 2024). Although several studies have investigated the linguistic characteristics of various scientific domains, none have specifically addressed welding metallurgy (WM) and its linguistic features within RAs. An examination of the mean (x̄ = 2,098,650) and median (x̃ = 1,015,000) corpus size in previous studies of engineering shows that several of these studies did not adequately represent the field because their corpora were small (Durovic et al., 2021; Korzen et al., 2023; Thiruchelvam et al., 2018). According to Brysbaert and New (2009), a corpus of one million running words is considered sufficient for a reliable list of highly frequent words. Additionally, certain studies have gathered and incorporated localized materials to establish specialized corpora aimed at addressing the EAP/ESP language needs of specific communities (Mudraya, 2006).
Nearly all university engineering curricula have incorporated materials science as a critical component in recent years. Although physicists and chemists study materials from a scientific perspective, the rise of materials science is noteworthy because it brings together many concepts from physics and chemistry into a single, comprehensive field (Anderson et al., 2004). Moreover, materials are important to human beings due to the advantages gained from manipulating their properties (Hutagalung, 2012). The importance of (welding) metallurgy both as a science and an engineering discipline (Pineau & Quere, 2011), the characteristics it shares with other fields such as chemistry (Lippold, 2014), and the absence of evidence-based ESP/EAP textbooks in engineering disciplines have compelled us to develop authentic word lists for this field.
LITERATURE REVIEW
Nation (2001) distinguished four categories of vocabulary that are common in English academic writing: academic, low-frequency, technical, and high-frequency words. High-frequency words are frequently used in reading materials, writing assignments, and casual conversations. GSL (West, 1953) is the most widely known list of high-frequency words, and many other word lists have been created to help students learn general vocabulary across a broad range of interests. Low-frequency words, in contrast, are distinguished by their limited distribution and infrequent usage. Some of these words might be used rarely, possibly just once or twice, yet they form the most numerous group of words in any given field. About 5% of the vocabulary in academic texts is composed of low-frequency words, which include proper names, words that are rarely used in everyday speech, non-high-frequency words, and technical terms from other areas of science (Nation, 2001). As Nation (2001) pointed out, one person's technical vocabulary is another person's low-frequency words, which shows how varied low-frequency words can be.
Academic vocabulary is generally absent from basic general English texts but comprises a significant portion of the lexical units that are used in academic discourse. Students frequently encounter challenges in mastering these terms due to their relative unfamiliarity compared to the specialized vocabulary pertinent to their disciplines. To address this issue, AWL plays a vital role in helping students understand specialized vocabulary in different fields (Coxhead, 2000). This list consists of 570 word families that are not included in the 2,000 most commonly used English words of GSL. Thus, it functions as an essential educational resource for students with academic goals (Coxhead & Nation, 2001).
To gain a more nuanced understanding of these word lists, Liu and Han (2015) identified two categories: (1) general and (2) field-specific. The former represents vocabulary items that are associated with diverse fields, accessible and applicable by most ESP students as foundational knowledge for their academic pursuits, exemplified by Coxhead’s AWL (2000). Conversely, the latter subsumes terminologies that frequently occur across various subject domains within a specific discipline (Khani & Tazik, 2013; Martinez et al., 2009). Although they may be prevalent in a specific subject area, they are encountered less often in other contexts. More specifically, technical terminologies include a variety of categories, some of which are exclusive to particular fields of study (Nation, 2001).
Given the importance of discipline-specific and technical word lists and the inadequacy of AWL to address the linguistic requirements of EAP/ESP students across various disciplines, numerous studies have been conducted to identify the academic vocabulary items associated with different hard and soft sciences. After critically reviewing the literature, we found 28 studies that had investigated the lexical characteristics of fields within the hard sciences. In particular, nine studies have analyzed vocabulary patterns within engineering disciplines.
Valipouri and Nassaji (2013) developed a field-specific word list designed for English as a Foreign Language (EFL) students in chemistry. The researchers randomly selected ten journals from each of the four areas of chemistry and included a total of 1,185 RAs that were published between 2003 and 2009 and conformed to the IMRD format (Swales, 1990). In doing so, they established a corpus of four million words in Chemistry. Following Coxhead's (2000) criteria, the study identified 1,577 word families that met the established criteria for frequency, range, and specialized occurrence. The researchers also included GSL items. They then eliminated function words, technical terms, and abbreviations from the original compilation in order to improve the word list. The result was a list of 1,400 word families of chemistry vocabulary, which included 327 AWL, 390 non-GSL/AWL, and 683 GSL word families. In short, the constructed list attained a coverage of 81.18% within the 4-million-word corpus of chemistry RAs.
In two of the most exemplary corpus studies, Mudraya (2006) and Ward (2009) developed specialized word lists for EAP/ESP engineering students. Specifically, Mudraya identified thirteen engineering textbooks that covered essential subjects for all engineering students, such as Technical Mechanics, Engineering Materials, and Mechanics of Materials, among others. The resulting specialized corpus of engineering comprised approximately two million running words, representing 1,200 word families and 9,000 word types commonly used throughout the corpus. To be included in the word list, word families had to occur at least 100 times across the whole corpus. After the word selection criteria were applied, the word list was constructed with 1,260 word families. The corpus was compared with COBUILD, the Bank of English Corpus, and the BNC, and the relationship among the 50 most prevalent closed-class word forms was found to be statistically significant.
In a follow-up study, Ward analyzed linguistic features in engineering and constructed a specialized word list to address the linguistic needs of EAP students. To achieve this, Ward consulted with instructors from five engineering disciplines: chemical, civil, electrical, industrial, and mechanical. A total of twenty-five textbooks were assembled, and random pages were selected until the word count reached 10,000. Accordingly, he established a corpus of 271,000 words and identified 10,290 distinct word types, from which he developed a foundational word list of 229 words designed to support English language acquisition for beginners in various engineering fields. The resulting list, called the Basic Engineering List (BEL), covered 17.2%, 15.6%, and 21% of the sub-corpora. When tested against a text centered on mass transfer in some of the engineering sub-fields (e.g., mechanical), BEL demonstrated a coverage of 17.7%. This observation indicates a consistent and substantial level of coverage by BEL across various technical materials. Moreover, although AWL contained a considerably broader vocabulary than BEL, it provided only 11.3% coverage of engineering content. This comparison underscores the effectiveness of BEL in covering engineering terminology, even though it has a more limited scope than AWL. Table 1 presents a list of studies that have developed distinctive word lists in the hard sciences.
Table 1. An Overview of the Developed Wordlists across Hard Sciences
| Disciplines and Fields of Study | Author(s) |
| --- | --- |
| Medical Science | Wang et al. (2008) |
| Engineering | Mudraya (2006) |
| Science, Social Sciences and Engineering | Hyland & Tse (2007) |
| Medical Science | Chen & Ge (2007) |
| Pharmacology | Fraser (2007) |
| Engineering | Ward (2009) |
| Medical Science | Hsu (2013) |
| Chemistry | Valipouri & Nassaji (2013) |
| Earth Remote Sensing (Aerospace) | Korzen et al. (2023) |
| Nursing | Yang (2015) |
| Oil marketing and Oil industry | Ebtisan Saleh Aluthman (2017) |
| Medical Science | Lei & Liu (2016) |
| Engineering | Todd (2017) |
| Plumbing | Coxhead & Demecheleer (2018) |
| Science & Engineering | Veenstra & Sato (2018) |
| Civil Engineering | Gilmore & Millar (2018) |
| Physiotherapy | Jamalzadeh & Chalak (2019) |
| Science | It-ngam & Phoocharoensil (2019) |
| Computer Science | Chen & Lei (2019) |
| Veterans Equine Therapy | Safari (2019) |
| Zoology | Kruawong & Phoocharoensil (2020) |
| Pharmacology | Heidari et al. (2020) |
| Medical Science | Ashrafzadeh (2021) |
| Computer Science | Roesler (2021) |
| Marine Engineering | Durovic et al. (2021) |
| Physics | Vukovic-Stamatovic (2024) |
| Veterinary Medicine | Özer & Akbas (2024) |
| Chemistry | Xodabandeh et al. (2023) |
| Urban Planning | Amini Farsani et al. (2025) |
The studies reviewed above created custom-tailored corpora in engineering sub-fields, and their specialized word lists prove more beneficial for EAP/ESP students than AWL. However, no study has yet established a corpus that captures the academic words and technical acronyms of WM, along with their most frequent multiword constructions. In brief, our academic word list will not only encompass the vocabulary of WM but may also be extrapolated to other engineering and related fields. Conversely, our technical word list is field-specific and, in a stricter sense, more representative of WM.
PURPOSE OF THE STUDY
One aim of this study is to focus on WM inclusively, resulting in more robust and reliable outcomes for the discipline. Hence, we incorporated empirical research articles from leading journals in the field to support the generalizability and representativeness of the findings. Our primary intention was to create a field-specific and an academic word list for WM and to identify its most common lexical bundles. Biber et al. (2004) defined lexical bundles as the most common lexical sequences in a given register, also known as fixed expressions or N-grams. These constructions are frequently used in scientific writing and are essential for producing texts that adhere to the rhetorical conventions of any given field of study (Salazar, 2014). Thus, we also aimed to generate a concrete list of multiword terms (N-grams) for WM. This study was guided by the following research questions:
- What are the most frequent academic words of Welding Metallurgy?
- To what extent do GSL and AWL cover the entire corpus of Welding Metallurgy Research Articles?
- Which of the General Word Lists and General Academic Word Lists (GSL, AWL, NGSL, and NAWL) are more beneficial for metallurgy students?
- What are the most frequent lexical bundles (N-grams of three and four) of Welding Metallurgy across the whole corpus?
METHODS
To establish and identify the domain of the corpus, this investigation adhered to the benchmarks of Plonsky (2013), which include content (i.e., WM excerpts), location (i.e., WM journals and RAs), and time (i.e., publication date). We first asked five experts in metallurgy to recommend the top ten journals in the discipline. Subsequently, the seven most recommended journals were selected and checked against the Scientific Journal Rankings (SJR) to verify their standing (see Table 2). The rationale for including top-tier journals was to represent authentic language, as papers published in these journals undergo rigorous proofreading, peer review, and editorial processes. To establish a representative corpus across various years, we incorporated empirical RAs that adhered to the Introduction, Methodology, Results, and Discussion (IMRD) format (Swales, 1990) and were published between 2017 and 2023.
Additionally, we attempted to ensure that the WM journals were represented in a balanced way in the corpus. To this end, we included the same number of research articles per year (n = 25) from each journal. As Table 2 shows, however, the time spans covered are not identical across journals. This is because certain journals, such as the Welding Journal and the Journal of Advanced Joining Processes, publish only a limited number of research articles per volume each year. Moreover, the selected journals focus primarily on WM, as evidenced by journals such as Materials Science and Engineering: A, Science and Technology of Welding and Joining, and Journal of Materials Research and Technology. In this way, a specialized corpus of WM was developed, comprising nearly 4 million running words.
Table 2. Journals' Information
| Time Span | Journal |
| --- | --- |
| 2018-2022 | Acta Materialia |
| 2017-2021 | Journal of Advanced Joining Processes |
| 2019-2023 | Journal of Materials Research and Technology (JMRT) |
| 2017-2021 | Materials Science and Engineering: A |
| 2017-2021 | Science and Technology of Welding and Joining |
| 2018-2022 | Welding in the World |
| 2018-2021, 2023 | Welding Journal |
Corpus Establishment
To create the corpus, all RAs were standardized; that is, all references, footnotes, tables, figures, and appendices were removed. Subsequently, the articles were converted to TXT files, each representing a separate journal, and loaded into AntWordProfiler 1.5.1 (Anthony, 2021). In the first stage of the analysis with this tool, we applied two criteria: (1) specialized occurrence and (2) range. We prioritized range over frequency for its representativeness. After exporting the results to Excel files, we applied the word selection criteria and removed proper nouns and terms irrelevant to WM.
Word Selection Criteria
We adhered to the criteria outlined by Coxhead (2000): specialized occurrence, range, and frequency. To ascertain the soundness of the technical vocabulary, we extended our analysis beyond the GSL and AWL and also checked the candidate items against the New General Service List (NGSL) and the New Academic Word List (NAWL). As mentioned earlier, range was prioritized over frequency to prevent bias arising from differences in journal length and from topic-related words. To fulfill the frequency and range requirements, vocabulary items needed to appear at least 28.57 times per million words and in half or more of the journals, respectively. For a corpus of nearly four million words drawn from seven journals, this meant that we incorporated technical terms that occurred at least 114 times across the entire corpus (i.e., frequency) and in at least four journals (i.e., range). In addition, we chose the lemma (a headword along with its inflected forms) as the counting unit since it is known to be highly efficient for students' literacy skills (e.g., reading and writing) within academic discourse (Dang, 2019; Durrant, 2014) and for its pedagogical advantages (Brown et al., 2020; Gardner & Davies, 2014; Lei & Liu, 2016).
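The frequency and range cut-offs described above can be illustrated with a short sketch. It is not the procedure run in AntWordProfiler; the function names and the toy data are hypothetical, and the calculation assumes the nominal corpus size of four million words, which reproduces the reported cut-off of 114 occurrences.

```python
import math
from collections import Counter


def selection_thresholds(rate_per_million, corpus_tokens, n_journals):
    """Convert a per-million rate and a 'half or more' rule into absolute cut-offs."""
    min_freq = math.floor(rate_per_million * corpus_tokens / 1_000_000)
    min_range = math.ceil(n_journals / 2)
    return min_freq, min_range


def select_lemmas(journal_lemmas, min_freq, min_range):
    """Keep lemmas that meet both cut-offs.

    journal_lemmas maps a journal name to its list of lemmatized tokens
    (a hypothetical input format chosen for this illustration).
    """
    freq = Counter()   # total occurrences across the corpus
    rng = Counter()    # number of journals in which the lemma occurs
    for lemmas in journal_lemmas.values():
        counts = Counter(lemmas)
        freq.update(counts)
        rng.update(counts.keys())
    return sorted(l for l in freq if freq[l] >= min_freq and rng[l] >= min_range)


# Assumption: nominal size of four million words and the seven journals of Table 2.
min_freq, min_range = selection_thresholds(28.57, 4_000_000, 7)

# Toy data with lowered cut-offs, only to show the filtering step.
toy = {
    "A": ["weld", "weld", "alloy"],
    "B": ["weld", "laser"],
    "C": ["weld", "alloy", "alloy"],
}
selected = select_lemmas(toy, min_freq=3, min_range=2)
```

Counting range per journal rather than per article mirrors the description above, in which each TXT file represents one journal.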
N-Gram Identification
The N-gram tool in AntConc 4.2.4 (Anthony, 2024) was employed to identify the most frequently occurring N-grams (trigrams and four-grams in this study) and to compile a list representative of the field. The criteria for recognizing trigrams and four-grams throughout the entire corpus were range and frequency, with the same cut-off points used for the discipline-specific word list: a range of four and a frequency of 114. According to Lei and Liu (2016), determining the cut-off point for including N-grams and the extent of their significance is a subjective decision that strongly depends on the researcher's judgement. Accordingly, a preliminary list of the most frequent N-grams was first generated by applying the N-gram selection criteria; then, the third author, whose area of interest is mainly WM, explored the data and singled out the most frequent and important N-grams in the field. During this process, N-grams that either began or ended with a function word were excluded from the dataset (Sun & Lan, 2023).
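As a rough illustration of the selection steps described above (frequency cut-off, range cut-off, and exclusion of N-grams that begin or end with a function word), consider the following sketch. It does not reproduce AntConc's implementation; the function names, the input format, and the very small function-word set are hypothetical, and the expert screening step is not modeled.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word set used only for illustration.
FUNCTION_WORDS = {"the", "of", "and", "in", "on", "to", "a", "an", "is", "was"}


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(journal_tokens, n, min_freq, min_range, stop=FUNCTION_WORDS):
    """Map each retained N-gram to its (frequency, range) pair.

    journal_tokens maps a journal name to its lowercased token list
    (an assumed input format).
    """
    freq, rng = Counter(), Counter()
    for tokens in journal_tokens.values():
        counts = Counter(ngrams(tokens, n))
        freq.update(counts)
        rng.update(counts.keys())
    kept = {}
    for gram, f in freq.items():
        if f < min_freq or rng[gram] < min_range:
            continue
        if gram[0] in stop or gram[-1] in stop:
            continue  # drop N-grams that begin or end with a function word
        kept[" ".join(gram)] = (f, rng[gram])
    return kept


# Toy example with lowered cut-offs; the study reports 114 and 4.
toy = {
    "J1": "friction stir welding of the alloy".split(),
    "J2": "friction stir welding of the steel".split(),
}
trigrams = frequent_ngrams(toy, n=3, min_freq=2, min_range=2)
```

In the toy data, "stir welding of" and "welding of the" meet the cut-offs but are dropped because they end with a function word.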
Validity and Reliability
Our first step was to ensure that the journals are as representative as possible. Therefore, not only did we seek other experts’ help for the selection of journals, but we also referred to SJR. As another step, we cleaned the texts and lemmatized the words to have a valid analysis of frequency, range, and specialized occurrence. Furthermore, we compared our word list with those developed by others for various engineering fields. We also evaluated the coverage of our list against (N)GSL, (N)AWL, and Non-(N)GSL-(N)AWL (coverage test) as fundamental criteria to validate this work (see results and discussion).
We achieved reliability through a comprehensive procedure of peer-checking as well as consulting two experts in metallurgical science (the third author and a full professor). After compiling the texts, the first and second authors meticulously checked the excerpts for any errors or mismatches. Subsequently, after the general corpus was established, these two authors randomly surveyed the words for relevance. After the specialized lists of academic words, technical acronyms, and N-grams were created, all of the authors held several meetings to verify that the lists were representative. This resulted in a near-perfect inter-rater agreement of 93% (κ = 0.90). In addition, although the third author is a WM expert, we consulted two professors specializing in metallurgy from different universities to enhance trustworthiness and discussed the accuracy and representativeness of the lists with them.
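For readers unfamiliar with the relation between percent agreement and kappa, the sketch below computes Cohen's kappa for two raters. This is an assumption made for illustration: the text does not state which kappa variant was used or how many raters were compared, and the ratings shown are invented toy data, not the study's.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap given each rater's label proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented toy ratings: the two raters disagree on one of ten items.
rater_a = ["keep"] * 5 + ["drop"] * 5
rater_b = ["keep"] * 4 + ["drop"] * 6
kappa = cohens_kappa(rater_a, rater_b)
```

Because kappa discounts agreement expected by chance, it is lower than the raw agreement rate, which is consistent with the pattern of 93% agreement alongside κ = 0.90.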
RESULTS
Academic Word List
To identify the most frequently occurring unique words of the field (Research Question 1), we adhered to the word selection criteria of Coxhead (2000), which encompassed range, frequency, and specialized occurrence. As Table 3 (see Appendix for the full list) shows, we found 608 lemmas that were prevalent in WM RAs. As the words indicate, a notable number of them are characteristic of the features of welding and clearly related to metallurgy and materials science, in general.
Table 3. The Most Prevalent Academic Words for Welding Metallurgy
| Lemma | Range | Freq |
| --- | --- | --- |
| Welding | 7 | 27541 |
| Zone | 7 | 7635 |
| Tensile | 7 | 7143 |
| Interface | 7 | 7010 |
| Alloy | 7 | 6701 |
| Arc | 7 | 6265 |
| Laser | 7 | 5899 |
| Strain | 7 | 5892 |
| Microstructure | 7 | 5736 |
| FSW | 7 | 5572 |
| Friction | 7 | 4813 |
| Fracture | 7 | 4736 |
| Deformation | 7 | 4414 |
| Shear | 7 | 4203 |
| Thermal | 7 | 3985 |
| HAZ | 7 | 3962 |
| Residual | 7 | 3857 |
| Fatigue | 7 | 3434 |
| Plastic | 7 | 3331 |
| Specimen | 7 | 3315 |
Coverage of Academic Word List
To determine the coverage of GSL and AWL across our corpus, we used AntWordProfiler 1.5.1 (Anthony, 2021) and imported the output into an Excel file. As illustrated in Table 4, GSL (West, 1953) comprises 2,682,109 tokens and 4,494 word types, accounting for 68.29% of the entire corpus. The results suggest that GSL plays a critical role in enhancing the reading and comprehension of WM RAs, as it encompasses a significant portion of the corpus. The first level of GSL (West, 1953) covers 61.48% of the corpus, whereas the second level covers only 6.81%. Furthermore, AWL (Coxhead, 2000) constitutes 9.16% of the entire corpus, surpassing the second level of GSL (West, 1953). This supports the idea that adhering to a rigid sequence in learning word lists, such as mastering GSL before AWL and subsequently other specialized word lists, is not essential for achieving a solid comprehension of academic texts (Valipouri & Nassaji, 2013).
Table 4. GSL and AWL Coverage across the Corpus
| Word Lists | TOKEN | TOKEN% | TYPE | TYPE% |
| --- | --- | --- | --- | --- |
| 1st level of GSL | 2414793 | 61.48 | 2787 | 7.46 |
| 2nd level of GSL | 267316 | 6.81 | 1707 | 4.57 |
| AWL | 359947 | 9.16 | 2213 | 5.92 |
| Non-GSL-AWL | 885739 | 22.55 | 30663 | 82.05 |
| Total | 3927795 | 100 | 37370 | 100 |
Coxhead's AWL (2000) encompasses 2,213 word types, which account for 359,947 tokens and cover 9.16% of the entire corpus. Conversely, 30,663 word types are included in neither of these lists; they account for 885,739 tokens and represent 22.55% of the entire corpus. Following a thorough refinement of this Non-GSL-AWL vocabulary, the WM AWL was constructed with 608 lemmas. Given that the Non-GSL-AWL items cover 22.55% of the WM RAs, compared with the AWL's 9.16%, it can be concluded that the Non-GSL-AWL vocabulary, from which the WM AWL was developed, can greatly aid the comprehension of academic information when used alongside GSL and AWL.
Furthermore, the Non-GSL-AWL vocabulary in our corpus offered greater coverage than Coxhead's AWL. This finding underscores the importance of creating field-specific word lists and supports the view that lexical items behave differently in terms of frequency, collocation, and meaning across a broad range of domains (Hyland & Tse, 2007). Laufer and Nation (1999) contend that in order to effectively understand academic RAs, second language (L2) readers should be acquainted with approximately 95% (approximately 3,000 words) of the vocabulary contained within these texts. GSL and AWL together accounted for 77.45% of the corpus, well below this threshold, which suggests that members of the WM academic community who rely on these two lists alone may have difficulty understanding and interacting with these texts. Hence, the WM AWL plays a pivotal role in achieving fluency and comprehension of technical information.
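The coverage percentages discussed here follow directly from the token counts in Table 4. The short sketch below recomputes them; the dictionary keys and function names are illustrative choices, while the counts are those reported in the table.

```python
# Token counts as reported in Table 4.
TABLE_4_TOKENS = {
    "GSL (1st level)": 2_414_793,
    "GSL (2nd level)": 267_316,
    "AWL": 359_947,
    "Non-GSL-AWL": 885_739,
}


def coverage_percent(token_counts):
    """Return each list's share of all running words, in percent."""
    total = sum(token_counts.values())
    return {name: 100 * count / total for name, count in token_counts.items()}


def combined_coverage(token_counts, names):
    """Sum the coverage of several lists."""
    shares = coverage_percent(token_counts)
    return sum(shares[name] for name in names)


shares = coverage_percent(TABLE_4_TOKENS)
gsl_awl = combined_coverage(
    TABLE_4_TOKENS, ["GSL (1st level)", "GSL (2nd level)", "AWL"]
)
gap_to_95 = 95 - gsl_awl  # distance from the 95% threshold cited in the text

for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
print(f"GSL + AWL: {gsl_awl:.2f}%; gap to 95%: {gap_to_95:.2f} points")
```

The gap between the combined GSL and AWL coverage and the 95% threshold is the portion that a field-specific list would need to address.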
However, one should not neglect the shortcomings of GSL and AWL noted in previous studies (Brezina & Gablasova, 2015; Gardner & Davies, 2014). For this reason, we also ran our specialized corpus against two newer lists, one general and one academic. As shown in Table 5, NGSL (Browne, 2014; Browne et al., 2013), a more recently developed general word list than GSL, includes 2,270 word types, represents 2,241,385 tokens, and accounts for 57.06% of the entire corpus. Furthermore, NAWL (Browne, 2014; Browne et al., 2013), which was developed as a newer version of AWL, accounts for 3.97% of the entire corpus, comprising 155,961 tokens and 702 word types. Meanwhile, 38.96% of the tokens appeared in neither NGSL nor NAWL. Consequently, our results suggest that although GSL and AWL are older than NGSL and NAWL, they provide more coverage of our specialized corpus and thus better serve the linguistic needs of ESP students of WM.
Table 5. NGSL and NAWL Coverage across the Corpus
| Word Lists | TOKEN | TOKEN% | TYPE | TYPE% |
| --- | --- | --- | --- | --- |
| NGSL | 2241385 | 57.07 | 2270 | 6.07 |
| NAWL | 155961 | 3.97 | 702 | 1.88 |
| Non-NGSL-NAWL | 1530449 | 38.96 | 34398 | 92.05 |
| Total | 3927795 | 100 | 37370 | 100 |
Technical Acronyms and N-grams
After developing the WM AWL, we grouped the acronyms that belong to the technical nomenclature of the field into an independent list; in total, 68 technical acronyms were identified. A selection of these acronyms is presented in Table 6 (for the full list, see Appendix).
Table 6. Technical Acronyms of the Field
| Technical Acronym | Full Form of the Acronym |
| --- | --- |
| AF | Acicular Ferrite |
| FEM | Finite Element Method |
| EDX | Energy Dispersive X-Ray Spectroscopy |
| SIG | Stud Inert Gas |
| EBW | Electron Beam Welding |
| DT | Digital Twin |
| MAG | Metal Active Gas |
| IPF | Interstitial Pulmonary Fibrosis |
| PC | Horizontal Welding Position |
| CT | Computed Tomography |
| RT | Radiographic Testing |
| HF | High Frequency |
| CF | Corrosion Fatigue |
| HCP | Hexagonal Close Packed |
| FSP | Friction Stir Processing |
| FSSW | Friction Stir Spot Welding |
| LBW | Laser Beam Welding |
| OB | Oxygen Blowing |
| CMT | Cold Metal Transfer |
In addition, to examine the formulaic language of WM, we analyzed the most frequent N-grams that met the word selection criteria and the technical nature of this field. This process led to the identification of 61 N-grams (see Table 7).
Table 7. Technical N-grams of the Field
| Type (Trigrams) | Freq | Range | Type (Four-grams) | Freq | Range |
| --- | --- | --- | --- | --- | --- |
| across the weld | 241 | 7 | during the welding process | 288 | 7 |
| Heat affected zone | 300 | 7 | friction stir welding (FSW) | 293 | 7 |
| along the weld | 197 | 7 | gas metal arc welding | 140 | 7 |
| angle grain boundaries | 144 | 7 | gas tungsten arc welding | 158 | 7 |
| austenitic stainless steel | 158 | 7 | heat affected zone (HAZ) | 280 | 7 |
| average grain size | 309 | 7 | microstructure and mechanical properties | 249 | 7 |
| bead on plate | 205 | 7 | post weld heat treatment | 146 | 7 |
| during the process | 206 | 7 | properties of the joint | 118 | 7 |
| during the welding | 475 | 7 | properties of the weld | 115 | 7 |
| electron beam welding | 201 | 7 | strength of the joint | 178 | 7 |
| Scanning electron microscopy | 114 | 7 | friction stir spot welding | 116 | 6 |
| friction stir welding | 751 | 7 | | | |
| gas metal arc | 201 | 7 | | | |
| gas tungsten arc | 236 | 7 | | | |
| heating and cooling | 114 | 7 | | | |
| high heat input | 142 | 7 | | | |
| high strength steel | 165 | 7 | | | |
| lap shear strength | 141 | 7 | | | |
| laser beam welding | 154 | 7 | | | |
| low carbon steel | 143 | 7 | | | |
| low heat input | 175 | 7 | | | |
| metal arc welding | 202 | 7 | | | |
| microstructure and mechanical | 304 | 7 | | | |
| near the interface | 117 | 7 | | | |
| near the weld | 160 | 7 | | | |
| post weld heat | 194 | 7 | | | |
| resistance spot welding | 288 | 7 | | | |
| scanning electron microscope | 185 | 7 | | | |
| scanning electron microscopy | 188 | 7 | | | |
| severe plastic deformation | 222 | 7 | | | |
| solid state joining | 138 | 7 | | | |
| solid state welding | 126 | 7 | | | |
| Friction stir welding | 305 | 7 | | | |
| stir zone (SZ) | 126 | 7 | | | |
| strength and ductility | 118 | 7 | | | |
| stress strain curves | 147 | 7 | | | |
| tool rotation speed | 129 | 7 | | | |
| top and bottom | 143 | 7 | | | |
| tungsten arc welding | 170 | 7 | | | |
| ultimate tensile strength | 252 | 7 | | | |
| upper and lower | 220 | 7 | | | |
| weld cross section | 124 | 7 | | | |
| weld heat treatment | 155 | 7 | | | |
| x-ray diffraction | 146 | 7 | | | |
| affected zone (az) | 131 | 6 | | | |
| friction stir processing | 144 | 6 | | | |
| friction stir spot | 147 | 6 | | | |
| friction stir welded | 291 | 6 | | | |
| friction stir welds | 147 | 6 | | | |
| high speed camera | 129 | 6 | | | |
From a broader view, both the technical and N-gram lists move beyond the single words of the academic word list, which makes the results more context- and field-specific. They can also serve as dynamic supplements that help experts and students widen their scope and move past the rigidity of single-word units.
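To make the Freq and Range columns of Table 7 concrete, the following is a minimal sketch of how N-gram frequency and range could be computed. It is not the procedure or software used in this study. It assumes that range means the number of sub-corpora in which an N-gram occurs at least once, and that tokenization is simple lowercasing and whitespace splitting. The function names and the toy sub-corpora are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-token sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_freq_and_range(subcorpora, n):
    """Return (frequency, range) counters for all n-grams.

    frequency: total occurrences across all sub-corpora.
    range: number of sub-corpora containing the n-gram (an assumed definition).
    """
    freq = Counter()
    rng = Counter()
    for text in subcorpora.values():
        # Simplified tokenization: lowercase and split on whitespace.
        grams = ngrams(text.lower().split(), n)
        freq.update(grams)       # every occurrence counts toward frequency
        rng.update(set(grams))   # each sub-corpus counts at most once
    return freq, rng


# Hypothetical toy sub-corpora, for illustration only.
toy = {
    "journal_a": "the heat affected zone near the weld",
    "journal_b": "grain growth in the heat affected zone",
    "journal_c": "friction stir welding of the heat affected zone",
}

freq, rng = ngram_freq_and_range(toy, 3)
key = ("heat", "affected", "zone")
print(freq[key], rng[key])
```

A selection step would then keep only those N-grams whose frequency and range pass the chosen thresholds.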
DISCUSSION
RQ1: What are the Most Frequent Academic Words of Welding Metallurgy?
This study offers an academic word list of WM terms drawn from prestigious journals in the field, yielding results that are more reliable and representative. Our findings corroborate the work of various scholars who have created field-specific word lists for engineering (Coxhead & Demecheleer, 2018; Durovic et al., 2021; Gilmore & Millar, 2018; Korzen et al., 2023; Thiruchelvam et al., 2018) and who emphasize developing discipline-specific word lists for engineering rather than relying on generic academic word lists (Hsu, 2014; Mudraya, 2006; Todd, 2017; Ward, 2009). In terms of genre, our study most closely resembles that of Gilmore and Millar (2018), since both studies focus on RAs. Gilmore and Millar (2018) analyzed the lexical profile of the most frequently used academic terms in Civil Engineering. More specifically, they created a specialized corpus for this discipline encompassing 11 sub-disciplines and comprising 8 million running words. Subsequently, they compared the corpus with external corpora, including COCA, NGSL (Browne, 2014; Browne et al., 2013), and NAWL (Browne, 2014; Browne et al., 2013), to ensure the distinctiveness of the terminology related to civil engineering. They applied two criteria, dispersion and keyness, and found 2,967 keywords in the field.
RQ2: To What Extent Do GSL and AWL Cover the Entire Corpus of Welding Metallurgy RAs?
A comparison with other established specialized corpora in engineering shows that the coverage of GSL, AWL, and Non-GSL-AWL words differs from one corpus to another. Therefore, researchers are encouraged to create word lists specific to their respective disciplines and genres, as evidenced by the coverage percentages across various corpora of the same genre. In the current study, across the entire corpus, the proportions of GSL, AWL, and Non-GSL-AWL were 68.29%, 9.16%, and 22.55%, respectively. This distribution underscores the noticeable reliance on general vocabulary within the field, which is indicative of the interdisciplinary nature of metallurgy, where fundamental concepts and terminology are frequently employed.
Moreover, in WM, the proportion of Non-GSL-AWL words indicates the distinct academic and field-specific jargon necessary for accurate communication, while the significant presence of AWL terms highlights the field's academic rigor and specialized knowledge. Welding metallurgy, in particular, requires an in-depth knowledge of thermal processes, phase transformations, and material properties; hence, a specialized vocabulary is needed to accurately convey these intricate concepts. As our findings highlight, in order to effectively assist students and practitioners in mastering the complex aspects of WM, it is crucial to strike a balance between general academic language and field-specific terms. A cross-comparison of word list coverage, as documented in various studies of hard sciences, primarily general and specific fields of engineering, is presented in Table 9. This comparison illustrates the distinct lexical behavior of the fields. Our findings corroborate the literature, indicating that an academic wordlist is not equally effective across various fields, including distinct sub-disciplines within a specific field, such as engineering.
Table 9. Word Lists Coverage across Different Corpora
| Word list | Marine Engineering (Durovic et al.) | Chemistry (Valipouri & Nassaji) | Science and Engineering (Veenstra et al.) | Basic Engineering (Ward) | Current Study |
| --- | --- | --- | --- | --- | --- |
| GSL | 71.39% | 65.46% | 72.3% | 72.3% | 68.29% |
| AWL | 8.07% | 9.96% | 14.3% | 11.3% | 9.16% |
| Non-GSL-AWL | 20.54% | 24.57% | 13.4% | 16.4% | 22.55% |
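The coverage percentages in Table 9 are token-level proportions, so the three shares for a corpus sum to 100% (for the current study, 68.29 + 9.16 + 22.55 = 100). The sketch below illustrates one way such proportions could be computed. It is a simplified assumption rather than the study's actual tooling: it matches exact lowercase word forms instead of word families, checks the GSL before the AWL, and uses hypothetical miniature word lists in place of the real GSL and AWL.

```python
def coverage(tokens, gsl, awl):
    """Return the percentage of tokens covered by GSL, AWL, and neither list.

    Simplification: exact word-form matching, with GSL checked first.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = {"GSL": 0, "AWL": 0, "Non-GSL-AWL": 0}
    for tok in tokens:
        if tok in gsl:
            counts["GSL"] += 1
        elif tok in awl:
            counts["AWL"] += 1
        else:
            counts["Non-GSL-AWL"] += 1
    total = len(tokens)
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Hypothetical miniature word lists and text, for illustration only.
toy_gsl = {"the", "was", "near"}
toy_awl = {"analyzed"}
toy_tokens = "the weld was analyzed near the austenite grain".split()

result = coverage(toy_tokens, toy_gsl, toy_awl)
print(result)
```

In this toy sentence, four of the eight tokens fall in the stand-in GSL, one in the stand-in AWL, and three in neither, which mirrors how field-specific words such as "austenite" end up in the Non-GSL-AWL share.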
RQ3: Which of the General Word Lists and General Academic Word Lists (GSL, AWL, NGSL, and NAWL) are More Beneficial for Welding Metallurgy Students?
Generally, the purpose of mastering general and academic word lists is to enhance one's proficiency in reading and understanding field-specific texts. The General Word Lists (i.e., GSL and NGSL) aim to provide students with the most frequently used general vocabulary from various corpora, whereas the General AWLs (i.e., AWL and NAWL) are specifically designed to highlight the most common academic terminology found in scientific literature. Notably, among these general and academic word lists, the GSL is the one most frequently criticized as obsolete. Our findings indicate that although the GSL was established in 1953, it continues to surpass recently developed word lists in corpus coverage (see Figure 1). Thus, it remains highly beneficial and efficient for assisting scientific communities in reading and comprehending academic and technical texts. The same holds for the AWL of Coxhead (2000), which provided more coverage across the corpus than the more recently developed NAWL (Browne, 2014; Browne et al., 2013).
Figure 1. Extent of Coverage of Word Lists across the Corpus
RQ4: What are the Top Trigrams and Four-grams of Welding Metallurgy?
In addition to reporting the most prevalent academic and technical words of WM, we also identified the most prevalent three- and four-word lexical bundles across the entire corpus. Identifying trigrams and four-grams matters because, as common features of scientific writing, they play a pivotal role in creating texts that adhere to the rhetorical conventions of a particular research field (Salazar, 2014). In a similar vein, Khamphairoh and Tangpijaikul (2012) conducted a corpus-based study to identify technical vocabulary within the insurance sector and identified the top twenty N-grams in the field. Furthermore, Gilmore and Millar (2018) identified the most frequently occurring word sequences, ranging from three to six words, and generated a list of prevalent lexical bundles encompassing 366 to 1,138 phrases. Our study is primarily comparable to that of Gilmore and Millar (2018) because we incorporated all of the frequent word sequences of the specialized corpus. However, it differs from that of Khamphairoh and Tangpijaikul (2012), who reported only the top 20 N-grams.
CONCLUSION AND IMPLICATIONS
We conducted this study to identify key academic terms in the field and assess the level of support that GSL and AWL offer to the WM community, with the goals of filling the gap of a specialized corpus for this field and developing a trustworthy word list. With that in mind, an academic word list for WM with 608 lemmas, a technical word list with 68 terms, and a multiword expression list with 61 N-grams were created by applying word selection criteria (Coxhead, 2000) and excluding irrelevant terms, acronyms, and proper nouns. Therefore, this study provides uniquely representative, large-scale word lists from leading journals, indicating the specific registers that were sampled and the nomenclature that emerged from the dataset.
Findings of this study provide pedagogical implications for ESP/EAP end-users. Given the importance of research-informed instruction (Joseph-Richard et al., 2020), one of this study's major contributions is to help instructors and students of WM improve their discipline-specific vocabulary so they can interact with others in the discourse community more effectively. This method seeks to provide language instruction for specific, academic purposes with discipline-based evidence, rather than relying solely on the instructor's intuition about the most effective techniques and strategies, thereby enhancing the accuracy and efficacy of language teaching. As such, evidence-based methodologies allow educators to focus on students' linguistic needs, thereby enhancing the overall learning experience and understanding of specific language in designated contexts.
More precisely, this kind of research-driven instruction could help narrow the persistent gap between research and teaching that has recently been brought to light in the literature. Teachers may use these word lists and the specialized corpus to adopt a cutting-edge teaching strategy in light of the recent rise of data-driven learning (DDL). As a learner-centered approach, DDL makes it easier to uncover linguistic meanings and patterns. This method improves language learners' comprehension and deepens their knowledge, enabling them to develop a nuanced understanding of how language works across diverse contexts. It also encourages them to examine large samples of real-world language usage (Perez-Paredes et al., 2019).
Similar to other studies, this study is not without limitations. Our investigation suffered from a conceptual limitation (see Saedi & Amini Farsani, 2025) and had a narrow scope. More specifically, the generalizability of this list to other subfields of materials science (e.g., Biomaterials, Materials Chemistry, Metals and Alloys) might be limited, as we primarily targeted WM as one of the most renowned strands of materials science. Accordingly, future research could explore all of the strands of materials science to determine the proportion of field-specific words, collocations, and acronyms, or develop authentically customized word lists that do not yet exist for newer engineering (sub)fields (e.g., Artificial Intelligence Engineering).
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Mohammad Hossein Afsharipoor
Taha Saedi Roudi
Seyed Ali Mousavi Mohammadi
Mohammad Amini Farsani