
What Does Language Testing Have to Offer?


Lyle F. Bachman
University of California, Los Angeles

Advances in language testing in the past decade have occurred in three areas: (a) the development of a theoretical view that considers language ability to be multicomponential and recognizes the influence of the test method and test taker characteristics on test performance, (b) applications of more sophisticated measurement and statistical tools, and (c) the development of “communicative” language tests that incorporate principles of “communicative” language teaching. After reviewing these advances, this paper describes an interactional model of language test performance that includes two components, language ability and test method. Language ability consists of language knowledge and metacognitive strategies, whereas test method includes characteristics of the environment, rubric, input, expected response, and relationship between input and expected response. Two aspects of authenticity are derived from this model. The situational authenticity of a given test task depends on the relationship between its test method characteristics and the features of a specific language use situation, while its interactional authenticity pertains to the degree to which it invokes the test taker’s language ability. The application of this definition of authenticity to test development is discussed.

Since 1989, four papers reviewing the state of the art in the field of language testing have appeared (Alderson, 1991; Bachman, 1990a; Skehan, 1988, 1989, 1991). All four have argued that language testing has come of age as a discipline in its own right within applied linguistics and have presented substantial evidence, I believe, in support of this assertion. A common theme in all these articles is that the field of language testing has much to offer in terms of theoretical, methodological, and practical accomplishments to its sister disciplines in applied linguistics. Since these papers provide excellent critical surveys and discussions of the field of language testing, I will simply summarize some of the common themes in these reviews in Part 1 of this paper in order to whet the appetite of readers who may be interested in knowing what issues and problems are of current interest to language testers. These articles are nontechnical and accessible to those who are not themselves language testing specialists. Furthermore, Skehan (1991) and Alderson (1991) appear in collections of papers from recent conferences that focus on current issues in language testing. These collections include a wide variety of topics of current interest within language testing, discussed from many perspectives, and thus constitute major contributions to the literature on language testing.

The purpose of this paper is to address a question that is, I believe, implicit in all of the review articles mentioned above: What does language testing have to offer to researchers and practitioners in other areas of applied linguistics, particularly in language learning and language teaching? These reviews discuss several specific areas in which valuable contributions can be expected (e.g., program evaluation, second language acquisition, classroom learning, research methodology). Part 2 of this paper focuses on two recent developments in language testing, discussing their potential contributions to language learning and language teaching. I argue first that a theoretical model of second language ability that has emerged on the basis of research in language testing can be useful for both researchers and practitioners in language learning and language teaching. Specifically, I believe it provides a basis both for conceptualizing second language abilities whose acquisition is the object of considerable research and instructional effort, and for designing language tests for use both in instructional settings and in research on language learning and language teaching. Second, I will describe an approach to characterizing the authenticity of a language task which I believe can help us to better understand the nature of the tasks we set, either for students in instructional programs or for subjects in language learning research, and which can thus aid in the design and development of tasks that are more useful for these purposes.

PART 1: LANGUAGE TESTING IN THE 1990s

In echoing Alderson’s (1991) title, I acknowledge the commonalities among the review articles mentioned above in the themes they discuss and the issues they raise. While each review emphasizes specific areas, all approach the task with essentially the same rhetorical organization: a review of the achievements in language testing, or lack thereof, over the past decade; a discussion of areas of likely continued development; and suggestions of areas in need of increased emphasis to assure developments in the future. Both Alderson and Skehan argue that while language testing has made progress in some areas, on the whole “there has been relatively little progress in language testing until recently” (Skehan, 1991, p. 3). Skehan discusses the contextual factors—theory, practical considerations, and human considerations—that have influenced language testing in terms of whether these factors act as “forces for conservatism” or “forces for change” (p. 3). The former, he argues, “all have the consequence of retarding change, reducing openness, and generally justifying inaction in testing” (p. 3), while the latter are “pressures which are likely to bring about more beneficial outcomes” (p. 7). All of the reviews present essentially optimistic views of where language testing is going and what it has to offer other areas of applied linguistics. I will group the common themes of these reviews into the general areas of (a) theoretical issues and their implications for practical application, (b) methodological advances, and (c) language test development.

THEORETICAL ISSUES

One of the major preoccupations of language testers in the past decade has been investigating the nature of language proficiency. In 1980 the “unitary competence hypothesis” (Oller, 1979), which claimed that language proficiency consists of a single, global ability, was widely accepted. By 1983 this view of language proficiency had been challenged by several empirical studies and abandoned by its chief proponent (Oller, 1983). The unitary trait view has been replaced, through both empirical research and theorizing, by the view that language proficiency is multicomponential, consisting of a number of interrelated specific abilities as well as a general ability or set of general strategies or procedures. Skehan and Alderson both suggest that the model of language test performance proposed by Bachman (1990b) represents progress in this area, since it includes both components of language ability and characteristics of test methods, thereby making it possible “to make statements about actual performance as well as underlying abilities” (Skehan, 1991, p. 9). At the same time, Skehan correctly points out that as research progresses, this model will be modified and eventually superseded. Both Alderson and Skehan indicate that an area where further progress is needed is the application of theoretical models of language proficiency to the design and development of language tests. Alderson, for example, states that “we need to be concerned not only with . . . the nature of language proficiency, but also with language learning and the design and researching of achievement tests; not only with testers, and the problems of our professionalism, but also with testees, with students, and their interests, perspectives and insights” (Alderson, 1991, p. 5).

A second area of research and progress is in our understanding of the effects of the method of testing on test performance. A number of empirical studies conducted in the 1980s clearly demonstrated that the kind of test tasks used can affect test performance as much as the abilities we want to measure (e.g., Bachman & Palmer, 1981, 1982, 1988; Clifford, 1981; Shohamy, 1983, 1984). Other studies demonstrated that the topical content of test tasks can affect performance (e.g., Alderson & Urquhart, 1985; Erickson & Molloy, 1983). Results of these studies have stimulated a renewed interest in the investigation of test content, and here the results have been mixed. Alderson and colleagues (Alderson, 1986, 1990; Alderson & Lukmani, 1986; Alderson, Henning, & Lukmani, 1987) have been investigating (a) the extent to which “experts” agree in their judgments about what specific skills EFL reading test items measure, and at what levels, and (b) whether these expert judgments about ability levels are related to the difficulty of items. Their results indicate, first, that these experts, who included test designers assessing the content of their own tests, do not agree and, second, that there is virtually no relationship between judgments of the levels of ability tested and empirical item difficulty. Bachman and colleagues, on the other hand (Bachman, Davidson, Lynch, & Ryan, 1989; Bachman, Davidson, & Milanovic, 1991; Bachman, Davidson, Ryan, & Choi, in press), have found that by using a content-rating instrument based on a taxonomy of test method characteristics (Bachman, 1990b) and by training raters, a high degree of agreement among raters can be obtained, and such content ratings are related to item difficulty and item discrimination. In my view, these results are not inconsistent. The research of Alderson and colleagues presents, I believe, a sobering picture of actual practice in the design and development of language tests: Test designers and experts in the field disagree about what language tests measure, and neither the designers nor the experts have a clear sense of the levels of ability measured by their tests. This research uncovers a potentially serious problem in the way language testers practice their trade. Bachman’s research, on the other hand, presents what can be accomplished in a highly controlled situation, and provides one approach to solving this problem. Thus, an important area for future research in the years to come will be the refinement of approaches to the analysis of test method characteristics, of which content is a substantial component, and the investigation of how specific characteristics of test method affect test performance. Progress will be realized in the area of language testing practice when insights from this area of research inform the design and development of language tests. The research on test content analysis that has been conducted by the University of Cambridge Local Examinations Syndicate, and the incorporation of that research into the design and development of EFL tests, is illustrative of this kind of integrated approach (Bachman et al., 1991).
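
To make the kind of analysis described above more concrete, the short Python sketch below works through two of the quantities at issue: chance-corrected agreement between two content raters (here, Cohen's kappa) and the empirical difficulty and discrimination of items. The ratings, response matrix, and rating scale are invented for illustration and are not the instruments or data of the studies cited; in a study of this kind, the question would be how well trained raters agree and how strongly their judgments track the empirical item statistics.

```python
import numpy as np

def cohen_kappa(r1, r2):
    """Chance-corrected agreement between two raters over the same items."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    categories = np.union1d(r1, r2)
    observed = np.mean(r1 == r2)
    # Agreement expected by chance if the raters were independent
    expected = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of the level of ability each item is judged to tap (1-3)
rater_a = [1, 2, 2, 3, 1, 3, 2, 1]
rater_b = [1, 2, 3, 3, 1, 3, 2, 2]
print("kappa:", round(cohen_kappa(rater_a, rater_b), 2))

# Hypothetical response matrix: rows = test takers, columns = items (1 = correct)
responses = np.array([
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0, 1, 0],
])
difficulty = responses.mean(axis=0)  # proportion correct ("p value") per item
total = responses.sum(axis=1)
# Point-biserial discrimination: correlation of each item with the total score
discrimination = [np.corrcoef(responses[:, j], total)[0, 1]
                  for j in range(responses.shape[1])]
print("difficulty:", np.round(difficulty, 2))
print("discrimination:", np.round(discrimination, 2))
```
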
The 1980s saw a wealth of research into the characteristics of test takers and how these are related to test performance, generally under the rubric of investigations into potential sources of test bias; I can do little more than list these studies here. A number of studies have shown differences in test performance across different cultural, linguistic, or ethnic groups (e.g., Alderman & Holland, 1981; Chen & Henning, 1985; Politzer & McGroarty, 1985; Swinton & Powers, 1980; Zeidner, 1986), while others have found differential performance between sexes (e.g., Farhady, 1982; Zeidner, 1987). Other studies have found relationships between field dependence and test performance (e.g., Chapelle, 1988; Chapelle & Roberts, 1986; Hansen, 1984; Hansen & Stansfield, 1981; Stansfield & Hansen, 1983). Such studies demonstrate the effects of various test taker characteristics on test performance, and suggest that such characteristics need to be considered both in the design of language tests and in the interpretation of test scores. To date, however, no clear direction has emerged to suggest how such considerations translate into testing practice. Two issues that need to be resolved in this regard are (a) whether and how we assess the specific characteristics of a given group of test takers, and (b) whether and how we can incorporate such information into the way we design language tests. Do we treat these characteristics as sources of test bias and seek ways to somehow “correct” for this in the way we write and select test items, for example? Or, if many of these characteristics are known to also influence language learning, do we reconsider our definition of language ability? The investigation of test taker characteristics and their effects on language test performance also has implications for research in second language acquisition (SLA), and represents what Bachman (1989) has called an “interface” between SLA and language testing research.

METHODOLOGICAL ADVANCES

Many of the developments mentioned above—changes in the way we view language ability, the effects of test method and test taker characteristics—have been facilitated by advances in the tools that are available for test analysis. These advances have been in three areas: psychometrics, statistical analysis, and qualitative approaches to the description of test performance. The 1980s saw the application of several modern psychometric tools to language testing: item response theory (IRT), generalizability theory (G theory), criterion-referenced (CR) measurement, and the Mantel-Haenszel procedure. As these tools are fairly technical, I will simply refer readers to discussions of them: IRT (Henning, 1987), G theory (Bachman, 1990b; Bolus, Hinofotis, & Bailey, 1982), CR measurement (Bachman, 1990b; Hudson & Lynch, 1984), and the Mantel-Haenszel procedure (Ryan & Bachman, in press). The application of IRT to language tests has brought with it advances in computer-adaptive language testing, which promises to make language tests more efficient and adaptable to individual test takers, and thus potentially more useful in the types of information they provide (e.g., Tung, 1986), but which also presents a challenge not to complacently continue using familiar testing techniques simply because they can be administered easily via computer (Canale, 1986). Alderson (1988a) and the papers in Stansfield (1986) provide extensive discussions of the applications of computers to language testing.
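
As a rough, self-contained illustration of two of the ideas mentioned above, the sketch below shows how a one-parameter (Rasch) IRT model relates an ability estimate and an item difficulty to the probability of a correct response, and how a computer-adaptive test can use that relation to choose the next item. The item bank and numbers are assumptions made up for the example; an operational adaptive test would also re-estimate ability after each response and apply stopping rules, which are omitted here.

```python
import math

def rasch_probability(theta, b):
    """P(correct response) under the Rasch model for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def next_item(theta, item_bank, administered):
    """Choose the unused item that is most informative at the current ability estimate.
    Under the Rasch model, information peaks where item difficulty is closest to theta."""
    candidates = [item for item in item_bank if item not in administered]
    return min(candidates, key=lambda item: abs(item_bank[item] - theta))

# Hypothetical item bank: item id -> difficulty (in logits)
item_bank = {"q1": -1.5, "q2": -0.5, "q3": 0.0, "q4": 0.7, "q5": 1.8}

theta = 0.0                      # provisional ability estimate
administered = set()
item = next_item(theta, item_bank, administered)
print(item, "P(correct) =", round(rasch_probability(theta, item_bank[item]), 2))
```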

The major advance in the area of statistical analysis has been the application of structural equation modeling to language testing research. (Relatively nontechnical discussions of structural equation modeling can be found in Long, 1983a, 1983b.) The use of confirmatory factor analysis was instrumental in demonstrating the untenability of the unitary trait hypothesis, and this type of analysis, in conjunction with the multitrait/multimethod research design, continues to be a productive approach to the process of construct validation. Structural equation modeling has also facilitated the investigation of relationships between language test performance and test taker characteristics (e.g., Fouly, 1985; Purcell, 1983) and different types of language instruction (e.g., Sang, Schmitz, Vollmer, Baumert, & Roeder, 1986).
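
The multitrait/multimethod logic referred to above can be sketched with simulated data: scores for the same trait measured by different methods should correlate more highly with each other (convergent evidence) than scores for different traits do (discriminant evidence). The trait and method names, effect sizes, and sample size below are invented for illustration, and a real analysis would fit a confirmatory factor model rather than simply inspecting a correlation matrix.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
reading = rng.normal(size=n)                  # latent trait 1
grammar = 0.4 * reading + rng.normal(size=n)  # latent trait 2, moderately related
cloze_method = rng.normal(scale=0.5, size=n)  # variance shared by one test method

scores = pd.DataFrame({
    "reading_cloze": reading + cloze_method + rng.normal(scale=0.5, size=n),
    "reading_mc":    reading + rng.normal(scale=0.5, size=n),
    "grammar_cloze": grammar + cloze_method + rng.normal(scale=0.5, size=n),
    "grammar_mc":    grammar + rng.normal(scale=0.5, size=n),
})
corr = scores.corr()

# Convergent evidence: same trait, different methods
print("reading (cloze vs. mc):", round(corr.loc["reading_cloze", "reading_mc"], 2))
print("grammar (cloze vs. mc):", round(corr.loc["grammar_cloze", "grammar_mc"], 2))
# Discriminant evidence: different traits, same method
print("reading vs. grammar (cloze):", round(corr.loc["reading_cloze", "grammar_cloze"], 2))
print("reading vs. grammar (mc):", round(corr.loc["reading_mc", "grammar_mc"], 2))
```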

A third methodological advance has been in the use of introspection to investigate the processes or strategies that test takers employ in attempting to complete test tasks. Studies using this approach have demonstrated that test takers use a variety of strategies in solving language test tasks (e.g., Alderson, 1988c; Cohen, 1984) and that these strategies are related to test performance (e.g., Anderson, Cohen, Perkins, & Bachman, 1991; Nevo, 1989).

Perhaps the single most important theoretical development in language testing in the 1980s was the realization that a language test score represents a complexity of multiple influences. As both Alderson and Skehan point out, this advance has been spurred on, to a considerable extent, by the application of the methodological tools discussed above. But, as Alderson (1991) notes, “the use of more sophisticated techniques reveals how complex responses to test items can be and therefore how complex a test score can be” (p. 12). Thus, one legacy of the 1980s is that we now know that a language test score cannot be interpreted simplistically as an indicator of the particular language ability we want to measure; it is also affected to some extent by the characteristics and content of the test tasks, the characteristics of the test taker, and the strategies the test taker employs in attempting to complete the test task. What makes the interpretation of test scores particularly difficult is that these factors undoubtedly interact with each other. The particular strategy adopted by a given test taker, for example, is likely to be a function of both the characteristics of the test task and the test taker’s personal characteristics. This realization clearly indicates that we need to consider very carefully the interpretations and uses we make of language test scores and thus should sound a note of caution to language testing practitioners. At the same time, our expanded knowledge of the complexity of language test performance, along with the methodological tools now at our disposal, provides a basis for designing and developing language tests that are potentially more suitable for specific groups of test takers and more useful for their intended purposes.
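
A toy simulation can make this point concrete. In the sketch below, an observed score is generated as the sum of the ability of interest, a test method effect, a test taker characteristic, an interaction between the two, and random error; all variable names and effect sizes are invented, and the model is far simpler than real test performance, but it shows how a raw score can correlate with ability while still carrying substantial variance from other sources.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
ability = rng.normal(size=n)              # the ability we want to measure
method = rng.choice([-0.3, 0.3], size=n)  # e.g., unfamiliar vs. familiar task format
background = rng.normal(scale=0.5, size=n)  # e.g., topic familiarity of the test taker
interaction = 0.4 * method * background   # strategy choice depends on task and person together
error = rng.normal(scale=0.5, size=n)

score = ability + method + background + interaction + error

print("correlation of score with ability:",
      round(float(np.corrcoef(score, ability)[0, 1]), 2))
print("share of score variance not attributable to ability:",
      round(float(1 - np.var(ability) / np.var(score)), 2))
```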

ADVANCES IN LANGUAGE TEST DEVELOPMENT

For language testing, the 1980s could be characterized as the decade of “communicative” testing. Although two strains of communicative approaches to language testing can be traced, as with many innovations in language testing over the years, the major impetus has come from language teaching. One strain of communicative tests, illustrated by the Ontario Assessment Pool (Canale & Swain, 1980a) and the A Vous la Parole testing unit described by Swain (1985), traces its roots to the Canale/Swain framework of communicative competence (Canale, 1983; Canale & Swain, 1980b). The other, exemplified by the Test of English for Educational Purposes (Associated Examining Board, 1987; Weir, 1983), the Ontario Test of English as a Second Language (Wesche et al., 1987), and the International English Language Testing Service (e.g., Alderson, 1988b; Alderson, Foulkes, Clapham, & Ingram, 1990; Criper & Davies, 1988; Seaton, 1983), has grown out of the English for specific purposes tradition. While a number of lists of characteristics of communicative language tests have been proposed (e.g., Alderson, 1981a; Canale, 1984; Carroll, 1980; Harrison, 1983; Morrow, 1977, 1979), I will mention four characteristics that would appear to distinguish communicative language tests. First, such tests create an “information gap,” requiring test takers to process complementary information through the use of multiple sources of input. Test takers, for example, might be required to perform a writing task that is based on input from both a short recorded lecture and a reading passage on the same topic. A second characteristic is that of task dependency, with tasks in one section of the test building upon the content of earlier sections, including the test taker’s answers to those sections. Third, communicative tests can be characterized by their integration of test tasks and content within a given domain of discourse. Finally, communicative tests attempt to measure a much broader range of language abilities—including knowledge of cohesion, functions, and sociolinguistic appropriateness—than did earlier tests, which tended to focus on the formal aspects of language—grammar, vocabulary, and pronunciation.

A different approach to language testing that evolved during the 1980s is the adaptation of the FSI oral interview guidelines (Wilds, 1975) to the assessment of oral language proficiency in contexts outside agencies of the U.S. government. This “AEI” (for American Council on the Teaching of Foreign Languages/Educational Testing Service/Interagency Language Roundtable) approach to language assessment is based on a view of language proficiency as a unitary ability (Lowe, 1988), and thus diverges from the view that has emerged in language testing research and other areas of applied linguistics. This approach to oral language assessment has been criticized by both linguists and applied linguists, including language testers and language teachers, on a number of grounds (e.g., Alderson, 1981b; Bachman, 1988; Bachman & Savignon, 1986; Candlin, 1986; Kramsch, 1986; Lantolf & Frawley, 1985, 1988; Savignon, 1985). Nevertheless, the approach and the ability levels it defines have been widely accepted as a standard for assessing oral proficiency in a foreign language in the U.S. and have provided the basis for the development of “simulated oral proficiency interviews” in various languages (e.g., Stansfield & Kenyon, 1988, 1989).
In addition, the approach has been adapted to the assessment of EFL proficiency in other countries (e.g., Ingram, 1984).

These two approaches to language assessment—communicative and AEI—are based on differing views of the nature of language proficiency, and are thus likely to continue as separate, unrelated approaches in the years to come. Lowe (1988) has explicitly articulated such a separatist view, in stating that the “concept of Communicative Language Proficiency (CLP), renamed Communicative Language Ability (CLA), and AEI proficiency may prove incompatible” (p. 14). Communicative language testing and AEI assessment represent two different approaches to language test design, and each has developed a number of specific manifestations in language tests. As a result, language testing will be enriched in the years to come by the variety of tests and testing techniques that emerge from these approaches.

This summary has focused on common areas among four recent reviews of language testing. In addition to these common areas, each of the reviews mentions specific areas of progress or concern. Skehan (1991) and Alderson (1991) both note that until very recently other areas of applied linguistics have provided very little input into language testing. Skehan, however, is encouraged by the relevance to language testing of recent work in sociolinguistics, second language acquisition, and language teaching, and points out the need for language testing to be aware of and receptive to input from developments in other areas of applied linguistics, such as the SLA-based approach to assessing language development of Pienemann, Johnston, & Brindley (1988). Skehan and Alderson both argue that language testing must continue to investigate new avenues to assessment, such as formats that measure communicative abilities more successfully (e.g., Milanovic, 1988); “series tasks,” in which specified language interactions are scored in terms of how particular aspects of information are communicated; group testing; self-assessment; and computer-based language testing. Alderson discusses two additional areas to which language testing needs to turn its attention in the years to come: “washback” effects and learner-centered testing. He points out that while we generally assume that tests have an impact on instruction (washback), there is virtually no empirical research into how, if at all, instructional impact functions, under what conditions, and whether deliberate attempts to design tests with positive instructional impact are effective. Alderson also argues persuasively for the greater involvement of learners in the activity of testing, in the design and writing of tests, and in the setting of standards for success. In this regard, I would mention the work of Brindley (1989) in assessing language achievement in learner-centered instructional settings and the papers in de Jong & Stevenson (1990), which address issues in individualizing language testing. A final area of development, mentioned by Bachman (1990b), is the renewed interest in language aptitude and developments in both the definition of the theoretical construct and in approaches to its measurement (Perry & Stansfield, 1990).

As a result of the developments of the 1980s, language testing has emerged as a discipline in its own right within applied linguistics. Since 1980 the field has seen the creation of an internationally respected journal, Language Testing, as well as several regular newsletters; five new texts on language testing as well as over a dozen volumes of collected papers have been published; and there are now at least two regular major international conferences each year devoted to language testing. The field of language testing has seen the development of both a model of language test performance that can guide empirical research and the application of a variety of research approaches and tools to facilitate such research. In sum, language testing can now claim its own research questions and research methodology. As Bachman (1990a) states, “perhaps for the first time in the history of language testing it is possible to see a genuine symbiotic relationship between applied linguistic theory and the tools of empirical research as they are applied to both the development and the examination of a theory of performance on language tests [and to] the development and use of better language tests” (p. 220). Also as a result of developments in the past decade, language testing is in a better position, I believe, both to make contributions to its sister disciplines in applied linguistics and to be enriched by developments in those disciplines. The next part of this paper briefly describes what I consider two contributions that language testing has to offer to the areas of language learning and language teaching.

PART 2: AN INTERACTIONAL APPROACH TO LANGUAGE TEST DEVELOPMENT

Language tests are used for a variety of purposes; these can be grouped into two broad categories. First, the results of language tests may be used to make inferences about test takers’ language abilities or to make predictions about their capacity for using language to perform future tasks in contexts outside the test itself. Second, decisions (e.g., selection, diagnosis, placement, progress, grading, certification, employment) may be made about test takers on the basis of what we infer from test scores about their levels of ability or their capacity for nontest language use. A major consideration in both the design and use of language tests, therefore, is the extent to which the specific test tasks we include elicit instances of language use from which we can make such inferences or predictions. What this implies is that in order to investigate and demonstrate the validity of the uses we make of test scores, we need a theoretical framework within which we can describe language test performance as a specific instance of language use. Specifically, in order to make

In an instructional setting, for example, in which we may want to use a test to measure learners’ degrees of mastery of different components of language ability that have been covered in the curriculum, we need to demonstrate that the content of the test is representative of the content of the course. Specifically, we will want to demonstrate that the components of language ability included in the test correspond to those covered in the course and that the characteristics of the test tasks correspond to the types of classroom learning activities included in the program. Demonstrating correspondences such as these provides some justification for interpreting test scores as evidence of levels of ability in the different components tested.

Another example would be a situation in which we need to select individuals for possible employment in a job which requires a specified level of proficiency in a foreign language. In this case, we need to demonstrate that the tasks included in the test are representative of the language use tasks required by the future job. Demonstrating this correspondence provides some justification for using the test scores to predict future capacity for using the foreign language effectively in the target employment situation.

Demonstrating correspondences between test performance and language use is equally important for justifying the use of language tests in applied linguistics research. For example, if we were interested in investigating the interlanguage development of a specific component of ability in a target language, such as sensitivity to appropriate register, and wanted to use a test as one of our research instruments, we would need to be sure that the test we used measured this aspect of language ability. Similarly, we would want to specify the characteristics of the tasks included in the test, so as to minimize any variations that may arise between performance on this test and other elicitation procedures we may want to use. In this part of the paper I will present a framework that I believe provides a basis for relating test performance to nontest language use. This framework includes a model of language ability for describing the abilities involved in language use and test performance, and a framework of test method characteristics for relating the characteristics of tests and test tasks to features of the language use context. I will then suggest how this framework can be used to clarify our thinking about the notion of authenticity and to design test tasks that are authentic.

LANGUAGE ABILITY

The language ability of the language user is one feature of language use. When we design a language test, we hypothesize that the test taker’s language ability will be engaged by the test tasks. Thus, in order to relate the abilities we believe are involved in test performance to the abilities involved in language use, we need a model of language ability. The model I will describe here is a refinement of my 1990 model that Adrian Palmer and I are developing (Bachman & Palmer, in press). We define language ability essentially in Widdowson’s (1983) terms as the capacity for using the knowledge of language in conjunction with the features of the language use context to create and interpret meaning. Our model of language ability includes two types of components: (a) areas of language knowledge, which we would hypothesize to be unique to language use (as opposed to, for example, mathematical knowledge or musical knowledge), and (b) metacognitive strategies that are probably general to all mental activity.

This view of language ability is consistent with research in applied linguistics that has increasingly come to view language ability as consisting of two components: (a) language knowledge, sometimes referred to as competence, and (b) cognitive processes, or procedures, that implement that knowledge in language use (e.g., Bachman, 1990a; Bialystok, 1990; Spolsky, 1989; Widdowson, 1983). It is also consistent with information-processing, or cognitive, models of mental abilities, which also distinguish processes or heuristics from domains of knowledge (e.g., Sternberg, 1985, 1988). Language use involves the integration of multiple components and processes, not the least of which are those that constitute language ability. It is unlikely that every language test we develop or use will be intended to measure all the components in our model. Nevertheless, even though we may be interested in focusing on only one or a few of these in a given testing context, we need to be aware of the full range of language abilities as we design and develop language tests and interpret language test scores. For example, even though we may only be interested in measuring an individual’s knowledge of vocabulary, the kinds of test items, tasks, or texts we use need to be selected with an awareness of what other components of language ability they may evoke. We believe, therefore, that even though a given language test may focus on a narrow range of language abilities, its design must be informed by a broad view of language ability.

Language Knowledge¹

What we refer to as language knowledge can be regarded as a domain of information that is specific to language ability and that is stored in long-term memory. For our purposes, we do not attempt to characterize how this knowledge is stored. That is, we use the term knowledge to refer to both conscious and tacit, analyzed and unanalyzed knowledge. While the importance of such distinctions has been recognized in other areas of applied linguistics, it remains to be seen how relevant they are to the design, development, and use of language tests.

Language knowledge includes two broad areas: organizational knowledge and pragmatic knowledge. These are constantly changing, as new elements are learned or acquired and existing elements restructured. The learning or acquisition of areas of language knowledge is beyond the scope of my discussion here, and for purposes of describing how they pertain to language use, I will treat them as more or less stable traits or constructs. The areas of language knowledge are given in Figure 1 below. Discussion of these elements of language knowledge is beyond the scope of this paper. I would simply indicate that this model of language ability has evolved from earlier models, particularly that of Canale & Swain (Canale, 1983; Canale & Swain, 1980b), as a result of both empirical research and review of relevant literature in applied linguistics. The model presented here thus includes a much wider range of elements and provides a more comprehensive view of language ability than have earlier models.

Strategic Competence

The second component of language ability is what I have called strategic competence, and have described as consisting of three sets

¹ This description of language knowledge is essentially the same as Bachman’s (1990b) discussion of language competence. The change in terminology from competence to knowledge reflects the view that the former term now carries with it a great deal of unnecessary semantic baggage that makes it less useful conceptually than it once was. I would note two changes from Bachman’s 1990 model: (a) “Vocabulary” has been removed from “organizational competence” and placed within a new area, “propositional knowledge,” under “pragmatic knowledge,” and (b) “illocutionary competence” has been renamed “functional knowledge.”
