[C Language] Cross-Reference Generator Exercise

vlambda
2020-04-01


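A note up front: I'm a student myself, so mistakes are inevitable. If you spot an error or have a better suggestion, please leave a comment below.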

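This week our programming class assigned a problem called "Cross-Reference Generator". The statement is fairly long, and solving it took me quite a bit of effort. The problem is as follows: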

Sample input:

```
Alcatel provides end-to-end solutions.
It enables enterprises to deliver content to any type of user.
lcatel operates in 130 countries.
Alcatel focus on optimizing their service offerings and revenue streams
```

Sample output:

```
Alcatel:1,4
It:2
any:2
content:2
countries:3
deliver:2
enables:2
end-to-end:1
enterprises:2
focus:4
in:3
lcatel:3
of:2
offerings:4
on:4
operates:3
optimizing:4
provides:1
revenue:4
service:4
solutions:1
streams:4
their:4
to:2
type:2
user:2
```

At first glance there seems to be no way in. We know neither how many lines the input has nor how many words each line contains. What now...?

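What we can settle on so far is a struct that stores three things for each whitespace-delimited string: the string itself, the lines it appears on, and how many times it has appeared. Along the way we strip punctuation. Then we check each string to see whether it is a legal word, sort the results, and print them. With the overall plan in place, what remains is to implement it step by step. The struct is: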

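```c
struct edge {
    char word[21];   /* the string itself, up to 20 characters */
    int lines[201];  /* line number of each occurrence, in reading order */
    int numnow;      /* how many occurrences are stored in lines[] */
};
```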

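(In C++, the line numbers could be kept in a std::set instead, which removes the later deduplication step. The array of structs could also be replaced by a linked list, but that is more work.)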
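Plain C has no set, but the same effect is easy to get here. Lines are read top to bottom, so the line numbers recorded for a word never decrease, and a repeat can only equal the last stored entry. The following is a minimal sketch that assumes a hypothetical helper, add_line, which is not part of the original program:

```c
#include <stdbool.h>

struct edge {
    char word[21];
    int lines[201];
    int numnow;
};

/* Hypothetical helper: append `line` to e->lines unless it equals the last
   stored entry (line numbers arrive in non-decreasing order).
   Returns false only when the array is already full. */
bool add_line(struct edge *e, int line)
{
    if (e->numnow > 0 && e->lines[e->numnow - 1] == line)
        return true;                      /* same line as last time: skip */
    if (e->numnow >= (int)(sizeof e->lines / sizeof e->lines[0]))
        return false;                     /* no room left */
    e->lines[e->numnow++] = line;
    return true;
}
```

With this helper, numnow counts distinct lines rather than occurrences. The output step could then print every entry without comparing neighbours.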

1. Reading the file into the program

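The conditions are awkward enough that it looks like I'll have to bring out a long-lost martial-arts secret: a while loop plus character-by-character checks.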

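First, note how input functions (fscanf, fgetc, scanf, etc.) behave when called in a while loop over a file. Once they reach the end of the input (or hit a read error), they return EOF (end of file). EOF is a macro defined in stdio.h. The C standard only guarantees that it is a negative int; MinGW's stdio.h, for example, defines it and declares these functions as follows: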

```c
#define EOF (-1)
int __cdecl fgetc(FILE *_File);
int __cdecl fscanf(FILE * __restrict__ _File,const char * __restrict__ _Format,...) __MINGW_ATTRIB_DEPRECATED_SEC_WARN;
```
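The fgetc-until-EOF loop can be tried on its own. Because EOF is a negative int, the variable receiving fgetc's result has to be an int. The sketch below uses a hypothetical function, count_lines, that starts at 1 (like the original's lines counter) and adds one for every '\n' it sees:

```c
#include <stdio.h>

/* Hypothetical helper: read fp to the end and return the number of the
   last line, starting from 1 and adding one for every '\n' seen. */
int count_lines(FILE *fp)
{
    int c;          /* int, not char: must be able to hold EOF */
    int lines = 1;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
    }
    return lines;
}
```

For a file whose last line has no trailing newline, this returns the number of lines. A trailing '\n' would bump the count once more.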

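So we can use fgetc to read a single character and check whether it is a space, a newline, or end of file. That lets us split strings, keep track of line numbers, and tell when the input is finished. I call this single character testpoint: like a probe, it tells us what stage the input has reached. Declare it as int rather than char, because fgetc returns an int precisely so that EOF can be told apart from every real character. If the value is stored in a char, EOF is either never matched (where char is unsigned) or confused with a 0xFF byte (where char is signed).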

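Once the probe has done its job, we push it back into the input stream with ungetc. Otherwise the probe would swallow the first letter of the word: "Alcatel" would be read as "lcatel", and a one-letter word would vanish entirely. ungetc is also declared in stdio.h: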

```c
int __cdecl ungetc(int _Ch,FILE *_File);
```
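The effect of pushing the probe back can be seen with a small sketch. It uses a hypothetical function, read_after_probe, that reads one character, optionally returns it with ungetc, and then reads the token with fscanf. Given the input "a cat", the token is "a" with the push-back and "cat" without it: the one-letter word is lost entirely.

```c
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical demo: probe one character with fgetc, optionally push it
   back with ungetc, then read the token with fscanf into out. */
void read_after_probe(FILE *fp, bool push_back, char out[21])
{
    int probe = fgetc(fp);
    out[0] = '\0';
    if (probe == EOF)
        return;
    if (push_back)
        ungetc(probe, fp);              /* return the probe to the stream */
    if (fscanf(fp, "%20s", out) != 1)   /* at most 20 chars + '\0' */
        out[0] = '\0';
}
```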

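With the probe back in the stream, we read the whole suspect string (we don't yet know whether it is a legal word) using fscanf's "%s" conversion, which stops at a space, newline, or tab. Writing the conversion as "%20s" caps it at 20 characters, so it cannot overflow the 21-byte tempword buffer.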
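One detail keeps the line count correct: "%s" stops at the first whitespace character without consuming it, so a newline right after a word is still there for the next fgetc. The following sketch uses a hypothetical helper, char_after_token, to show this:

```c
#include <stdio.h>

/* Hypothetical demo: read one token with "%20s", then return the very next
   character in the stream (the whitespace that ended the token), or EOF. */
int char_after_token(FILE *fp, char out[21])
{
    if (fscanf(fp, "%20s", out) != 1)
        return EOF;
    return fgetc(fp);
}
```

On the input "solutions.\nIt enables", the token is "solutions." and the next character returned is '\n'.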

Next, a custom function, pickaword, strips punctuation. I chose the simple approach of writing '\0' over the punctuation mark. The code is:

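```c
/* Strip punctuation: cut the string at the first ',' or '.' */
void pickaword(char a[])
{
    int i;
    for (i = 0; a[i] != '\0' && a[i] != ',' && a[i] != '.'; i++)
        ;               /* advance to the first ',' / '.' or the end */
    a[i] = '\0';        /* terminate there, dropping the punctuation */
}
```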

(This assumes there are no other punctuation marks such as "?" or "!"; add them to the condition if needed.)
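If more punctuation has to be handled, the standard strcspn function finds the first character belonging to a given set. The following is a sketch of a hypothetical variant, pickaword_ext. The set ",.?!;:" is an assumption, so extend it as needed:

```c
#include <string.h>

/* Hypothetical variant of pickaword: cut the string at the first character
   that belongs to the punctuation set. */
void pickaword_ext(char a[])
{
    a[strcspn(a, ",.?!;:")] = '\0';
}
```

Like the original, this variant drops everything after the first mark, so two words joined by punctuation with no whitespace between them are not split.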

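Next, we decide whether this suspect word has been seen before. A boolean variable, ifanew, records whether it is new and starts out as true. A for loop walks over the structs stored so far (up to the current struct count) and compares each one with strcmp. A return value of 0 means the same suspect word is already stored: append the current line number to that struct's lines, set ifanew to false, and stop searching. Only if no struct matches does ifanew stay true. In that case the word goes into a new struct and the struct count increases by 1. That's all it takes to read one suspect word; the loop then repeats until EOF.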
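The search-or-insert step can be packaged as a function. The following is a sketch with a hypothetical name, record_word, following the logic described above. Like the original, it assumes w has at most 20 characters and that both arrays still have room.

```c
#include <stdbool.h>
#include <string.h>

struct edge {
    char word[21];
    int lines[201];
    int numnow;
};

/* Hypothetical wrapper: append `line` to the entry for w if one exists,
   otherwise start a new entry. Returns the updated struct count.
   Assumes (as the original does) that there is room in both arrays. */
int record_word(struct edge words[], int structnums, const char *w, int line)
{
    bool ifanew = true;
    for (int i = 0; i < structnums; i++) {
        if (strcmp(w, words[i].word) == 0) {       /* seen before */
            words[i].lines[words[i].numnow++] = line;
            ifanew = false;
            break;
        }
    }
    if (ifanew) {                                  /* no struct matched */
        strcpy(words[structnums].word, w);
        words[structnums].numnow = 0;
        words[structnums].lines[words[structnums].numnow++] = line;
        structnums++;
    }
    return structnums;
}
```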

2. Sorting the structs

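Any of the common sorting methods (simple selection sort, straight insertion sort, bubble sort) will work. I used bubble sort. The only difference from sorting numbers is that the comparison uses strcmp's return value to decide the order of two strings, and the swap goes through a temporary struct, tempstruct; neither causes any trouble. Note that by the end of step 1, structnums has been incremented past the last filled slot. It therefore equals the number of structs, and the last valid index is structnums - 1. Since each comparison also reads words[j + 1], the loop bounds subtract one. Here is the code: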

```c
for (int i = 0; i < structnums - 1; i++) {                  /* sort */
    for (int j = 0; j < structnums - i - 1; j++) {
        if (strcmp(words[j].word, words[j + 1].word) >= 0) {
            tempstruct = words[j];                           /* swap */
            words[j] = words[j + 1];
            words[j + 1] = tempstruct;
        }
    }
}
```
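The standard library's qsort can replace the hand-written bubble sort. This is a swapped-in alternative, not what the original program uses. The sketch below assumes a hypothetical sort_words wrapper and a comparator, cmp_edge. strcmp compares byte values, so on ASCII capitalised words sort before lowercase ones, which is why Alcatel and It come first in the sample output.

```c
#include <stdlib.h>
#include <string.h>

struct edge {
    char word[21];
    int lines[201];
    int numnow;
};

/* Comparator for qsort: same ordering as strcmp on the word field. */
static int cmp_edge(const void *p, const void *q)
{
    const struct edge *a = p;
    const struct edge *b = q;
    return strcmp(a->word, b->word);
}

/* Hypothetical wrapper: sort the first structnums entries in strcmp order. */
void sort_words(struct edge words[], int structnums)
{
    qsort(words, (size_t)structnums, sizeof words[0], cmp_edge);
}
```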

3. Printing the output as required

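The output took me a long time, because there are so many small details to get right, and I kept getting them tangled. It starts with a check: only a suspect string that is a legal word goes on to the output step; otherwise we continue. In my code this check is ifaword. It accepts a string only if it is non-empty, consists of letters and hyphens (with no hyphen in the first position), and is not one of "a", "an", "the", or "and".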

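Once past the check, we first print the word and a colon. If the word occurs only once, there is no need to worry about repeats or commas: print lines[0] and a newline directly. If it has two or more entries, print lines[0] first. After that, print each later entry in the ",%d" format, but only if it differs from the previous one. Line numbers are recorded in reading order, so repeated lines are always adjacent, and comparing with the previous entry is enough. Repeat this up to the last struct.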
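The printing rule for one entry fits in a small function. The following is a sketch with a hypothetical name, print_entry, which assumes numnow is at least 1. It uses a single loop instead of the original's special case for one occurrence, and produces the same text. For the line list {1, 1, 4}, it writes "Alcatel:1,4" and a newline, matching the first line of the sample output.

```c
#include <stdio.h>

/* Hypothetical helper: print "word:l1,l2,..." and a newline, skipping any
   line number equal to the one before it (lines[] is non-decreasing).
   Assumes numnow >= 1. */
void print_entry(FILE *out, const char *word, const int lines[], int numnow)
{
    fprintf(out, "%s:%d", word, lines[0]);
    for (int j = 1; j < numnow; j++) {
        if (lines[j] != lines[j - 1])
            fprintf(out, ",%d", lines[j]);
    }
    fprintf(out, "\n");
}
```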

Here is the complete code. It reads from crossin.txt and writes to crossout.txt:

```c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>   /* bool, true, false in C */
#include <ctype.h>     /* isspace */

struct edge {
    char word[21];     /* the string, at most 20 characters */
    int lines[201];    /* line number of each occurrence, in reading order */
    int numnow;        /* number of entries used in lines[] */
} words[201], tempstruct;

/* Legal word: non-empty, only letters and '-' (not as the first character),
   and not one of "a", "an", "the", "and". */
bool ifaword(char a[])
{
    int len = strlen(a);
    if (len == 0)
        return false;
    for (int i = 0; i < len; i++) {
        if (!((a[i] >= 'a' && a[i] <= 'z') || (a[i] >= 'A' && a[i] <= 'Z')
              || (a[i] == '-' && i != 0)))
            return false;
    }
    if (strcmp("a", a) == 0 || strcmp("an", a) == 0
        || strcmp("the", a) == 0 || strcmp("and", a) == 0)
        return false;
    return true;
}

/* Strip punctuation: cut the string at the first ',' or '.'. */
void pickaword(char a[])
{
    int i;
    for (i = 0; a[i] != '\0' && a[i] != ',' && a[i] != '.'; i++)
        ;
    a[i] = '\0';
}

int main(void)
{
    memset(words, 0, sizeof(words));
    FILE *fpin = fopen("crossin.txt", "r");
    if (fpin == NULL)
        return 1;

    int lines = 1, structnums = 0;
    int testpoint;          /* the probe; int so that EOF can be detected */
    char tempword[21];

    while ((testpoint = fgetc(fpin)) != EOF) {      /* stop at end of file */
        bool ifanew = true;
        if (testpoint == '\n') {                    /* end of a line */
            lines++;
            continue;
        } else if (isspace(testpoint)) {            /* space, tab, '\r', ... */
            continue;
        }
        ungetc(testpoint, fpin);                    /* put the first letter back */
        if (fscanf(fpin, "%20s", tempword) != 1)    /* width fits tempword[21] */
            break;
        pickaword(tempword);

        for (int i = 0; i < structnums; i++) {      /* seen this string before? */
            if (strcmp(tempword, words[i].word) == 0) {
                if (words[i].numnow < 201)
                    words[i].lines[words[i].numnow++] = lines;
                ifanew = false;
                break;
            }
        }
        if (ifanew && structnums < 201) {           /* no match: new struct */
            strcpy(words[structnums].word, tempword);
            words[structnums].lines[words[structnums].numnow++] = lines;
            structnums++;
        }
    }
    fclose(fpin);

    for (int i = 0; i < structnums - 1; i++) {      /* bubble sort, strcmp order */
        for (int j = 0; j < structnums - i - 1; j++) {
            if (strcmp(words[j].word, words[j + 1].word) >= 0) {
                tempstruct = words[j];
                words[j] = words[j + 1];
                words[j + 1] = tempstruct;
            }
        }
    }

    FILE *fpout = fopen("crossout.txt", "w");
    if (fpout == NULL)
        return 1;
    for (int i = 0; i < structnums; i++) {
        if (!ifaword(words[i].word))                /* skip non-words */
            continue;
        fprintf(fpout, "%s:", words[i].word);
        if (words[i].numnow == 1)
            fprintf(fpout, "%d\n", words[i].lines[0]);
        else {
            for (int j = 0; j < words[i].numnow; j++) {
                if (j == 0)
                    fprintf(fpout, "%d", words[i].lines[0]);
                else if (words[i].lines[j] != words[i].lines[j - 1])
                    fprintf(fpout, ",%d", words[i].lines[j]);   /* skip repeats */
                if (j == words[i].numnow - 1)
                    fprintf(fpout, "\n");
            }
        }
    }
    fclose(fpout);
    return 0;
}
```