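When programming, we often save data to text files whose fields are separated either by commas or by spaces. When a program reads such a file back, what matters is the actual content, not which delimiter was used. This post introduces some common ways to do formatted reads of text files in C++.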

Use case

Suppose there is a file output.txt in which each line holds one record and the fields are separated by spaces, as shown in the figure below:

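The first column is the line number, followed on each line by a group of 17 values (a mix of strings and numbers). The goal is to read the data in this text file.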

There are several ways to do this. This post presents one approach for reference; you can also reuse the functions defined here to build your own workflow.

Step 1. Read the entire file

1.1 Using the Boost library

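Boost provides many advanced facilities that the C++ standard library lacks. If your project already uses Boost, or you are used to working with it, the Boost approach is the first recommendation.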

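Define a function readFileContent that reads the whole text file in one go.
Its first parameter is the file name; its second parameter is a vector of strings that receives the file content, one element per line of the file.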

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <boost/tokenizer.hpp>
#include <boost/lexical_cast.hpp>
typedef boost::tokenizer<boost::char_separator<char> > tokenizer;

void readFileContent(const std::string &file,  std::vector<std::string>& content)
{
  // Split on the newline character. char_separator drops empty tokens
  // by default, so blank lines do not appear in the result.
  boost::char_separator<char> sep_line {"\n"};

  // Read all contents in file
  std::ifstream t(file);
  if (!t) {
    // The file could not be opened: report no lines
    content.clear();
    return;
  }
  std::stringstream buffer;
  buffer << t.rdbuf();
  std::string contents(buffer.str());

  // Separate every line
  tokenizer tok_line(contents, sep_line);
  content.assign(tok_line.begin(), tok_line.end());
}

1.2 Without Boost

If you would rather not use Boost, this step can also be done with nothing but the C++ standard library.

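Again, define a function readFileContent that reads the whole text file. Unlike the Boost version, this one reads the entire file into a single string.
The first parameter is the file name; the second parameter is a string that receives everything that was read.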

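#include <cstdio>
#include <string>

void readFileContent(const std::string &file, std::string &content){
  FILE *fl = fopen(file.c_str(), "rb");
  if(fl == NULL){
    return;
  }
  fseek(fl, 0, SEEK_END);
  long len = ftell(fl);  // ftell returns long, not int
  if(len <= 0){
    fclose(fl);  // close the file on the early-return path as well
    return;
  }
  fseek(fl, 0, SEEK_SET);
  // Read straight into the string, so no manual new[]/delete[] is needed
  // and embedded '\0' bytes do not truncate the content
  content.resize(len);
  size_t n = fread(&content[0], 1, len, fl);
  content.resize(n);  // keep only the bytes that were actually read
  fclose(fl);
}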
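
The standard-library version above returns one large string, while the Boost version returns one string per line. As a rough sketch of how the per-line result could also be produced without Boost, the following uses std::getline. The name readLines and its bool return value are assumptions made for this illustration, not part of the original post.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): reads a text file
// line by line into a vector, one element per line.
// Returns false if the file could not be opened.
bool readLines(const std::string& file, std::vector<std::string>& lines)
{
    std::ifstream in(file);
    if (!in) {
        return false;
    }
    lines.clear();
    std::string line;
    while (std::getline(in, line)) {
        // Strip a trailing '\r' left over from Windows-style line endings
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        lines.push_back(line);
    }
    return true;
}
```

One difference to keep in mind: this sketch keeps blank lines as empty strings, whereas the tokenizer-based version skips them.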

Step 2. Parse each line into fields

2.1 The Boost tokenizer class (recommended)

Define a function getFileContent whose parameter is the file content returned by the Boost version of readFileContent from Step 1.

// Same headers as above
void getFileContent(const std::vector<std::string>& lines)
{
	// Fields are separated by spaces
	boost::char_separator<char> sep{ " " };

	// Loop all lines
	for (const auto& line : lines) {
		tokenizer tok(line, sep);
		std::vector<std::string> cols(tok.begin(), tok.end());
		// Skip lines with fewer than 17 fields
		if (cols.size() < 17)
			continue;

		// Read type
		std::string type = cols[2];
		// Read occluded; lexical_cast throws boost::bad_lexical_cast
		// if the field is not a valid integer
		int occluded = boost::lexical_cast<int>(cols[4]);
	}
}

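The function iterates over every line of the file and uses a separator object, boost::char_separator<char> sep{ " " };, that treats the space character as the delimiter between fields. All fields of a line are stored as a vector of strings in cols, and each field is then read according to its actual type. Where a type conversion is needed, call the boost::lexical_cast function template.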
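
For readers who read the file without Boost, the same split-then-convert idea can be sketched with std::istringstream. The helper names splitFields and toInt below are hypothetical and chosen only for this illustration. operator>> splits on any run of whitespace, so tabs are treated as separators too, which differs slightly from a separator that contains only a space.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into fields on runs of whitespace.
std::vector<std::string> splitFields(const std::string& line)
{
    std::istringstream iss(line);
    std::vector<std::string> cols;
    std::string field;
    while (iss >> field) {
        cols.push_back(field);
    }
    return cols;
}

// Hypothetical helper: converts a field to int without throwing.
// Returns false if the field is not entirely a valid integer.
bool toInt(const std::string& field, int& value)
{
    std::istringstream iss(field);
    char extra;
    // Succeed only if an int was read and no other character follows it
    return (iss >> value) && !(iss >> extra);
}
```

With these, a line could be handled by calling splitFields(line), checking the size of the result, and passing the relevant element to toInt, with a false return value taking the place of the exception that boost::lexical_cast would throw.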

2.2 A custom string-splitting function using `substr`

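#include <string>
#include <cstring>
#include <vector>

// Split input_str on every occurrence of delim and append the pieces to output.
// Consecutive delimiters produce empty strings in output.
void Split(const std::string& input_str, std::vector<std::string>& output, const char* delim)
{
    std::string::size_type pos = 0;
    std::string::size_type npos = 0;
    const std::string::size_type delim_len = std::strlen(delim);
    if (delim_len == 0) {
        // An empty delimiter would never advance pos; return the input as one piece
        output.push_back(input_str);
        return;
    }
    while ((npos = input_str.find(delim, pos)) != std::string::npos) {
        output.push_back(input_str.substr(pos, npos - pos));
        pos = npos + delim_len;
    }
    // Remaining text after the last delimiter
    output.push_back(input_str.substr(pos));
}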

Usage example:

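#include <iostream>

int main()
{
    std::string input = "S_00:  1.392  5.123";
    std::vector<std::string> result;
    Split(input, result, " ");
    // The double spaces in input yield empty strings between the values
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        std::cout << result[i] << std::endl;
    }
    return 0;
}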
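
The example input contains two spaces between values, so Split returns an empty string for each doubled separator. If only the non-empty fields are wanted, one possible variant is sketched below. The name SplitSkipEmpty and the std::string delimiter parameter are assumptions made for this illustration, not part of the original post.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of Split (not from the original post): the same
// find/substr scan, but empty tokens are dropped.
void SplitSkipEmpty(const std::string& input_str,
                    std::vector<std::string>& output,
                    const std::string& delim)
{
    if (delim.empty()) {
        // Nothing to split on: return the whole input as one token
        if (!input_str.empty()) {
            output.push_back(input_str);
        }
        return;
    }
    std::string::size_type pos = 0;
    std::string::size_type npos;
    while ((npos = input_str.find(delim, pos)) != std::string::npos) {
        if (npos > pos) {
            // Only keep tokens that contain at least one character
            output.push_back(input_str.substr(pos, npos - pos));
        }
        pos = npos + delim.length();
    }
    if (pos < input_str.length()) {
        // Remaining text after the last delimiter
        output.push_back(input_str.substr(pos));
    }
}
```

Called with the input from the usage example and " " as the delimiter, this variant would append only the three non-empty fields. It works the same way for comma-separated data by passing "," instead.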