MooGoo's Blog: UTF-8 Truncation in PHP

Friday, September 25, 2009

UTF-8 Truncation in PHP

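PHP's substr works on bytes, so when the cut does not land on a UTF-8 character boundary the result contains garbled characters. To cut safely, we have to work out how many bytes each UTF-8 character occupies.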
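As a rough sketch of that byte-counting rule: the first byte of a UTF-8 sequence tells you how long the sequence is. The helper name `utf8_seq_len` is made up for this illustration and is not part of the function below. It assumes the input is well-formed UTF-8, and treats anything that is not a recognized lead byte as a single byte.

```php
<?php
// Hypothetical helper (assumption, not from the original post):
// returns the number of bytes in the UTF-8 sequence that starts with
// the first byte of $char.
function utf8_seq_len($char) {
  $value = ord($char); // ord() only looks at the first byte
  if ($value >= 0xC0 && $value <= 0xDF) return 2; // 110xxxxx
  if ($value >= 0xE0 && $value <= 0xEF) return 3; // 1110xxxx
  if ($value >= 0xF0 && $value <= 0xF7) return 4; // 11110xxx
  return 1; // ASCII, or a stray/invalid byte counted on its own
}
```

For example, `utf8_seq_len('d')` gives 1 and `utf8_seq_len('蘑')` gives 3, since 蘑 is encoded as the three bytes E8 98 91.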


function moss_truncate($str, $cut) {
  $len = strlen($str);
  $count = 0; // characters counted so far
  $i = 0; // byte offset where the cut will be made

  if($len < $cut)
    return $str;

  do {
    if($count >= $cut)
      return substr($str, 0, $i) . '...';

    $value = ord($str[$i]); // value of the lead byte
    if($value > 191 and $value < 224) // 2 bytes
      $i+=2;
    elseif($value > 223 and $value < 240) // 3 bytes
      $i+=3;
    elseif($value > 239 and $value < 248) // 4 bytes
      $i+=4;
    else // everything else, including ASCII (less than 128)
      $i++;

    $count++;
  } while($i < $len);
  return $str;
}

$str = '蘑d，プリプリで美味';
echo moss_truncate($str, 8); // take 8 "characters"; ASCII and Unicode each count as one
// Result:
// 蘑d，プリプリで...
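
As a cross-check, the same truncation can be sketched with PCRE's UTF-8 mode instead of counting bytes by hand. This is an alternative technique, not the approach used above. The function name `regex_truncate` is hypothetical, and the sketch assumes the input is valid UTF-8 (on invalid input `preg_match` fails and the string is returned unchanged).

```php
<?php
// Hypothetical alternative (assumption, not from the original post):
// truncate to $cut characters using a regular expression.
function regex_truncate($str, $cut) {
  // /u treats pattern and subject as UTF-8; /s lets "." match newlines too.
  $pattern = '/^.{0,' . (int)$cut . '}/us';
  if (!preg_match($pattern, $str, $m)) {
    return $str; // regex error (e.g. invalid UTF-8): leave the input alone
  }
  // Append the ellipsis only if something was actually cut off.
  return strlen($m[0]) < strlen($str) ? $m[0] . '...' : $str;
}

echo regex_truncate('蘑d，プリプリで美味', 8); // 蘑d，プリプリで...
```

The regex version is shorter, but the hand-written loop makes the byte-length rule explicit and has no dependency on PCRE's UTF-8 support.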