Kobold Sentinel Chase 3

夏洛爾 | 2022-12-03 23:34:23 | Bahamut Coins 0 | Popularity 207


Kobold Sentinel Run V3

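Experiment goals:
1. After entering the standing state, switch to the chase state. In the chase state, the kobold must keep running until it is within close range of the target.
2. The motion guidance is a cool-looking run with both arms stretched out and the body leaning forward.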

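Experiment design:
(Carried over unchanged from Kobold Sentinel Chase 2)
1. Any weak point touching the ground counts as a failure (the tail and sword are not weak points).
2. Use ClampReward: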
if(koboldBodies[i].damageCoef > 0f){clampReward += -0.1f * koboldBodies[i].damageCoef;}
3.
//Set: judge.endEpisode = false
//Set: nearModeRange = 1f
//Set: weapon, tail is not weakness. If is, Stand would back to GetUp
if(weaknessOnGround)
{
    // LogWeaknessOnGround();
    if(inferenceMode)
    {
        brainMode = BrainMode.GetUp;
        SetModel("KoboldGetUp", getUpBrain);
        behaviorParameters.BehaviorType = BehaviorType.InferenceOnly;
    }
    else
    {
        AddReward(-1f);
        judge.outLife++;
        judge.Reset();
        return;
    }
}
else if(koboldRoot.localPosition.y < -10f)
{
    if(inferenceMode)
    {
        brainMode = BrainMode.GetUp;
        SetModel("KoboldGetUp", getUpBrain);
        behaviorParameters.BehaviorType = BehaviorType.InferenceOnly;
    }
    else
    {
        AddReward(-1f);
        judge.outY++;
        judge.Reset();
        return;
    }
}
else
{
    targetSmoothPosition = targetPositionBuffer.GetSmoothVal();
    headDir = targetSmoothPosition - stageBase.InverseTransformPoint(koboldHeadRb.position);
    rootDir = targetSmoothPosition - stageBase.InverseTransformPoint(koboldRootRb.position);
    flatTargetVelocity = rootDir;
    flatTargetVelocity.y = 0f;
    targetDistance = flatTargetVelocity.magnitude;

    lookAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldHead.up, headDir));
    upAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldHead.right * -1f, Vector3.up));
    aimVelocity = flatTargetVelocity.normalized;
    aimVelocity.y = 0.2f;

    //Lean
    Vector3 leanDir = rootAimRot * flatTargetVelocity;
    spineUpAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldSpine.right * -1f, leanDir));
    rootUpAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldRoot.up, leanDir));

    //Naruto Arm
    Vector3 flatLeftDir = Vector3.Cross(flatTargetVelocity, Vector3.up);
    leftUpperArmAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldLeftUpperArm.right, leftUpperArmAimRot * flatTargetVelocity));
    leftForeArmAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldLeftForeArm.right, leftForeArmAimRot * flatTargetVelocity));
    rightUpperArmAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldRightUpperArm.right, rightUpperArmAimRot * flatTargetVelocity));
    rightForeArmAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldRightForeArm.right, rightForeArmAimRot * flatTargetVelocity));
    weaponAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldWeapon.up, weaponAimRot * flatTargetVelocity));
    tailRootAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldTailRoot.right, flatTargetVelocity));
    tailMidAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldTailMid.right, flatTargetVelocity));
    tailTopAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldTailTop.right, flatTargetVelocity));
    leftThighAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldLeftThigh.forward * -1f, flatLeftDir));
    rightThighAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldRightThigh.forward * -1f, flatLeftDir));

    avgVelocity = velocityBuffer.GetSmoothVal();
    velocityAngle = Vector3.Angle(avgVelocity, aimVelocity);
    velocityAngleCoef = Mathf.InverseLerp(180f, 0f, velocityAngle);
    flatVelocity = avgVelocity;
    flatVelocity.y = 0f;
    flatVelocityManitude = flatVelocity.magnitude;
    velocityCoef = Mathf.InverseLerp(0f, 8f, Vector3.Project(avgVelocity, aimVelocity).magnitude );
    flatVelocityAngle = Vector3.Angle(flatVelocity, flatTargetVelocity);

    if(!inferenceMode)
    {
        if(targetDistance > nearModeRange)
        {
            if(Time.fixedTime - landingMoment > landingBufferTime)
            {
                bool outSpeed = flatVelocityManitude < Mathf.Lerp(0f, 7f, (Time.fixedTime - landingMoment - landingBufferTime)/4f);
                bool outDirection = flatVelocityAngle > Mathf.Lerp(180f, 10f, (Time.fixedTime - landingMoment - landingBufferTime)/4f);
                float motionLimit = Mathf.Lerp(0f, 0.5f, (Time.fixedTime - landingMoment - landingBufferTime)/4f);
                float motionLimit2 = Mathf.Lerp(0f, 0.7f, (Time.fixedTime - landingMoment - landingBufferTime)/4f);
                bool outMotion = lookAngle < motionLimit2
                    || upAngle < motionLimit2
                    || leftThighAngle < motionLimit2
                    || rightThighAngle < motionLimit2
                    || spineUpAngle < motionLimit
                    || rootUpAngle < motionLimit
                    || leftUpperArmAngle < motionLimit
                    || leftForeArmAngle < motionLimit
                    || rightUpperArmAngle < motionLimit
                    || rightForeArmAngle < motionLimit;
                    // || weaponAngle < motionLimit;
                if( outSpeed || outDirection || outMotion)
                {
                    AddReward(-1f);
                    if(outSpeed){judge.outSpeed++;}
                    if(outDirection){judge.outDirection++;}
                    if(outMotion){judge.outMotion++;}
                    judge.Reset();
                    return;
                }
            }
            lastReward = (velocityAngleCoef + velocityCoef) * 0.02f
                + (lookAngle+upAngle) * 0.0125f
                + (leftThighAngle+rightThighAngle) * 0.0075f
                + (spineUpAngle+rootUpAngle) * 0.005f
                + (leftUpperArmAngle+leftForeArmAngle+rightUpperArmAngle+rightForeArmAngle+weaponAngle+tailRootAngle+tailMidAngle+tailTopAngle ) * 0.001f
                + (1f - exertionRatio) * 0.002f;
            if(useClampReward)
            {
                lastReward = lastReward+clampReward;
                if(lastReward < 0f) lastReward = 0f;
            }
            totalReward += lastReward;
            AddReward( lastReward );
        }
        // else if(targetDistance > 1.5f)
        else
        {
            // AddReward(1f);
            judge.survived++;
            judge.Reset();
            return;
        }
    }
}
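
The failure thresholds in the listing above are not fixed: each one ramps linearly over the 4 seconds that follow the landing buffer. The sketch below tabulates that ramp in plain C# with no Unity dependency. `Lerp` is a stand-in written here to mimic `Mathf.Lerp` (which clamps `t` to [0, 1]), and `Thresholds` is a hypothetical helper name, not part of the original project.

```csharp
using System;

// Stand-in for UnityEngine.Mathf.Lerp, which clamps t to [0, 1].
static float Lerp(float a, float b, float t)
{
    t = Math.Max(0f, Math.Min(1f, t));
    return a + (b - a) * t;
}

// Failure thresholds `seconds` after the landing buffer has elapsed.
// The 4 s ramp and the end values are copied from the listing above.
static (float minSpeed, float maxAngle, float motionLimit, float motionLimit2) Thresholds(float seconds)
{
    float t = seconds / 4f;
    return (Lerp(0f, 7f, t), Lerp(180f, 10f, t), Lerp(0f, 0.5f, t), Lerp(0f, 0.7f, t));
}

foreach (float s in new[] { 0f, 2f, 4f, 6f })
{
    var th = Thresholds(s);
    Console.WriteLine($"{s}s: speed >= {th.minSpeed}, angle <= {th.maxAngle}, pose >= {th.motionLimit} / {th.motionLimit2}");
}
```

Halfway through the ramp the agent must already hold a flat speed of 3.5 and stay within 95 degrees of the target direction. From 4 seconds onward the limits stay at 7, 10 degrees, 0.5 and 0.7.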

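//Roughly speaking:
--1. Reward line of sight, with Force Sharping
--2. Reward the speed and angle of the velocity projected onto the "recommended running vector", with Force Sharping
--3. Reward specific vectors (forward/up/right) of the Root, Spine, and both arms for matching the specified angles, with Force Sharping
--4. Reward the whole tail for matching the specified angles, but "without Force Sharping"
--5. Reward reducing changes in motion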
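
To make the relative weights concrete, the following sketch recomputes the per-step reward in plain C# from grouped scores, then applies the ClampReward penalty. `StepReward` and its grouped parameters are hypothetical names introduced for illustration. The weights and the floor at zero are taken from the listing above, and the sketch assumes `useClampReward` is enabled.

```csharp
using System;

// Per-step reward using the weights from the listing above.
// Every individual score lies in [0, 1]; grouped parameters are sums of scores.
static float StepReward(
    float velocityScores,  // velocityAngleCoef + velocityCoef (max 2)
    float headScores,      // lookAngle + upAngle (max 2)
    float thighScores,     // leftThighAngle + rightThighAngle (max 2)
    float torsoScores,     // spineUpAngle + rootUpAngle (max 2)
    float limbScores,      // four arm scores + weapon + three tail scores (max 8)
    float exertionRatio,   // 0 = no exertion, 1 = maximum exertion
    float clampReward)     // accumulated -0.1f * damageCoef, always <= 0
{
    float reward = velocityScores * 0.02f
                 + headScores * 0.0125f
                 + thighScores * 0.0075f
                 + torsoScores * 0.005f
                 + limbScores * 0.001f
                 + (1f - exertionRatio) * 0.002f;
    // ClampReward: subtract the contact penalty, but never drop below zero.
    return Math.Max(0f, reward + clampReward);
}

float perfect = StepReward(2f, 2f, 2f, 2f, 8f, 0f, 0f);
float grazed = StepReward(2f, 2f, 2f, 2f, 8f, 0f, -0.1f * 0.5f);
float heavy = StepReward(2f, 2f, 2f, 2f, 8f, 0f, -0.1f * 2f);
Console.WriteLine($"perfect={perfect}, grazed={grazed}, heavy={heavy}");
```

A perfect pose is worth 0.1 per step, of which velocity contributes 0.04 and the eight arm, weapon, and tail scores together only 0.008. A total damageCoef of 1.0 is therefore enough to cancel an entire perfect step.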

Experiment time:
Step: 1.5e8
Time Elapsed: 246067s (68.35hr)

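Experiment results:
The experiment failed.

This run used exactly the same experiment code, with the number of training steps simply tripled.
The Kobold Sentinel became even worse at running, which indirectly shows that more training is not always better.
It is possible that Chase 2 happened to land on a performance peak at 5e7 steps while Chase 3 landed on a trough at 1.5e8 steps, but the result still shows that increasing the number of training steps does not really solve the stability problem.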

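I decided to fix the following problems.
Q. Out of Life happens too easily
1. Try changing the Calf, which currently touches the ground easily because of its armor, to a non-weak point, but apply ClampReward to it.
2. Leave the collider unchanged for now.
Changing it feels like dodging the problem rather than dealing with it, and there is no guarantee that a future puppet will not deliberately have similar limb characteristics.

Q. Out of Motion happens too easily
1. The head guidance originally required the forward direction to face the target and the up vector to face the sky, but this is unreasonable.
A very tall character has to lower its head to face the target, so its up vector cannot point at the sky. The second vector is therefore also changed to SideLook.
2. Put the weapon back under ForceSharping. The tendency to hold it against the chest still seems to do more harm than good.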
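
The head-guidance conflict can be checked with a little geometry. If the head's facing axis points exactly at a target that lies below it, the axis perpendicular to it is tilted away from world up by the same pitch angle, so its `InverseLerp(180f, 0f, angle)` score cannot exceed 1 - pitch/180. The sketch below is plain C# under simplifying assumptions: the distance is measured from the head rather than the root, there is no roll, and the heights are made-up values rather than measurements of the kobold. `BestUpScore` is a hypothetical helper.

```csharp
using System;

// Best achievable up-vector score (1 - tilt / 180) when the facing axis points
// exactly at a target `headHeight` below the head and `flatDistance` away.
static float BestUpScore(float headHeight, float flatDistance)
{
    double tiltDeg = Math.Atan2(headHeight, flatDistance) * 180.0 / Math.PI;
    return (float)(1.0 - tiltDeg / 180.0);
}

// Hypothetical head heights, both evaluated at nearModeRange = 1.
Console.WriteLine($"height 1: {BestUpScore(1f, 1f)}");
Console.WriteLine($"height 2: {BestUpScore(2f, 1f)}");
```

With the final `motionLimit2` of 0.7, the up score fails once the pitch exceeds 54 degrees. A head 1 unit above the target still passes at distance 1 (45 degrees, score 0.75), while a head 2 units above it does not (about 63 degrees, score about 0.65), which matches the concern about tall characters.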

The next experiment will be Kobold Sentinel Chase